Today is the birthday of Lucien Le Cam (Nov. 18, 1924-April 25,2000): Please see my updated 2013 post on him.
Today is the birthday of Lucien Le Cam (Nov. 18, 1924-April 25,2000): Please see my updated 2013 post on him.
Memory Lane: 3 years ago. Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (It is now a boutique hotel, though many of the rooms are still too jail-like for me.) My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.
In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid non-question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort)
Criticisms then follow readily: the form of one or both:
From Gelman’s blog:
“In one of life’s horrible ironies, I wrote a paper “Why we (usually) don’t have to worry about multiple comparisons” but now I spend lots of time worrying about multiple comparisons”
Posted by 14 October 2014, 11:13 am on
Exhibit B: The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time, in press. (Andrew Gelman and Eric Loken) (Shortened version is here.)
The “forking paths” paper, in my reading, basically argues that mere hypothetical possibilities about what you would or might have done had the data been different (in order to secure a desired interpretation) suffices to alter the characteristics of the analysis you actually did. That’s an error statistical argument–maybe even stronger than what some error statisticians would say. What’s really being condemned are overly flexible ways to move from statistical results to substantive claims. The p-values are illicit when taken to provide evidence for those claims because an actual p-value requires Prob(P < p;Ho) = p (and the actual p-value has become much greater by design). The criticism makes perfect sense if you’re scrutinizing inferences according to how well or severely tested they are. Actual error probabilities are accordingly altered or unable to be calculated. However, if one is going to scrutinize inferences according to severity then the same problematic flexibility would apply to Bayesian analyses, whether or not they have a way to pick up on it. (It’s problematic if they don’t.) I don’t see the magic by which a concern for multiple testing disappears in Bayesian analysis (e.g., in the first paper) except by assuming some prior takes care of it..
Congratulations to Faye Flam for finally getting her article published at the Science Times at the New York Times, “The odds, continually updated” after months of reworking and editing, interviewing and reinterviewing. I’m grateful too, that one remark from me remained. Seriously I am. A few comments: The Monty Hall example is simple probability not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting–a study so bad they actually had to pull it at CNN due to reader complaints[i]–scarcely required more than noticing the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory, so the posterior checks out.
The article says, Bayesian methods can “crosscheck work done with the more traditional or ‘classical’ approach.” Yes, but on traditional frequentist grounds. What many would like to know is how to cross check Bayesian methods—how do I test your beliefs? Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:
Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.
0. July 20, 2014: Some of the comments to this post reveal that using the word “fallacy” in my original title might have encouraged running together the current issue with the fallacy of transposing the conditional. Please see a newly added Section 7.
July 4, 2014 was the two year anniversary of the Higgs boson discovery. As the world was celebrating the “5 sigma!” announcement, and we were reading about the statistical aspects of this major accomplishment, I was aghast to be emailed a letter, purportedly instigated by Bayesian Dennis Lindley, through Tony O’Hagan (to the ISBA). Lindley, according to this letter, wanted to know:
“Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”
“Wonderful examples, but let’s not close our eyes,” is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert.) For your Saturday night/ weekend reading, here are excerpts from Hand, another discussant (Welsh), scattered remarks of mine, along with links to papers and background. I begin with David Hand:
[The papers in this collection] give examples of problems which are well-suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armory.(Hand )_
…. But I have to ask, is the emphasis on ‘Bayesian’ necessary? That is, do we need further demonstrations aimed at promoting the merits of Bayesian methods? … The examples in this special issue were selected, firstly by the authors, who decided what to write about, and then, secondly, by the editors, in deciding the extent to which the articles conformed to their desiderata of being Bayesian success stories: that they ‘present actual data processing stories where a non-Bayesian solution would have failed or produced sub-optimal results.’ In a way I think this is unfortunate. I am certainly convinced of the power of Bayesian inference for tackling many problems, but the generality and power of the method is not really demonstrated by a collection specifically selected on the grounds that this approach works and others fail. To take just one example, choosing problems which would be difficult to attack using the Neyman-Pearson hypothesis testing strategy would not be a convincing demonstration of a weakness of that approach if those problems lay outside the class that that approach was designed to attack.
Hand goes on to make a philosophical assumption that might well be questioned by Bayesians: Continue reading
At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions‘ of articles”. I began with Andrew Gelman‘s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments especially in relation to current day ESP research. Continue reading
“What ever happened to Bayesian foundations?” was one of the final topics of our seminar (Mayo/SpanosPhil6334). In the past 15 years or so, not only have (some? most?) Bayesians come to accept violations of the Likelihood Principle, they have also tended to disown Dutch Book arguments, and the very idea of inductive inference as updating beliefs by Bayesian conditionalization has evanescencd. In one of Thursday’s readings, by Baccus, Kyburg, and Thalos (1990), it is argued that under certain conditions, it is never a rational course of action to change belief by Bayesian conditionalization. Here’s a short snippet for your Saturday night reading (the full paper is http://errorstatistics.files.wordpress.com/2014/05/bacchus_kyburg_thalos-against-conditionalization.pdf): Continue reading
We interspersed key issues from the reading for this session (from Howson and Urbach) with portions of my presentation at the Boston Colloquium (Feb, 2014): Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge. (Slides below)*.
Someone sent us a recording (mp3)of the panel discussion from that Colloquium (there’s a lot on “big data” and its politics) including: Mayo, Xiao-Li Meng (Harvard), Kent Staley (St. Louis), and Mark van der Laan (Berkeley).
See if this works: | mp3
*There’s a prelude here to our visitor on April 24: Professor Stanley Young from the National Institute of Statistical Sciences.
A. Spanos Probability/Statistics Lecture Notes 7: An Introduction to Bayesian Inference (4/10/14)
April 3, 2014: We interspersed discussion with slides; these cover the main readings of the day (check syllabus): the Duhem’s Probem and the Bayesian Way, and “Highly probable vs Highly Probed”. syllabus four. Slides are below (followers of this blog will be familiar with most of this, e.g., here). We also did further work on misspecification testing.
Monday, April 7, is an optional outing, “a seminar class trip”
you might say, here at Thebes at which time we will analyze the statistical curves of the mountains, pie charts of pizza, and (seriously) study some experiments on the problem of replication in “the Hamlet Effect in social psychology”. If you’re around please bop in!
Mayo’s slides on Duhem’s Problem and more from April 3 (Day#9):
It was from my Virginia Tech colleague I.J. Good (in statistics), who died five years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules. (I had posted this last year here.)
“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]
Day #3: Spanos lecture notes 2, and reading/resources from Feb 6 seminar
Crupi & Tentori (2010). Irrelevant Conjunction: Statement and Solution of a New Paradox, Phil Sci, 77, 1–13.
Hawthorne & Fitelson (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, Phil Sci 71: 505–514.
An update on this issue will appear shortly in a separate blogpost.
READING FOR NEXT WEEK
Selection (pp. 35-59) from: Popper (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. Basic Books.
Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes:
I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something they don’t like. For example, Larry Wasserman writes (as reported by Deborah Mayo):
“I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs.”
I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference.
To quickly record some of my own frustrations:*: First, I would disagree with Wasserman’s characterization of frequentist inference, but as is clear from Larry’s comments to (my reaction to him), I think he concurs that he was just giving a broad contrast. Please see Note  for a remark from my post: Comments on Wasserman’s “what is Bayesian/frequentist inference?” Also relevant is a Gelman post on the Bayesian name: .
Second, Gelman’s “no more subjective than…” evokes remarks I’ve made before. For example, in “What should philosophers of science do…” I wrote:
Arguments given for some very popular slogans (mostly by non-philosophers), are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods repeated verbatim (discussed on this blog).
I’ve written a lot about objectivity on this blog, e.g., here, here and here (and in real life), but what’s the point if people just rehearse the “everything is a mixture…” line, without making deeply important distinctions? I really think that, next to the “all models are false” slogan, the most confusion has been engendered by the “no methods are objective” slogan. However much we may aim at objective constraints, it is often urged, we can never have “clean hands” free of the influence of beliefs and interests, and we invariably sully methods of inquiry by the entry of background beliefs and personal judgments in their specification and interpretation. Continue reading
Memory Lane: 2 years ago:
My efficient Errorstat Blogpeople1 have put forward the following 3 reader-contributed interpretive efforts2 as a result of the “deconstruction” exercise from December 11, (mine, from the earlier blog, is at the end) of what I consider:
“….an especially intriguing remark by Jim Berger that I think bears upon the current mindset (Jim is aware of my efforts):
Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. (Berger 2006, 463)” (From blogpost, Dec. 11, 2011)
The statistics literature is big enough that I assume there really is some bad stuff out there that Berger is reacting to, but I think that when he’s talking about weakly informative priors, Berger is not referring to the work in this area that I like, as I think of weakly informative priors as specifically being designed to give answers that are _not_ “ridiculous.”
Keeping things unridiculous is what regularization’s all about, and one challenge of regularization (as compared to pure subjective priors) is that the answer to the question, What is a good regularizing prior?, will depend on the likelihood. There’s a lot of interesting theory and practice relating to weakly informative priors for regularization, a lot out there that goes beyond the idea of noninformativity.
To put it another way: We all know that there’s no such thing as a purely noninformative prior: any model conveys some information. But, more and more, I’m coming across applied problems where I wouldn’t want to be noninformative even if I could, problems where some weak prior information regularizes my inferences and keeps them sane and under control. Continue reading
I attended a lecture by Aris Spanos to his graduate econometrics class here at Va Tech last week[i]. This course, which Spanos teaches every fall, gives a superb illumination of the disparate pieces involved in statistical inference and modeling, and affords clear foundations for how they are linked together. His slides follow the intro section. Some examples with severity assessments are also included.
Frequentist Hypothesis Testing: A Coherent Approach
1 Inherent difficulties in learning statistical testing
Statistical testing is arguably the most important, but also the most difficult and confusing chapter of statistical inference for several reasons, including the following.
(i) The need to introduce numerous new notions, concepts and procedures before one can paint — even in broad brushes — a coherent picture of hypothesis testing.
(ii) The current textbook discussion of statistical testing is both highly confusing and confused. There are several sources of confusion.
(iii) The discussion of frequentist testing is rather incomplete in so far as it has been beleaguered by serious foundational problems since the 1930s. As a result, different applied fields have generated their own secondary literatures attempting to address these problems, but often making things much worse! Indeed, in some fields like psychology it has reached the stage where one has to correct the ‘corrections’ of those chastising the initial correctors!
In an attempt to alleviate problem (i), the discussion that follows uses a sketchy historical development of frequentist testing. To ameliorate problem (ii), the discussion includes ‘red flag’ pointers (¥) designed to highlight important points that shed light on certain erroneous in- terpretations or misleading arguments. The discussion will pay special attention to (iii), addressing some of the key foundational problems.
[i] It is based on Ch. 14 of Spanos (1999) Probability Theory and Statistical Inference. Cambridge[ii].
[ii] You can win a free copy of this 700+ page text by creating a simple palindrome! http://errorstatistics.com/palindrome/march-contest/
A fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piece-meal results; we have set out considerations to guide these pieces[i]. Although the complexity of the issues makes it more difficult to set out neatly, as, for example, one could by imagining that a single algorithm encompasses the whole of inductive inference, the payoff is an account that approaches the kind of arguments that scientists build up in order to obtain reliable knowledge and understanding of a field.” (273)[ii]
A reread for Saturday night?
[i]The pieces hang together by dint of the rationale growing out of a severity criterion (or something akin but using a different term.)
[ii]Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.