Today is Allan Birnbaum’s birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in Breakthroughs in Statistics (volume I, 1993). I’ve a hunch that Birnbaum would have liked my rejoinder to the discussants of my forthcoming paper (Statistical Science): Bjornstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu. I hadn’t realized until recently that all of this is up under “future papers” here. You can find the rejoinder: STS1404-004RA0-2. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. Happy Birthday, Birnbaum!
I am honored and grateful to have so many interesting and challenging comments on my paper. I want to thank the discussants for their willingness to jump back into the thorny quagmire of Birnbaum’s argument. To the question raised in the paper, “Does it matter?”, these discussions show the answer is yes. The enlightening connections to contemporary projects are especially valuable in galvanizing future efforts to address foundational issues in statistics.
Long-standing as Birnbaum’s result has been, Birnbaum himself went through dramatic shifts in a short period of time following his famous (1962) result. More than merely of historical interest, these shifts provide a unique perspective on the current problem.
Already in the rejoinder to Birnbaum (1962), he is worried about criticisms (by Pratt 1962) pertaining to applying WCP to his constructed mathematical mixtures (what I call Birnbaumization), and hints at replacing WCP with another principle (Irrelevant Censoring). Then there is a gap until around 1968, at which point Birnbaum declares the SLP plausible “only in the simplest case, where the parameter space has but two” predesignated points (1968, 301). He tells us in Birnbaum (1970a, 1033) that he has pursued the matter thoroughly, leading to “rejection of both the likelihood concept and various proposed formalizations of prior information”. The basis for this shift is that the SLP permits interpretations that “can be seriously misleading with high probability” (1968, 301). He puts forward the “confidence concept” (Conf), which takes from the Neyman-Pearson (N-P) approach “techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data” while supplying it with an evidential interpretation (1970a, 1033). Given the many different associations with “confidence,” I use (Conf) in this Rejoinder to refer to Birnbaum’s idea. Many of the ingenious examples of the incompatibilities of the SLP and (Conf) are traceable back to Birnbaum, optional stopping being just one (see Birnbaum 1969). A bibliography of Birnbaum’s work appears in Giere 1977. Before his untimely death (at 53), Birnbaum denies that the SLP even counts as a principle of evidence (in Birnbaum 1977). He thought it anomalous that (Conf) lacked an explicit evidential interpretation even though, at an intuitive level, he saw it as the “one rock in a shifting scene” in statistical thinking and practice (1970a, 1033). I return to this in part IV of this rejoinder.
II. Bjornstad, Dawid, and Evans
Let me begin by answering the central criticisms that, if correct, would be obstacles to what I purport to have shown in my paper. It is entirely understandable that leading voices in a long-lived controversy would assume that all of the twists and turns, avenues and roadways, have already been visited, and that no new flaw in the argument could enter to shake up the debate. I say to the reader that the surest sign that the issue is unsettled is that my critics disagree among themselves about the puzzle and even the key principles under discussion: the WCP, and in one case, the SLP itself.
IV. Post-SLP foundations
Return to where we left off in the opening section of this rejoinder, with Birnbaum (1969):
The problem-area of main concern here may be described as that of determining precise concepts of statistical evidence (systematically linked with mathematical models of experiments), concepts which are to be non-Bayesian, non-decision-theoretic, and significantly relevant to statistical practice. (Birnbaum 1969, 113)
Given Neyman’s behavioral decision construal, Birnbaum claims that “when a confidence region estimate is interpreted as statistical evidence about a parameter” (1969, 122), an investigator has necessarily adjoined a concept of evidence, (Conf), that goes beyond the formal theory. What is this evidential concept? The furthest Birnbaum gets in defining (Conf) is in his posthumous article (1977):
(Conf) A concept of statistical evidence is not plausible unless it finds ‘strong evidence for H2 against H1’ with small probability (α) when H1 is true, and with much larger probability (1 – β) when H2 is true. (1977, 24)
On the basis of (Conf), Birnbaum reinterprets statistical outputs from N-P theory as strong, weak, or worthless statistical evidence depending on the error probabilities of the test (1977, 24-26). While this sketchy idea requires extensions in many ways (e.g., beyond pre-data error probabilities, and beyond the two-hypothesis setting), the spirit of (Conf)—that error probabilities qualify properties of methods, which in turn indicate the warrant to accord a given inference—is, I think, a valuable shift of perspective. This is not the place to elaborate, except to note that my own twist on Birnbaum’s general idea is to appraise evidential warrant by considering the capabilities of tests to have detected erroneous interpretations, a concept I call severity. That Birnbaum preferred a propensity interpretation of error probabilities is not essential. What matters is their role in picking up how features of experimental design and modeling alter a method’s capabilities to control “seriously misleading interpretations”. Even those who embrace a version of probabilism may find a distinct role for a severity concept. Recall that Fisher always criticized the presupposition that a single use of mathematical probability must be competent for qualifying inference in all logical situations (1956, 47).
Birnbaum’s philosophy evolved from seeking concepts of evidence in degree of support, belief, or plausibility between statements of data and hypotheses to embracing (Conf) with the required control of misleading interpretations of data. The former view reflected the logical empiricist assumption that there exist context-free evidential relationships—a paradigm philosophers of statistics have been slow to throw off. The newer (post-positivist) movements in philosophy and history of science were just appearing in the 1970s. Birnbaum was ahead of his time in calling for a philosophy of science relevant to statistical practice; it is now long overdue!
“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists” (Birnbaum 1972, 861).
The paper itself is here.
Below are my slides from my May 2, 2014 presentation in the Virginia Tech Department of Philosophy 2014 Colloquium series:
“Putting the Brakes on the Breakthrough, or
‘How I used simple logic to uncover a flaw in a controversial 50 year old ‘theorem’ in statistical foundations taken as a
‘breakthrough’ in favor of Bayesian vs frequentist error statistics’”
Some previous posts on this topic can be found at the following links (and by searching this blog with key words):
I discovered, not long ago, that for months an uncorrected version was up at the Statistical Science page. I hope it didn’t confuse too many people.
Birnbaum, A. 1962. “On the Foundations of Statistical Inference.” In Breakthroughs in Statistics, edited by S. Kotz and N. Johnson, 1:478–518. Springer Series in Statistics 1993. New York: Springer-Verlag.