Allan Birnbaum died 40 years ago today. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to arrive at what he termed “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I have heartily endorsed. While known for a result that the (strong) Likelihood Principle followed from sufficiency and conditionality principles (a result that Jimmy Savage deemed one of the greatest breakthroughs in statistics), a few years after publishing it, he turned away from it, perhaps discovering gaps in his argument. A post linking to a 2014 Statistical Science issue discussing Birnbaum’s result is here. Reference  links to the Synthese 1977 volume dedicated to his memory. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. Ample weekend reading!
NATURE VOL. 225 MARCH 14, 1970 (1033)
LETTERS TO THE EDITOR
Statistical Methods in Scientific Inference (posted earlier here)
It is regrettable that Edwards’s interesting article, supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200), I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3; see also ref. 4), and so my comments here will be restricted to several specific points that Edwards raised.
If there has been ‘one rock in a shifting scene’ of general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists. The confidence concept takes from the Neyman-Pearson approach techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data. (The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy.) The confidence concept also incorporates important but limited aspects of the likelihood concept: the sufficiency concept, expressed in the general refusal to use randomized tests and confidence limits when they are recommended by the Neyman-Pearson approach; and some applications of the conditionality concept. It is remarkable that this concept, an incompletely formalized synthesis of ingredients borrowed from mutually incompatible theoretical approaches, is evidently useful continuously in much critically informed statistical thinking and practice [emphasis mine].
While inferences of many sorts are evident everywhere in scientific work, the existence of precise, general and accurate schemas of scientific inference remains a problem. Mendelian examples like those of Edwards and my 1969 paper seem particularly appropriate as case-study material for clarifying issues and facilitating effective communication among interested statisticians, scientific workers and philosophers and historians of science.
New York University
Courant Institute of Mathematical Sciences,
251 Mercer Street,
New York, NY 10012
Birnbaum’s confidence concept, sometimes written (Conf), was his attempt to find in error statistical ideas a concept of statistical evidence–a term that he invented and popularized. In Birnbaum 1977 (24), he states it as follows:
(Conf): A concept of statistical evidence is not plausible unless it finds ‘strong evidence for J as against H’ with small probability (α) when H is true, and with much larger probability (1 – β) when J is true.
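The (Conf) requirement can be put in numbers. The following is a minimal sketch (mine, not Birnbaum’s) using a one-sided test of a normal mean with known variance; the hypotheses, sample size, and cutoff are all hypothetical choices made purely for illustration:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical setup: X ~ N(mu, 1), n = 25; H: mu = 0 vs J: mu = 0.7.
# Declare "strong evidence for J as against H" when the sample mean exceeds c.
n, mu_H, mu_J = 25, 0.0, 0.7
c = 1.645 / sqrt(n)  # cutoff chosen so that alpha is about 0.05

alpha = 1 - norm_cdf((c - mu_H) * sqrt(n))  # P(declare evidence for J | H true)
power = 1 - norm_cdf((c - mu_J) * sqrt(n))  # P(declare evidence for J | J true) = 1 - beta

print(round(alpha, 3), round(power, 3))
```

With these (assumed) numbers the rule declares strong evidence for J with probability about 0.05 when H is true and about 0.97 when J is true, so it meets (Conf)’s demand of a small α and a much larger 1 – β.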
Birnbaum questioned whether Neyman-Pearson methods had “concepts of evidence” simply because Neyman talked of “inductive behavior” and Wald and others couched statistical methods in decision-theoretic terms. I have been urging that we consider instead how the tools may actually be used, and not be restricted by the statistical philosophies of founders (not to mention that so many of their statements are tied up with personality disputes, and problems of “anger management”). Recall, as well, E. Pearson’s insistence on an evidential construal of N-P methods, and the fact that Neyman, in practice, spoke of drawing inferences and reaching conclusions (e.g., Neyman’s nursery posts, links in [iii] below).
Still, since Birnbaum’s (Conf) appears to allude to pre-trial error probabilities, I regard (Conf) as still too “behavioristic”. But I discovered that Pratt, in the link in  below, entertains the possibility of viewing Conf in terms of what might be called post-data or “attained” error probabilities. Some of his papers hint at the possibility that he would have wanted to use Conf for a post-data assessment of how well (or poorly) various claims were tested. I developed the concept of severity and severe testing to provide an “evidential” or “inferential” notion, along with a statistical philosophy and a philosophy of science in which it is to be embedded.
I think that Fisher (1955) is essentially correct in maintaining that “When, therefore, Neyman denies the existence of inductive reasoning he is merely expressing a verbal preference”. It is a verbal preference one can also find in Popper’s view of corroboration. (He, and current-day critical rationalists, also hold that probability arises to evaluate degrees of severity, well-testedness or corroboration, not inductive confirmation.) The inference to the severely corroborated claim is still inductive. It goes beyond the premises. It is qualified by the relevant severity assessments.
I have many of Birnbaum’s original drafts of papers and articles here (with carbon copies (!) and hand-written notes in the margins), thanks to the philosopher of science, Ronald Giere, who gave them to me years ago [iii].
[i] His untimely death was a suicide.
[ii] A considerable number of posts on the strong likelihood principle (SLP) may be found searching this blog (e.g., here and here). Links or references to the associated literature, perhaps all of it, may also be found here. A post linking to the 2014 Statistical Science issue on my criticism of Birnbaum’s “breakthrough” (to the SLP) is here.
[iii]See posts under “Neyman’s Nursery” (1, 2, 3, 4, 5)
 Birnbaum, A., in Philosophy, Science and Method: Essays in Honor of Ernest Nagel (edited by Morgenbesser, S., Suppes, P., and White, M.) (St. Martin’s Press. NY, 1969).
 Birnbaum, A., “Likelihood”, in International Encyclopedia of the Social Sciences (Crowell-Collier, NY, 1968).
 Full contents of Synthese 1977, dedicated to his memory, can be found in this post.
 Birnbaum, A. (1977). “The Neyman-Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley-Savage argument for Bayesian theory”. Synthese 36(1): 19–49. See links in 
I instinctively dislike many aspects of the NP approach – I think in general probably due to the (over)use of optimality concepts.
On the other hand, Fraser’s writing has (re-) convinced me that there is something to the ‘Confidence’ concept (and hence p-values) after all. Furthermore, that Fisher was essentially right modulo a few details that are usually overemphasised.
The (draft?) Fraser paper I sent you recently “p-values: The insight to modern statistical inference” (http://www.utstat.toronto.edu/dfraser/documents/276-AnnRev-v2.pdf) states:
“In essence a p-value records just where a data value is located relative to a parameter value of interest, or where it is with respect to a hypothesis of interest, and does this in statistical units…
…Our approach here is to describe pragmatically what has happened and thus record just where the data value is with respect to the parameter value of interest, avoiding decision statements or procedural rules, and leaving evaluation to the judgment of the appropriate community of researchers”
I can agree with this wholeheartedly. All that remains is the crucial problem of nuisance parameters and tackling higher-dimensional problems. In this regard I agree with Birnbaum that the key contribution of NP was “techniques for systematically appraising and bounding the probabilities” in these scenarios.
Under simple one-dimensional, no-nuisance-parameter (etc.) problems, Confidence = Fiducial = SEV (and approximately = Likelihood). So I’d love to hear more about the philosophical side of Conf/SEV in the presence of nuisance parameters and NP contributions in this regard. This seems where the key contributions lie, where Bayes typically claims to provide the solution (i.e. integrate out using priors), as well as where most practical folk need guidance (every problem has nuisance parameters).
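The coincidence claimed in the comment above can be checked directly in the simplest case: for a normal mean with known σ and a one-sided claim, the post-data severity assessment, the one-sided confidence level, and one minus the p-value are all the same number. A minimal sketch, with the sample size, σ, observed mean, and claim (μ > 0) all chosen hypothetically for illustration:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical data: X ~ N(mu, sigma^2), sigma = 1 known, n = 100, observed mean 0.2.
n, sigma, xbar = 100, 1.0, 0.2
mu1 = 0.0  # claim of interest: mu > mu1

z = (xbar - mu1) * sqrt(n) / sigma

# Post-data severity for the claim mu > mu1:
# SEV(mu > mu1) = P(Xbar <= observed xbar; computed under mu = mu1) = Phi(z)
sev = norm_cdf(z)

# The same number is the one-sided confidence level at which mu1 is the
# lower confidence bound, and 1 - sev is the one-sided p-value of mu = mu1.
lower_conf = norm_cdf(z)
p_value = 1 - norm_cdf(z)

print(round(sev, 3), round(p_value, 3))
```

With these assumed numbers z = 2, so severity, the confidence level, and 1 minus the p-value all come out to about 0.977; the three notions numerically coincide here, and it is exactly this coincidence that breaks down (and needs philosophical and technical care) once nuisance parameters enter.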