A.Birnbaum: Statistical Methods in Scientific Inference

Birnbaum: born May 27, 1923

Today is (statistician) Allan Birnbaum’s birthday. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to obtain what he called “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I would heartily endorse! Although he is known for arguing that the (strong) Likelihood Principle follows from sufficiency and conditionality principles, he seems to have turned away from this result a few years after publishing it, perhaps having discovered gaps in his argument.

NATURE VOL. 225 MARCH 14, 1970 (1033)


Statistical methods in Scientific Inference

 It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.

 If there has been ‘one rock in a shifting scene’ of general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists. The confidence concept takes from the Neyman-Pearson approach techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data. (The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy.) The confidence concept also incorporates important but limited aspects of the likelihood concept: the sufficiency concept, expressed in the general refusal to use randomized tests and confidence limits when they are recommended by the Neyman-Pearson approach; and some applications of the conditionality concept. It is remarkable that this concept, an incompletely formalized synthesis of ingredients borrowed from mutually incompatible theoretical approaches, is evidently useful continuously in much critically informed statistical thinking and practice [emphasis mine].

While inferences of many sorts are evident everywhere in scientific work, the existence of precise, general and accurate schemas of scientific inference remains a problem. Mendelian examples like those of Edwards and my 1969 paper seem particularly appropriate as case-study material for clarifying issues and facilitating effective communication among interested statisticians, scientific workers and philosophers and historians of science.

Allan Birnbaum
New York University
Courant Institute of Mathematical Sciences,
251 Mercer Street,
New York, NY 10012

Birnbaum’s confidence concept, sometimes written (Conf), was his attempt to find in error statistical ideas a concept of statistical evidence–a term that he invented and popularized. In Birnbaum 1977 (24), he states it as follows:

(Conf): A concept of statistical evidence is not plausible unless it finds ‘strong evidence for J as against H’ with small probability (α) when H is true, and with much larger probability (1 – β) when J is true.
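To make the two probabilities in (Conf) concrete, here is a minimal sketch under assumptions that are not in Birnbaum's text: a normal model with known standard deviation, two simple hypotheses about the mean (H: μ = 0, J: μ = 1), and a rule that reports "strong evidence for J as against H" whenever the sample mean exceeds a cutoff. The function name and all numerical values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def conf_error_probabilities(mu_h, mu_j, sigma, n, cutoff):
    """Return (alpha, power) for the illustrative rule:
    report 'strong evidence for J as against H' when the sample mean > cutoff.

    Assumes n independent normal observations with known sigma (an assumption
    made for this sketch, not something stated in Birnbaum 1977).
    """
    se = sigma / n ** 0.5  # standard error of the sample mean
    # alpha: probability of reporting strong evidence for J when H is true
    alpha = 1 - NormalDist(mu_h, se).cdf(cutoff)
    # power = 1 - beta: probability of the same report when J is true
    power = 1 - NormalDist(mu_j, se).cdf(cutoff)
    return alpha, power


alpha, power = conf_error_probabilities(mu_h=0.0, mu_j=1.0, sigma=1.0, n=25, cutoff=0.5)
print(f"alpha = {alpha:.4f}, 1 - beta = {power:.4f}")
# prints: alpha = 0.0062, 1 - beta = 0.9938
```

With these illustrative numbers the rule reports strong evidence for J rarely when H is true and almost always when J is true, which is the pattern (Conf) requires. Note that both quantities are computed before any data are seen.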

Birnbaum questioned whether Neyman-Pearson methods had “concepts of evidence” simply because Neyman talked of “inductive behavior” and Wald and others couched statistical methods in decision-theoretic terms. I have been urging that we consider instead how the tools may actually be used, and not be restricted by the statistical philosophies of founders (not to mention that so many of their statements are tied up with personality disputes, and problems of “anger management”). Recall, as well, E. Pearson’s insistence on an evidential construal of N-P methods, and the fact that Neyman, in practice, spoke of drawing inferences and reaching conclusions (e.g., Neyman’s nursery posts, links in [iii] below).

Still, since Birnbaum’s (Conf) seems to allude to pre-trial error probabilities, I regard it as too “behavioristic”. Some of his papers hint that he would have wanted to use it in a (post-data) assessment of how well (or poorly) various claims were actually tested. (Aside from that, he also leans toward a focus on simple statistical hypotheses, though (Conf) need not be so restricted.) I have developed the concept of severity and severe testing to provide an “evidential” or “inferential” notion, along with a statistical philosophy and a philosophy of science in which it is to be embedded.

I think that Fisher (1955) is essentially correct in maintaining that “When, therefore, Neyman denies the existence of inductive reasoning he is merely expressing a verbal preference”.  It is a verbal preference one can also find in Popper’s view of corroboration. (He, and current day critical rationalists, also hold that probability arises to evaluate degrees of severity, well-testedness or corroboration, not inductive confirmation.)  The inference to the severely corroborated claim is still, I would say, inductive. It goes beyond the premises. It is qualified by the relevant severity assessments. This blog may be searched for more on Popper and the rest….

I have many of Birnbaum’s original drafts of papers and articles here (with carbon copies (!) and hand-written notes in the margins), thanks to the philosopher of science, Ronald Giere, who gave them to me years ago[iii].


[i] His untimely death was a suicide.

[ii] A considerable number of posts on the strong likelihood principle (SLP) may be found by searching this blog; some would say far too many (e.g., here and here). Links or references to the associated literature, perhaps all of it, may also be found here.

[iii] See posts under “Neyman’s Nursery” (1, 2, 3, 4, 5)


[1] Edwards, A. W. F., Nature, 222, 1233 (1969)

[2] Birnbaum, A., J. Amer. Stat. Assoc., 57, 269 (1962). Full citation: Birnbaum, A. (1962), “On the Foundations of Statistical Inference”, Journal of the American Statistical Association 57(298), 269-306.

[3] Birnbaum, A., in Philosophy, Science and Method: Essays in Honor of Ernest Nagel (edited by Morgenbesser, S., Suppes, P., and White, M.) (St. Martin’s Press, NY, 1969).

[4] Likelihood in International Encyclopedia of the Social Sciences (Crowell-Collier, NY, 1968).

Birnbaum, A. (1977). “The Neyman-Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley-Savage argument for Bayesian theory”. Synthese 36(1): 19-49.



3 thoughts on “A.Birnbaum: Statistical Methods in Scientific Inference”

  1. Michael Lew

    I’ve investigated “the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them” that Birnbaum alludes to and can’t find anything of substance in the two references that he supplies other than: (i) there is a disagreement between likelihood methods and Neymanian confidence intervals and Birnbaum says he prefers the latter; and (ii) a misapprehension about likelihood relating to a single point dataset in which Birnbaum tries to compare two models with different numbers of parameters by way of their likelihoods.

    (I am ignoring the exaggeration of “most”.)

    On the first item, we should expect likelihoods to disagree with Neyman. The methods are different. Such disagreement cannot be taken, by itself, to indicate a failing on the part of likelihood methods. One could equally well turn it around and use it as an argument against Neymanian intervals.

    The second item seems to be a consequence of an assumption that the likelihood is _equal_ to the probability of the observation. That is wrong (although it is easy to find that error in other papers by, for instance, Forster). Likelihood is _proportional_ to the probability of the observation. We cannot compare likelihoods that come from distinct likelihood functions as there are two unknown constants of proportionality that get in the way. Different models yield different likelihood functions, so…

    (I note that Hacking’s review of Edwards’s book also contains the same mistake. Did Hacking and Birnbaum convince each other to change their minds on likelihood based on that mistake?)

    This is an area of active interest to me, so I’d like to ask if you know of any arguments that Birnbaum might have had in mind beyond the two that I’ve listed?

  2. Michael: As for what Birnbaum had in mind, my best recommendation would be to check Birnbaum, especially his later work, Birnbaum 1977. As for what I think, I can best recommend searching this blog under the topics that interest you. In addition, links to most of my published work may also be found off this page. Thanks for your interest.

  3. Michael Lew


    Thank you. I’ve read Birnbaum 1977 without finding any arguments against likelihoods as measures of evidence beyond a failure to “satisfy those who prefer the confidence concept”. In fact, Birnbaum praises likelihood for offering “attractive features of systematic precision and generality”.

    The fact that Birnbaum changed his mind is not a useful argument that likelihood is flawed in the absence of his reasons being cogent and correct.
