Statistical Science: The Likelihood Principle issue is out…!

Abbreviated Table of Contents:

Here are some items for your Saturday-Sunday reading.

Link to complete discussion: 

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Links to individual papers:

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1(·), f2(·), then even though f1(x; θ) = c f2(y; θ) for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
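
To make the abstract's description of an SLP violation concrete, here is a minimal sketch using the standard textbook pairing of a binomial experiment (fixed number of tosses) with a negative binomial one (toss until a fixed number of tails). This example is not taken from the post; the numbers (9 heads, 3 tails, null value θ = 0.5) and all function names are illustrative assumptions. The two likelihoods are proportional for every θ, yet the one-sided p-values differ because the sampling distributions differ.

```python
from math import comb

def binom_lik(theta, n=12, heads=9):
    # E1: toss n times, observe `heads` heads.
    return comb(n, heads) * theta**heads * (1 - theta)**(n - heads)

def negbin_lik(theta, tails=3, heads=9):
    # E2: toss until the `tails`-th tail, observe `heads` heads along the way.
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta)**tails

def likelihood_ratios(thetas):
    # If f1(x; θ) = c f2(y; θ), this ratio is the same constant c for every θ.
    return [binom_lik(t) / negbin_lik(t) for t in thetas]

def binom_pvalue(n=12, heads=9, theta0=0.5):
    # P(at least `heads` heads in n tosses) under θ = theta0.
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(heads, n + 1))

def negbin_pvalue(tails=3, heads=9, theta0=0.5):
    # P(at least `heads` heads before the `tails`-th tail) under θ = theta0.
    below = sum(comb(y + tails - 1, y) * theta0**y * (1 - theta0)**tails
                for y in range(heads))
    return 1 - below

if __name__ == "__main__":
    print(likelihood_ratios([0.2, 0.5, 0.8]))  # constant ratio across θ
    print(binom_pvalue(), negbin_pvalue())     # different p-values
```

In this sketch the likelihood ratio is the same constant at every θ, so the SLP would treat the two outcomes as evidentially equivalent, while the two p-values fall on opposite sides of the conventional 0.05 level.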

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper was accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

 UPhils and responses



Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics

45 thoughts on “Statistical Science: The Likelihood Principle issue is out…!”

  1. I love Martin and Liu’s first paragraph: “Birnbaum’s theorem (Birnbaum, 1962) is arguably the most controversial result in statistics. The theorem’s conclusion is that a framework for statistical inference that satisfies two natural conditions, namely, the sufficiency principle (SP) and the conditionality principle (CP), must also satisfy an exclusive condition, the likelihood principle (LP). The controversy lies in the fact that LP excludes all those standard methods taught in Stat 101 courses. Professor Mayo successfully refutes Birnbaum’s claim, showing that violations of LP need not imply violations of SP or CP. The key to Mayo’s argument is a correct formulation of CP; see also Evans (2013). Her demonstration resolves the controversy around Birnbaum and LP, helping to put the statisticians’ house in order.”

    Now all those standard methods taught in STAT 101 courses can be taught without feeling they’re actually “excluded” by Birnbaum’s result. Return from exile!
    The story, of course, is more complicated than it at first appears.

  2. Nicole Jinn

    Wonderful! I am very happy for you!

  3. Sleepy

    Very excited to see this finally come out, especially after seeing your seminar talk.

    • Sleepy (or should I say “wide awake”): Thanks so much! Just a few years in the works….!

  4. anonymous

    I haven’t read everything yet, but it’s hard to tell if the remark on p. 263 of the Rejoinder means Birnbaum dispenses with the sufficiency principle SP in his argument to the SLP. The citation is Birnbaum 1972. Why is SP in his argument at all then, or was this not known in 1962?

    • Anonymous: That question seems to reflect a fairly intimate familiarity with the argument. The answer is surprising. It turns out that Birnbaum knew already before writing the 1962 paper that you could run it w/o sufficiency (SP). He said, in passing, in that paper, that he didn’t bother to interrupt the argument by demonstrating the point because SP was so uncontroversial, why not use it. I’m not sure how familiar this one point is, even to insiders. In my earlier (2010) paper, I had to actually find a place to put SP in because it hardly had a role in the argument, once one allows what he called “mathematical equivalence”. None of this changes the problem with the “proof” which always turned just on WCP.

      • anonymous

        I tried to submit this comment a few times, let’s see if it works. If what you say is true, then my question is, why did Evans, Fraser and Monette (I think it’s 1986) go to the trouble of trying to run the argument without SP rather than use Birnbaum’s result from 1972.

        • Anonymous: Great question, but I don’t know the answer. It might be that Birnbaum was too sketchy in his treatment. Frankly, I find that all of them evoke the central issues under dispute. That is, to get rid of sufficiency and arrive at “mathematical equivalence” really appeals to sufficiency. But I had no bones about using mathematical equivalence. I just didn’t understand why some people failed to see that if there’s a counterexample from (WCP and SP) to SLP, then that IS a counterexample from WCP to SLP. (“Counterexample”, defined in the rejoinder, is used here as philosophers use it: a case where the premises (or conjuncts in the antecedent of the conditional) are made true while the conclusion (or consequent of the conditional) is made false.)

          In other words, removing premise SP does not somehow help the argument to the SLP. If you’ve refuted it with more premises, you’ve refuted it with fewer. (Because if the argument went through with fewer premises, it would also go through with more premises.) Yet, this came up again and again in the discussions. That is why I mention this right up front in the abstract.

          All of this is logic, and in fact 99% of the whole Birnbaum business is pure logic–that’s why it made sense to me as a non-statistician logician. However, there’s a job here for a logic-philosopher of language person to capture both the argument and the fallacious instantiation adequately. But this last point was to be my second informal reply to question (1), which I’ll get to.

          • anonymous

            Dawid and Evans make it sound as if there’s a crucial difference between Cox’s WCP and the conditionality principle in Birnbaum’s argument. The Rejoinder accepts there’s a difference? Or does it deny there’s any difference? I’ve read the earlier papers on this blog a long time ago.

            • Anonymous
              They are the same. No one precisely defined the WCP, even though people assumed they knew what it said. (Note the informal defn. Dawid gives, from Birnbaum). I define it in the paper, 4.3. It is an equivalence all right, one simply has to get the right equivalence. I hadn’t stressed this way of formulating it in the earlier paper (2010). The difference enters, not with WCP, but with the clever way Birnbaum will use it, or try to. The SLP doesn’t refer to mixtures, but the WCP does. Therefore, Birnbaum needs to introduce (hypothetical) mixtures to try and get the SLP. The failure to grasp this step shortchanges the entire argument; this is what the critical commentaries miss, I argue. It’s best to refer readers to the paper which I think is pretty straightforward.
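
The logical point made in this thread, that a counterexample to [(WCP and SP) entails SLP] is automatically a counterexample to [WCP entails SLP], can be checked by brute force. The sketch below is a deliberate simplification: it treats each principle as a single proposition that a given case makes true or false, which is an assumption of this illustration and not the formulation in the paper. The function names are hypothetical.

```python
from itertools import product

def implies(p, q):
    # Material conditional: false only when p is true and q is false.
    return (not p) or q

def counterexample_transfers():
    # A counterexample makes the premises true and the conclusion false.
    # Check every truth assignment: whenever a case refutes
    # (WCP and SP) -> SLP, it must also refute WCP -> SLP.
    for wcp, sp, slp in product([False, True], repeat=3):
        refutes_joint = not implies(wcp and sp, slp)
        refutes_wcp_alone = not implies(wcp, slp)
        if refutes_joint and not refutes_wcp_alone:
            return False
    return True

if __name__ == "__main__":
    print(counterexample_transfers())
```

The check passes because any case with WCP and SP true and SLP false already has WCP true and SLP false. Dropping a premise can only make an argument easier to refute, never harder.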

  5. first informal reply to (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?)
    (a) Starting with “are there points being overlooked, even now”? I will suggest that Dawid and Evans assume the limits of Mayo (2010) apply to this paper.

    The counterexamples (to the argument) in that paper were correct, but it didn’t go far enough to prevent “enough of the SLP” from sneaking through (see my Rejoinder to Evans).

    (b) On the other hand, the earlier paper said more about the linguistic/logical equivocation that is at the heart of my answer to the first part of the question “why was it so convincing for so long”. In the current paper, there is only one sentence, really, and it’s in the rejoinder to Hannig, that identifies the equivocation.

    (c) The second reason is speculative, but I don’t see how one can discount the fact that lots of people wanted the LP* to be true (since it meant that all the evidence is in the likelihoods). When Savage called it a “breakthrough” he predicted that people wouldn’t stop at the half-way house of likelihood after Birnbaum’s result. They would go all the way to Bayesianism. This is nearly a quote, can’t look it up just now.

    So, they didn’t look that closely at some finagling in the argument, they kept to the fuzzy Ev function, and left details to the reader (in some texts).

  6. Savage wrote: “…I suspect that once the likelihood principle is widely recognized, people will not long stop at that halfway house but will go forward and accept the implications of personalistic probability for statistics”.

    • Corey: right….so I was correct, yes?
      I think the use of “halfway” house is funny because it often alludes to a drug rehab facility, or a house between jail and returning to society.

      • Mayo: Yup, correct. I just happened to have that quote on hand, so I provided the exact wording.

        • Thanks. Savage kept trying to encourage Birnbaum to leave the halfway house, and eventually he did. Only instead of leaving it (likelihoodism) to move to Bayesianism as Savage wanted, he moved out of likelihoodism into error statistics (by means of his confidence concept). Strange history.

          • I think I read a bit about Birnbaum’s Confidence concept (probably here) and I seem to recall that it seemed rather at odds with the severity… maybe compatible, but not concerned with severity as such? Don’t quite remember…

            • Corey: It’s defined in my rejoinder. Essentially in terms of error probs, and it was never clear if he’d use the actual data in the computation (again in the rejoinder, very briefly). Fraser and others have closely related measures to (conf), and I guess the confidence dist people you’ve mentioned. Are you familiar with Martin and Liu, or Hannig?

  7. On question (2): What would Birnbaum have thought? It may seem surprising but (as I hope my rejoinder brings out) I take it that Birnbaum would have agreed with me. However, he would have pursued the issue very carefully as to whether I can escape “too much” of the (strong) likelihood principle sneaking in (even acknowledging its violations). My disproof was designed to ensure this. Dawid mentions the issue, but he also assumes the original proof is sound.

    As for his general take on contemporary statistical foundations, I think he might be rather surprised that the evidential construal he sought for N-P theory (along the lines of his Conf) still escapes pinning down. Would he accept “severity” as the way to render error probabilities relevant for inference (in the individual case)? Hmm…..I guess we’ll have to ask him on New Year’s.

  8. I need to find time to read through the paper and the rejoinders. However, I have two points. (Apologies if they were already covered in the paper and its discussion.)

    First, without either defending the likelihood principle or Birnbaum’s proof or for that matter Mayo’s refutation, it seems to me that to adopt the likelihood principle does not require a proof that it follows from sufficiency and conditionality. One could, for example, choose to regard it as an appealing primitive principle. Barnard refers to himself as having ‘spent the decade of the 1950s preaching the gospel of likelihood’,(1) so well before Birnbaum’s paper.

    One further point is that what one should conclude as a practical statistician from Cox’s famous mixture experiment is a matter for debate. Some take it as being self-evident that what one should do is maintain the type I error rate for each component of the mixture. Others think that what should happen is that the likelihood ratio should be maintained. The point is that what is the conditional test and what is the unconditional test varies according to the criterion that is the focus: type I error or likelihood. I think that this point is sometimes overlooked.

    Barnard, G. A. (1996). “Fragments of a statistical autobiography.” Student 1: 257-268.

    • Christian Hennig

      Stephen: “Appealing primitivity” of a principle is fine only until something is revealed that makes the intuition behind the “appealingness” look flawed. Isn’t the fact that looking at error probabilities is dismissed by the principle such a flaw, and probably a fatal one?

      • Christian: It doesn’t matter (for my point). Although as Cox said, it’s pretty suspicious if such a radical result could stem from such innocent beginnings. But the suspicions needed to be demonstrated.

    • Stephen: On your last point, I start out (in the paper) describing how Lehmann explains that an N-P test could opt for the conditional or the unconditional approach. This also enters in Lehmann’s “one theory or two” paper. Personally, I wish Lehmann had nipped things in the bud back then, rather than say it all depends on context (e.g., noting that in considering a series of repetitions, the average error probs could well be of interest). On the other hand, what’s great about his saying this is that it’s abundantly clear that the unconditional treatment can’t possibly be deemed equivalent to the conditional treatment. So the “equivalence” stated by the WCP cannot be THAT. In effect, Dawid sounds like he is merely pointing out to me this inequivalence–an elementary point I never disagreed with. In fact, I see one of my major contributions to be pinpointing the correct equivalence asserted by the WCP. One doesn’t find it in Evans either, so far as I can tell.

  9. Christian, I agree. I have argued elsewhere that the analogous problem of axiom worship can lead to problems. Sometimes it is appropriate to reject (at least one of) the axioms because the consequences are unacceptable.

    However, see my second point. The likelihood principle might be a guide rather than an absolute but it is far from clear to me, for example, that stopping rule inconsistencies necessarily call for one to abandon the LP rather than (say) to question the Neyman-Pearson framework. Even staying with the frequentist framework, there is much controversy over what one should and should not condition on.

    • Stephen and Christian: First, thanks for your comments–I was hoping to get some on the general issue.
      My concern was and is with an alleged “theorem”–being a theorem has a definition–and the alleged proof of it. Feeling it in your bones is not a proof. Birnbaum thought he had a proof and my denial of the proof does not involve assuming error control (nor rejecting it). By the way, it’s standard to distinguish the law of likelihood and the likelihood principle.
      On Barnard, he rejected the SLP in cases with non-point alternatives and optional stopping (e.g., Savage forum).

      Finally, of course one may embrace the SLP without a proof of it. A direct proof follows from inference by way of Bayes’ theorem. The big deal about Birnbaum’s result is that it purported to find a proof that did not require starting from Bayes’ theorem. That’s why Savage called it a “breakthrough”, and that’s why I’ve had to break through the breakthrough.

    • > much controversy over what one should and should not condition on.

      Yes, I fully agree and Evans made that point in his comment.

      What I believe we have here is failure of communication between Dawid, Evans and Mayo, but I doubt I am the one to help sort that out.

      • Keith: Although there’s controversy about what one should condition on, I really don’t see that it has anything at all to do with THIS one issue which deals solely with whether Birnbaum’s attempted proof is kosher. For that purpose, finding counterexamples to [(SP and WCP) entails SLP] is important, but even that doesn’t go far enough. That’s one of the main reasons I wrote this paper (because the 2010 paper didn’t go far enough). Evans is more or less at the point of the 2010 paper. The counterexamples go over into a violation of the equivalence relation that he associates with WCP. Incidentally, they were not presented as counterexamples in Evans, Fraser, and Monette (1986 I think). Instead, they reformulated the principles to avoid the result.

        I presented a version of my argument in a very short paper in Madrid a few years ago, I guess 2012 (no, it was December 2011; see link*). It turned out that Evans was in the audience (I had never met him). After we communicated, he became convinced that I was right to find that there were counterexamples after all, and he formulated them in terms of the violations to SLP as an equivalence relation. He gave me an opportunity to write the paper with him–which I appreciated. For various reasons I did not. My point is that it isn’t as if we haven’t communicated, Evans and I at least. With Dawid, I think he may just be assuming that I can’t have a new take on a problem that so many have grappled with over the years. He’s overlooking the fact that even when a conditional is a theorem, detaching the consequent requires a distinct step (i.e., accepting the antecedent). In any of the ways one tries to do this, I show, you get an unsound or circular argument. Well, it’s in the paper.

  10. Last informal reply to (1): The Birnbaum argument, and the fallacious instantiation, is still in need of a fully adequate logical rendering. It doesn’t readily translate into quantified logic, and my feeling has been that a logic-philosopher of language person would be in the best position to achieve this. The issue with the argument isn’t really a statistical one, but a logical one. While I demonstrate the flaw in the argument, an even neater rendering (employing techniques from phil lang) would be useful.

  11. As I say, I need to look at the paper and the comments in depth. However, one further point occurs to me. In Cox’s famous mixture experiment by conditioning you can either choose to fix the type I error rate for each sub-experiment or the likelihood criterion for each sub-experiment and these are opposing strategies. (I note by the by that although the strategy of increasing the type I error rate beyond 5% for the less precise sub-experiment seems offensive, that of reducing it for the more precise one seems less so and may even be supported by Mayo’s severity approach. For example, one could envisage a scientist who had a rule: “I will always guarantee, using frequentist approaches, that my type I error rate is no more than 5% but when I have abundant precision and hence very good power, I will take the opportunity to reduce the rate further.”)

    To return to the two approaches, the fact that they start by treating a fundamental situation in opposite ways is at least a psychological reason for finding “Birnbaum’s theorem” puzzling.

    • Stephen: I don’t recognize any of these issues in relation to the Birnbaum argument, or the treatment of actual or hypothetical theta-irrelevant mixtures. As I say, the issue is really just a logical one.

  12. Agreed regards the criticism of Birnbaum argument. I was picking up on the previous discussion with Christian which had drifted rather off topic (my fault).

    However, I am not sure that I agree regarding the practical implications of the mixture experiment. We still need to know what to do and I maintain it’s not clear. Disproving Birnbaum’s theorem would simply show that it should not be at all as obvious as Savage thought it ought to be. In other words it would restore Cox’s experiment to being a genuine conundrum rather than one that Savage thought Birnbaum had solved, or do you disagree?

    • Stephen: I never had a reason to think that Birnbaum saw himself as solving the “conundrum” posed by Cox’s example with “two instruments with different precisions”. It’s not that Savage thought Birnbaum had solved this problem, it’s rather that Savage thought (thanks to Birnbaum’s “breakthrough”) that in order for a frequentist to solve this problem she has to embrace the SLP, and he hoped she wouldn’t stop at the halfway house.
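
For readers who want to see the conditional/unconditional contrast in Cox’s two-instrument mixture in numbers, here is a rough sketch. Everything specific in it is an assumption made for illustration: a fair coin picks the instrument, the measurement is normal with mean 0 under the null, the standard deviations are 1 and 10, and the unequal allocation (1% and 9%) is arbitrary. It shows only that two tests can share the same unconditional 5% type I error rate while having different error rates conditional on the instrument actually used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles

def cutoff(alpha, sigma):
    # One-sided rejection threshold giving conditional type I error `alpha`
    # for an instrument with standard deviation `sigma` (null mean 0).
    return sigma * STD_NORMAL.inv_cdf(1 - alpha)

def type1_error(cut, sigma):
    # P(X > cut) when X ~ N(0, sigma^2): the error rate given the instrument.
    return 1 - NormalDist(0, sigma).cdf(cut)

def mixture_error(alphas, sigmas, probs=(0.5, 0.5)):
    # Unconditional type I error: average over which instrument is chosen.
    return sum(p * type1_error(cutoff(a, s), s)
               for p, a, s in zip(probs, alphas, sigmas))

if __name__ == "__main__":
    sigmas = (1.0, 10.0)                        # precise, imprecise (assumed)
    print(mixture_error((0.05, 0.05), sigmas))  # 5% on each instrument
    print(mixture_error((0.01, 0.09), sigmas))  # unequal split, same average
```

Both calls report an overall rate of about 0.05, so the unconditional rate alone cannot distinguish the two tests; the difference only shows up once one conditions on the instrument that produced the measurement.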

  13. anonymous

    Is it possible to hold the law of likelihood and reject the likelihood principle? Can you reject the likelihood principle but still retain the law? Doesn’t one entail the other?

    • Michael Lew

      There is nothing that prevents one from complying with the law of likelihood even if the likelihood principle may not be entirely or universally true. However, in such a case it is unlikely that one could prove that the law is true in any sense other than providing sensible guidance for interpretation of likelihood functions.

      I think that it is important to note that there are serious restrictions that need to be placed on pairs of likelihoods for the law of likelihood to apply. They need to represent mutually exclusive simple hypotheses that are within a single statistical model. If they can be represented as points on a single likelihood function then the law of likelihood can be applied.

      (Parenthetically, I wonder whether the likelihood principle is sound and provable with those same restrictions.)

      I agree with Stephen Senn that even if the likelihood principle is not proven mathematically (I should say, even if the disproof of the proof is sound) that is no impediment to us simply taking it as a good normative principle. After all, the repeated sampling principle cannot be proved to be anything other than a sensible sounding normative principle.

      • Michael: Mutually exclusive simple hypotheses within a given model is such an artificial situation as not to be very relevant to general statistical inference. Besides, even then, at most one can set upper bounds to error rates (there are also intermediate cases discussed in Mayo and Kruse)–provided the avoidance of selection effects. Birnbaum and others are led to restricting it to that artificial example to avoid terrible error probabilities, so that too grows out of a desire to control error probabilities. All that said, sure one can hold it. I guess Royall is one of the few who would claim it to be all of inference. I’m not sure what Gandenberger’s view of this is–he might agree with Royall, I don’t know.
        But even after all those sacrifices, of artificiality and limited error control, the account is unable to accomplish a main goal: distinguishing the evidential warrant of two pieces of data for the same hypothesis.

        • Michael Lew

          “Artificial”? “Not very relevant”? Rubbish!

          Any simple hypotheses regarding the parameter values of, for instance, z-tests (your favourites) or t-tests (my favourites) are mutually exclusive. The vast majority of statistical tests fit within the restrictions. Scarcely artificial or irrelevant, and if you think that they are then we are not communicating effectively.

          What is your opinion of the possibility that the likelihood principle is a sensible normative principle?

          • Michael Lew: Mutually exclusive but not collectively exhaustive. The idea of point against point is scarcely ever exhaustive; I don’t think it’s a matter of communication.
            “What is your opinion of the possibility that the likelihood principle is a sensible normative principle?”

            If you mean this modally, I spoze there’s a possible world in which the LP (which I’m taking to mean the strong LP) is a sensible normative principle. Not this world, though. Not the world where the goal is to learn with the threat of error.

            I will put up a post on the law of likelihood shortly.

            Just to reiterate, the issue here was invalidating an alleged proof of the SLP; I and many others rejected it way before even considering the long-lived arguments by Birnbaum.

    • Anonymous: It seems possible to hold either one without the other.

      • Michael Lew

        When is it necessary to compare exhaustive hypotheses in a manner that cannot be accomplished with a likelihood function? You can compare each point to every other point. That would be exhaustive, surely.

        • Dear Michael,

          If only the null hypothesis H is defined, as Fisher did, then the negation of H is not defined. The negation of H may contain many more statements than probability theory can hold (or at least many more statements than the postulated model can treat).

          You are approaching the problem by using a restricted family of probability measures, and your hypotheses (H and not-H) will be collectively exhaustive GIVEN that your restricted universe of possible measures occurs. That is basically the core of the endless discussion between Fisher and Neyman.

  14. david mogilner

    A morally upright world is in debt to you Mayo.

  15. Pingback: 60 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP) | Error Statistics Philosophy

  16. Pingback: Midnight With Birnbaum (Happy New Year 2018) | Error Statistics Philosophy

  17. Pingback: Midnight With Birnbaum (Happy New Year 2019)! | Error Statistics Philosophy

  18. Pingback: Cox’s (1958) Chestnut: You should not get credit (or blame) for something you didn’t do | Error Statistics Philosophy
