Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle


On Friday, May 2, 2014, I will attempt to present my critical analysis of the Birnbaum argument for the (strong) Likelihood Principle in a way that is accessible to a general philosophy audience (flyer below). Can it be done? I don’t know yet; this is a first. It will consist of:

• Example 1: Trying and Trying Again: Optional stopping
• Example 2: Two instruments with different precisions
[you shouldn’t get credit (or blame) for something you didn’t do]
• The Breakthrough: Birnbaumization
• Imaginary dialogue with Allan Birnbaum
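
For readers who want to see the "trying and trying again" effect of Example 1 in numbers, here is a minimal sketch. It is not taken from the paper. It assumes normal data with known standard deviation 1, a true mean of 0, and a two-sided z-test at the nominal 0.05 level applied after every observation, and the function names are hypothetical. It compares how often a "significant" result turns up when one may stop at the first such result with how often it turns up when the sample size is fixed in advance.

```python
import math
import random

def rejects_with_optional_stopping(rng, max_n, z_crit=1.96):
    """True if |z| exceeds z_crit at any look while sampling up to max_n points."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)          # the null is true: mean 0, sd 1
        if abs(total / math.sqrt(n)) > z_crit:
            return True                        # stop at the first "significant" look
    return False

def rejects_at_fixed_n(rng, n, z_crit=1.96):
    """True if a single z-test on exactly n points rejects the (true) null."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total / math.sqrt(n)) > z_crit

def simulate(trials=2000, max_n=100, seed=1):
    """Return (rejection rate with optional stopping, rejection rate at fixed n)."""
    rng = random.Random(seed)
    optional = sum(rejects_with_optional_stopping(rng, max_n) for _ in range(trials)) / trials
    fixed = sum(rejects_at_fixed_n(rng, max_n) for _ in range(trials)) / trials
    return optional, fixed

if __name__ == "__main__":
    optional, fixed = simulate()
    print(f"optional stopping: {optional:.3f}   fixed n: {fixed:.3f}")
```

Under these assumptions the fixed-n rate should stay near the nominal level, while the optional-stopping rate should be several times larger. The likelihood function is the same either way, which is why the stopping rule matters to an error statistician but not under the SLP.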

The full paper is here. My discussion draws on several pieces that a reader can explore further by searching this blog (e.g., under SLP, brakes (e.g., here), Birnbaum, optional stopping). I will post slides afterwards.

23 thoughts on “Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle”

1. Sander Greenland

I’d be interested in seeing more on relations between your results and other critiques of Birnbaum’s (non)theorem, especially since some expressed doubts about his nontheorem early on (as seen in the comments accompanying the original paper), and attempted formalizations have only reinforced such doubts, e.g., Evans at
http://arxiv.org/abs/1302.5468

In parallel, stepping away from theoretical treatises like Berger & Wolpert, Casella & Berger, or Royall, few applied Bayesian statistics books I see mention the nontheorem (although Birnbaum 1962 is sometimes cited for the likelihood principle), the offered motivations for Bayes instead being coherency and good field performance of the methods, e.g., see Box & Tiao, Carlin & Louis, Gelman et al., or Spiegelhalter et al. Leonard & Hsu’s book mentions the nontheorem, but skeptically in passing; elsewhere, Leonard refers to it as refuted:
http://www.statslife.org.uk/opinion/1358-a-personal-history-of-bayesian-statistics
Thus, fortunately, it seems the nontheorem has played an indiscernible role (if any) in applications of Bayesian statistics.

• Sander: Greetings. For more on relations between other criticisms please see the paper. I didn’t think the LP was necessarily an argument for Bayesianism. Savage thought people wouldn’t stop at the halfway house once they’d seen Birnbaum, and he kept trying to convince Birnbaum to no avail. My original critique, which led Evans to decide there was a logical problem, was published in 2010, but my current paper goes considerably beyond that. My rejoinder will respond to 6 discussants.

Many texts cite Birnbaum 1962 as a theorem; Casella and R. Berger now leave it as an exercise for the reader, so I hope they will get full credit. Leonard became convinced by my blog, which has had extensive discussions of it. Gee, you have been away awhile…

• Sander:

Thanks for the Leonard link. It has some interesting leads on how Ramsay worked out that C. S. Peirce was not necessarily anti-Bayesian, but rather against information drawn from no experience as opposed to experience (slide 20), which to me seems more consistent with Peirce’s views. (Priors informed by past experience, if checkable, should be acceptable.)

But I am also interested in what impact Mayo’s and Evans’ criticisms of the SLP will have in theory, if not in practice.

Evans: “Technically it would seem, however, that the proof itself is sound … [with adequate clarification of theorem it] renders the result almost vacuous”

Mayo: “purports that [(WCP and SP), entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].”

Great that there will be six commenters to help sort this out!

• Sander Greenland

Indeed, it does look like a contradiction, one hopefully resolved by the forthcoming journal discussion. In the meantime I would hazard that it is traceable to an inconsistency between Evans’ formalization and Mayo’s structure for the problem (which on first read struck me as closer to Birnbaum’s). In writing my initial post, I was hoping Mayo would clarify the source of such apparent divergences, as I did not see where she discussed it in her Stat Sci paper – of Evans (2013) it said only “For an alternative criticism see Evans (2013).”

• Sander: Well, (a) I’ve given the informal presentation to a mixed audience (around 1/3 from our seminar on philstat) and it seemed to go very well! There were a few statisticians who more or less said that they violate the LP and it’s not a big deal. (b) I will be interested to hear what people think of the discussion when it comes out. I think Evans now concurs, after my Mayo 2010 treatment, that the “theorem” is not formally correct, because the CP (WCP) is not an equivalence relation. Well there’s more I can’t get into now.

2. I think we are up to stage 3:
https://errorstatistics.com/2011/12/22/the-3-stages-of-the-acceptance-of-novel-truths/

Default Bayesians should, in a way, be glad, since they’ve been violating the LP.

• Sander Greenland

With apologies, I’m not clear how the “3 stages” applies because I’m not sure what the novel truth at issue is. True, Birnbaum’s nontheorem was offered as a novel truth by some, but never accepted by all (and not even known to most practitioners), plus it now seems clear that it wasn’t one after all (at least not in any useful sense). So it looks like the reverse situation to me (the initial skepticism was justified, not mere resistance to novelty).

Regardless, I am always glad to see violations of formal principles when the research context shows them to be unwise or inapplicable for that context. The strong repeated sampling (SRS, or calibration) principle looks as dubious as the SLP and Bayesian coherence when the observational setting does not support (and often clearly violates) any sampling model available for calibration or data processing. Some of Birnbaum’s discussants seemed to have this view in mind, as does Leonard.

• I would be interested in Sander’s view on my argument as to why frequentist meta-analysts do not need to take account of optional stopping (in individual trials) and so will take this opportunity to link to my previous blog on this site on this subject.

It suggests that if you weight trials equally there is a problem, but if you weight them by actual information there is not.
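
The contrast between equal weighting and information weighting can be sketched with a small simulation. This is an illustration under stated assumptions, not the argument from the linked post. Each trial has normal data with known standard deviation 1 (so information is proportional to sample size), one interim look after `n1` observations, and early stopping when the interim z-statistic exceeds `z_crit`. The true effect is set to 0, and all names and parameter values are hypothetical.

```python
import math
import random

def run_trial(rng, mu, n1, n2, z_crit):
    """One trial with a single interim look; returns (sum of observations, sample size)."""
    total = sum(rng.gauss(mu, 1.0) for _ in range(n1))
    if total / math.sqrt(n1) > z_crit:        # "significant" at the interim look: stop early
        return total, n1
    total += sum(rng.gauss(mu, 1.0) for _ in range(n2))
    return total, n1 + n2

def pooled_estimates(trials=20000, mu=0.0, n1=5, n2=45, z_crit=1.96, seed=1):
    """Return (equal-weight pooled estimate, information-weighted pooled estimate)."""
    rng = random.Random(seed)
    results = [run_trial(rng, mu, n1, n2, z_crit) for _ in range(trials)]
    # Equal weights: average the per-trial means, regardless of trial size.
    equal = sum(s / n for s, n in results) / trials
    # Information weights: weight each trial mean by its n, i.e. total sum over total n.
    weighted = sum(s for s, _ in results) / sum(n for _, n in results)
    return equal, weighted

if __name__ == "__main__":
    equal, weighted = pooled_estimates()
    print(f"equal weights: {equal:+.4f}   information weights: {weighted:+.4f}")
```

With a true effect of 0, the equal-weight average should drift above 0, because trials that stopped early are small but count as much as the large ones. The information-weighted estimate should stay close to 0, since it is the total of the observations divided by the total sample size.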

• Sander: Your comment is a perfect exemplification of what I meant by the 3 stages! Thank you for instantiating my thesis.

3. I’ve defended Birnbaum’s theorem here (Section 4). However, I agree with Deborah that Birnbaum’s theorem does not work if you have something like Cox and Hinkley’s Strong Repeated Sampling Principle in mind and you’re thinking of the conditionality principle as telling you that “repeated sampling” should mean “repeated sampling from the same component experiment.” It only works if the principle is taken to be what Birnbaum said it was, namely a sufficient condition for experimental outcomes to be evidentially equivalent. As such, I think it is rather compelling. However, the claim that our methods ought to track evidential equivalence in the sense given by the Likelihood Principle is not a trivial one even if one accepts that the Likelihood Principle is true.

• Greg: I don’t get your comment at all. The mathematical/logical problem with Birnbaum’s “proof” has nothing whatever to do with Cox’s repeated sampling principle or anyone else’s. What can you mean?
“It only works if the principle is taken to be what Birnbaum said it was, namely a sufficient condition for experimental outcomes to be evidentially equivalent.”

What? I don’t have a clue what you mean by “it works.” It holds when it holds, is about all one can say…

• Thanks for the request for clarification. As I understand your critique, it relies on the fact that a sampling theorist needs to understand the conditionality principle as a claim about which sampling distribution to use for inference, hence the connection to repeated sampling principles. But someone not already committed to sampling theory need not understand it that way, and Birnbaum doesn’t formulate it that way. When it, the sufficiency principle, and the likelihood principle are formulated as sufficient conditions for experimental outcomes to be evidentially equivalent, the proof is valid. Moreover, its premises seem to me compelling. But they don’t have *immediate* implications for statistical practice; further arguments are needed, and I think some caveats and restrictions will be in order.

I lay out this argument in some detail in the paper (preprint here).

• Greg: No, my argument doesn’t rely on anything regarding a sampling theorist. Not in the least. Now it’s true that Birnbaum was largely interested in the relevance of his argument for a sampling theorist. That’s what he said and intended, and what troubled him. (He knew obviously that the LP already followed for Bayesians and likelihoodists.) If you read my forthcoming paper, you’ll see my critique is rather different from my earlier discussion, and my rejoinder goes further still. There are some new twists in Birnbaum that I only discovered in the past year. So, I’ll just put this matter aside until it appears. Thanks for your interest.

• Thanks for continuing the discussion. I agree that Birnbaum’s proof is not the straightforward refutation of the use of frequentist methods that it is often taken to be. However, my reasons for this claim are different.

In the forthcoming paper, you characterize WCP as a directive to condition on the component experiment that produced the result (p. 11). That might be what Cox had in mind, but it’s not the WCP as Birnbaum stated it. Birnbaum stated the WCP as a sufficient condition for evidential equivalence. It’s not a directive at all.

I agree that Cox’s WCP doesn’t allow the derivation of the SLP. That’s an important point because it gives frequentists a way out of the dilemma between not conditioning and accepting the SLP.

However, Birnbaum’s WCP does allow the derivation of the SLP. You might reject Birnbaum’s WCP while accepting Cox’s, but of course to do so is to reject the soundness of Birnbaum’s proof rather than its validity.

Of course, if Birnbaum’s WCP isn’t a directive, then his SLP must not be a directive either. It’s a sufficient condition for evidential equivalence as well. For that reason, I agree with you that the implications of the SLP are not as straightforward as they are often taken to be.

• Why don’t you state the WCP as you see it, Greg? I use Birnbaum’s defn word for word from 5 papers. Then we can see who is making up a definition and who is dealing with Birnbaum’s defn. It is an equivalence. Of course in saying A = B, one might see it as implying a directive (don’t equate A with C if C isn’t = B). Perhaps you haven’t read my new paper.

• You formulate the WCP as a directive on p. 10 of the version linked above: “condition on the E_i producing the results… Do not use the unconditional formulation.” That’s not Birnbaum’s WCP.

• john byrd

What is Birnbaum’s WCP? How does it differ from Cox’s, and why should anyone place any credence in the Birnbaum definition when Cox’s is what most of us know? Thanks,

• John: It is the identical definition:
Birnbaum (1962, 1969):
“The evidential meaning of any outcome of any mixture experiment is the same as that of the corresponding outcome of the corresponding component experiment, ignoring the over-all structure of the mixture experiment.” If someone doesn’t want to see “ignoring” as a directive for what to ignore, they need not. Greg hasn’t given his definition yet.

• I follow Birnbaum, as I understand him, in formulating the WCP as a sufficient condition for evidential equivalence: an outcome x of a component E* of a mixture experiment E is evidentially equivalent to outcome x of experiment E*. I take the phrase “ignoring the over-all structure of the mixture experiment” in Birnbaum’s formulation just to indicate that he’s talking about (E*,x) rather than (E,(E*,x)), and not as a directive to the statistician to use the sampling distribution that ignores the over-all structure of the mixture experiment.
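
In symbols, this reading might be written as below. This is only a sketch: Ev(·) is Birnbaum’s notation for evidential meaning, the first display restates the sentence above, and the second states the SLP in the same form, with the density notation f_1, f_2 and the constant c added as assumptions for illustration.

```latex
% WCP as a sufficient condition for evidential equivalence:
% E is a mixture experiment, E^* the component actually performed, x its outcome.
\[
  \mathrm{Ev}\bigl(E,\,(E^{*},x)\bigr) \;=\; \mathrm{Ev}\bigl(E^{*},x\bigr)
\]

% SLP in the same form: x is an outcome of E_1, y an outcome of E_2,
% and both experiments share the parameter \theta.
\[
  f_{1}(x;\theta) = c\,f_{2}(y;\theta)
  \ \text{ for all } \theta \text{ and some } c>0
  \;\Longrightarrow\;
  \mathrm{Ev}(E_{1},x) = \mathrm{Ev}(E_{2},y)
\]
```

Neither display mentions a sampling distribution, which is the point at issue here.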

Cox (1958) does talk about the sampling distribution to be used for inference, saying that it “should be taken to consist, so far as is possible, of observations similar to the observed set S, in all respects which do not give a basis for discrimination between the possible values of the unknown parameter θ of interest” (361). That is, it should be conditional on a maximal ancillary statistic.

I agree with Deborah that Cox’s WCP does not entail the SLP, even in conjunction with the sufficiency principle. I don’t agree that Birnbaum’s WCP is identical. Birnbaum’s WCP says nothing about sampling distributions; it’s agnostic about whether sampling distributions are to be used or not, as it must be in order to provide an argument against the use of sampling distributions that might be convincing to a sampling theorist.

The two principles are related as follows: Cox’s WCP tells the sampling theorist what he or she needs to do in order to conform to Birnbaum’s WCP. In this sense, Birnbaum’s WCP is more fundamental than Cox’s.

Denying Birnbaum’s WCP means accepting that the evidential meaning of an experimental outcome depends on whether or not there was a chance that some other experiment could have been performed. That view strikes me as very strange. It also raises seemingly intractable difficulties, given that it’s virtually always the case (plausibly) that there was a chance that some other experiment could have been performed.

• Greg: My defn has nothing to do with a sampling theorist. I know that people try to wriggle out of my disproof this way, but they are wrong. Once again: my defn is Birnbaum’s. He didn’t change it. Nor did I. Neither the counterexamples nor the demonstration of unsoundness can be gotten around.

4. Well, I’ve explained why I disagree. Thanks for your paper, which I do see as making an important contribution to our understanding of the relationship between sampling theory and the SLP.

5. Christian Hennig

I guess we’ve been here before… 😉

• Christian: Yes, in this case as an upshot of my informal rendition on May 2, which seemed to work. Once the Stat Sci rejoinder is out, interested people might ponder it again, and then let it go. I’ve been asked whether texts will modify their claims about theoremhood. I know of one that will. We’ll see.
