In my latest formulation of the controversial Birnbaum argument for the strong likelihood principle (SLP), I introduce a new symbol to represent a function from a given experiment-outcome pair, (E,**z**), to a generic inference implication. This should clarify my argument (see my new paper).

(E,**z**) → Infr_{E}(**z**) is to be read “the inference implication from outcome **z** in experiment E” (according to whatever inference type/school is being discussed).

*A draft of my slides for the Joint Statistical Meetings (JSM) in Montreal next week is right after the abstract. Comments are very welcome.*

Interested readers may search this blog for quite a lot of discussion of the SLP (e.g., here and here) including links to the central papers, “U-Phils” by others (e.g., here, here, and here), and amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum).

**On the Birnbaum Argument for the Strong Likelihood Principle**

**Abstract**

An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term *sampling theory*). This feature results in violations of a principle known as the *strong likelihood principle* (SLP), the focus of this paper. In particular, if outcomes **x*** and **y*** from experiments E_{1} and E_{2} (both with unknown parameter θ) have different probability models f_{1}, f_{2}, then even though f_{1}(**x***; θ) = cf_{2}(**y***; θ) for all θ, outcomes **x*** and **y*** may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox (1958) proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which E_{i} produced the measurement, the assessment should be in terms of the properties of the particular E_{i}.

The surprising upshot of Allan Birnbaum’s (1962) argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports to show that [(WCP and SP) entails SLP], we show how data may violate the SLP while both the WCP and SP hold. Such cases directly refute [WCP entails SLP].
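To make the abstract's notation concrete, here is a minimal Python sketch using a textbook SLP pair (an illustration assumed here, not an example taken from the paper): E_{1} fixes n = 12 Bernoulli trials (binomial), E_{2} samples until the 3rd failure (negative binomial), and both yield 9 successes and 3 failures. The two likelihoods are proportional with c = 4, yet if Infr_{E}(**z**) is taken to be the one-sided p-value for θ = 1/2 against θ > 1/2, the implications differ. The function name `infr` and the choice of a p-value as the inference implication are hypothetical.

```python
from fractions import Fraction
from math import comb

# Hypothetical SLP pair: 9 successes and 3 failures in both experiments.
SUCCESSES, FAILURES = 9, 3
THETA0 = Fraction(1, 2)  # null value of the success probability theta


def f1(theta, s=SUCCESSES, r=FAILURES):
    """E1 (binomial): n = s + r trials fixed in advance."""
    return comb(s + r, s) * theta**s * (1 - theta)**r


def f2(theta, s=SUCCESSES, r=FAILURES):
    """E2 (negative binomial): sample until the r-th failure; the last trial is a failure."""
    return comb(s + r - 1, s) * theta**s * (1 - theta)**r


def p_value_e1(s, r, theta0=THETA0):
    """P(at least s successes in s + r trials) under theta0."""
    n = s + r
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k) for k in range(s, n + 1))


def p_value_e2(s, r, theta0=THETA0):
    """P(at least s successes before the r-th failure) under theta0.

    Equivalent to: at most r - 1 failures among the first s + r - 1 trials.
    """
    m = s + r - 1
    return sum(comb(m, j) * (1 - theta0)**j * theta0**(m - j) for j in range(r))


def infr(experiment, z):
    """Infr_E(z): here, the one-sided p-value of outcome z = (s, r) within experiment E."""
    s, r = z
    return {"E1": p_value_e1, "E2": p_value_e2}[experiment](s, r)


# Proportional likelihoods: f1(x*; theta) = c * f2(y*; theta) with c = 220 / 55 = 4.
for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    assert f1(theta) / f2(theta) == 4

print(infr("E1", (SUCCESSES, FAILURES)))  # 299/4096, about 0.073
print(infr("E2", (SUCCESSES, FAILURES)))  # 67/2048, about 0.033
```

Exact `Fraction` arithmetic is used only so that the proportionality check holds exactly rather than up to floating-point error.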

Comments, questions, errors are welcome.

Full paper can be found here: http://arxiv-web3.library.cornell.edu/abs/1302.7021

This is so interesting. Thank you.

Fran: You’re very welcome!

Incidentally, this reminds me of a story I read about Paul Erdös and the Monty Hall problem. This is the problem for reference: Paul Erdös got it wrong at first and allegedly said that switching boxes should not make any difference. I have the feeling that the reason Paul Erdös was unable to understand the Monty Hall problem the first time he was told is similar to the reason people are unable to see why the SLP is wrong. That is, they focus on the given data and not on how that data came to be given: just as most people see only two doors in the Monty Hall problem and think p = 1/2, those supporting the SLP see only x and not how that x came to be.

Fran, I agree 100%.
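Fran's point, that the same observed data can call for different conclusions depending on how the data came to be given, can be checked exactly in the Monty Hall setting. The minimal Python sketch below assumes the contestant picks door 0 and that door 2 is then opened to reveal a goat; it compares the standard rule (the host knowingly opens an unpicked goat door) with a hypothetical host who opens an unpicked door at random. This only illustrates the analogy: unlike an SLP pair, the two protocols here also give different likelihoods.

```python
from fractions import Fraction

DOORS = (0, 1, 2)
PICK = 0     # assumed: contestant's first choice
OPENED = 2   # assumed: door the host opens, which reveals a goat


def host_standard(car):
    """Standard rule: host opens an unpicked door hiding a goat, at random if two qualify."""
    options = [d for d in DOORS if d not in (PICK, car)]
    return {d: Fraction(1, len(options)) for d in options}


def host_random(car):
    """Hypothetical rule: host opens either unpicked door at random, car or not."""
    options = [d for d in DOORS if d != PICK]
    return {d: Fraction(1, len(options)) for d in options}


def p_switch_wins(host):
    """P(car is behind the remaining closed door | OPENED was opened and shows a goat)."""
    other = next(d for d in DOORS if d not in (PICK, OPENED))
    joint = {}
    for car in DOORS:
        if car == OPENED:
            continue  # ruled out: the opened door shows a goat
        prior = Fraction(1, 3)
        joint[car] = prior * host(car).get(OPENED, Fraction(0))
    return joint[other] / sum(joint.values())


print(p_switch_wins(host_standard))  # 2/3
print(p_switch_wins(host_random))    # 1/2
```

In both cases the contestant sees exactly the same thing (door 2 open, a goat), yet the probability that switching wins depends on the rule that produced that observation.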

Deborah: I agree with your points. I see that you are speaking at the JSM. I will try to attend although, as president of the Statistical Society of Canada this year, I seem to have a pretty full calendar at the meeting. In any case, please let me know the time of your talk.

I think I sent you this before, but if not, please see arXiv:1302.5468. For me that is the ultimate way to get rid of Birnbaum’s Theorem because it formalizes things very clearly mathematically. I wish I had seen this back in 1986, because one of the things that paper contributed was to formalize the principles as relations. So it was only a small step away. I don’t see this as conflicting in any way with what you have written, and I think we are making different counterarguments.

Prof. Evans: I had a peek at that draft recently; may I just say that your formalization of statistical principles as equivalence relations on inference bases strikes me as clean and elegant — true mathematical beauty. I wonder what class of statistical principles is selected by a requirement that inferences be invariant to the order in which subsets of a data set are processed…

Michael:

I’m glad we got to talk about this when I gave the Madrid paper in 2011 based on my 2010 paper. I wanted to find a way to bring out the logical problem that has bedeviled this argument for 50 years by articulating the WCP (as asserting an equivalence as well as an inequivalence) while demonstrating specific counterexamples to the SLP. I think my new approach accomplishes this. (We should connect our logical/set-theoretic treatments; I regret not having the time when we first spoke about pursuing that.) True, Evans, Fraser, and Monette was “a small step away”, and as many times as I’ve reread that great paper, I never understood why it didn’t go all the way.

Mayo: You know that I disagree with you in one respect, although I agree in most others. I think the key sentence here that highlights the reasons for disagreement is “Infr_{E_B}[y^*] can’t change meaning within the argument!”

A possible mathematical proof of the SLP would define “Infr_{E_B}[y^*]” as some very general mathematical object without caring one bit about its meaning. With that, the proof goes through (although for this one needs slightly different, more “reductionist” notation than yours). There is no particular “meaning” that would have to change.

Only if one actually tries to give a meaning to what is proved does your argument do its work, and here I agree with you: for the reason you give, the SLP doesn’t have a *meaning* relevant to frequentists, and probably to statisticians as a whole, when it comes to questions like “what procedures do we use?”

You may well be right, though, in claiming that your argument disproves what Birnbaum attempted to show, because Birnbaum was probably interested in the mathematical statement only for its meaning.

Christian: Perhaps the word “meaning” seems to make this point deeper than the obvious logical one: any plugging in for terms in a given formula must be consistent. Say I write x = x (a theorem in logic). If I plug in “Popper” for the first instance of x, and “Mayo” for the second instance of x, then I’ve obtained a false statement. How can a false statement come from plugging into a theorem? Answer: By inconsistent plugging.

Mayo: Perhaps we really should stop discussing this because my objection really seems to be minor and formalist compared to the issues that people like C. Robert have with it.

Still I insist that it’s not the case that I get the basic logic wrong. Obviously, if you have the theorem mathematically proven with a general mathematical object Infr_{E_B}[y^*] (regardless of whether this is of statistical interest or not), it cannot be “Mayo” on one side of the equation and “Popper” on the other. This actually means that the objects you are talking about, which are those of statistical interest, and which require a change of meaning along the way for the theorem to hold, as you correctly point out, are *not covered* by the purely mathematical version of the theorem (because the WCP and the SP cannot be used for them *in the way required for making the theorem hold* for *all* involved experiments including the mixture). The mathematical theorem still is not empty, because objects Infr_{E_B}[y^*] can be found for which all assumptions and the conclusions hold (namely, boringly, a constant function, but also all objects fulfilling the SLP).

The problem with the theorem is therefore neither that it’d be wrong, nor that it’d be empty, but that its setup doesn’t reflect statistical meaning properly (if it did, you’d have Mayo=Popper ;-) ), and therefore it doesn’t show what Birnbaum intended to show and what many believe he indeed showed.
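The claim that the purely mathematical theorem is not empty can be illustrated on a textbook SLP pair: E1 is binomial with n = 12 fixed, E2 is negative binomial (sample until the 3rd failure), and both yield 9 successes and 3 failures. The minimal Python sketch below, under those assumptions, checks three hypothetical candidates for Infr: a constant, a maximum-likelihood estimate found by grid search over each experiment's likelihood, and a one-sided p-value at θ = 1/2. Only the first two give the same implication in E1 and E2, i.e., only they satisfy the SLP on this pair.

```python
from fractions import Fraction
from math import comb

S, R = 9, 3  # assumed SLP pair: 9 successes, 3 failures in E1 and in E2
GRID = [Fraction(k, 100) for k in range(1, 100)]  # candidate theta values
HALF = Fraction(1, 2)

LIKELIHOOD = {
    "E1": lambda t: comb(S + R, S) * t**S * (1 - t)**R,      # binomial, n fixed
    "E2": lambda t: comb(S + R - 1, S) * t**S * (1 - t)**R,  # stop at R-th failure
}


def infr_constant(experiment):
    """Boring candidate: the same verdict whatever the experiment."""
    return "constant verdict"


def infr_mle(experiment):
    """Likelihood-based candidate: argmax of the experiment's likelihood on GRID."""
    return max(GRID, key=LIKELIHOOD[experiment])


def infr_p_value(experiment):
    """Sampling-theory candidate: one-sided p-value for theta = 1/2 vs theta > 1/2."""
    if experiment == "E1":
        return sum(comb(S + R, k) for k in range(S, S + R + 1)) * HALF**(S + R)
    return sum(comb(S + R - 1, j) for j in range(R)) * HALF**(S + R - 1)


for name, infr in [("constant", infr_constant), ("mle", infr_mle), ("p-value", infr_p_value)]:
    print(name, infr("E1") == infr("E2"))
# constant True, mle True (both 3/4), p-value False
```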

Christian: There are always several routes for criticizing an unsound argument. (Here the inference is to the SLP.) One may show that the argument, if rendered so as to have true premises, is deductively invalid. One may show that the argument, if rendered valid, cannot have all true premises, save when there can be no SLP violation. That shows that, if sound, the argument is circular. A related route is to provide a counterexample: the SLP is a universal generalization, so any case where SP and WCP hold while the SLP does not is a counterexample. I have demonstrated the criticism in all these ways, and a few others. I have spent a very, very long time trying to find the least equivocal ways to accomplish this task. In my latest paper, I think the analysis is the clearest. (If it were standard to use quantifiers explicitly in math, the error would have been easier to spot or avoid.) It’s actually a lovely little puzzle for a logically inclined philosopher (like me).

“There are always several routes for criticizing an unsound argument.” That probably hits the nail on the head. From my point of view, the difference between your view and the view of those who ignore your objections, insisting that Birnbaum’s theorem is mathematically valid, is that the “unsound” part of the argument can be located in several places. My point is that it can be located not in the mathematical part of the argument but rather in what I’d call the “mathematical modelling” part (formalising what is of interest in such a way that mathematics can deal with it). If this is done, the pure mathematical theorem stands, but it is much narrower than what Birnbaum intended, than what would be required for the usual anti-frequentist interpretation, and than what you use in order to construct your counterargument.

Your formalism makes the problem formally visible, but Birnbaum’s formalism was rougher, and a number of people (including P Dawid, for example) would stick to the mathematical proof using the rougher formalisation (even so, Birnbaum’s proof had to be polished a bit by later authors). On that view the problem is not in the theorem but rather in the interpretation of its premises, so that one could say “the problem is not that SP and WCP don’t imply the SLP, but rather that it isn’t reasonable to demand that the SP and the WCP both hold *in the way required for the mathematical proof*”, not because they contradict each other but because one of them is an unreasonable demand in the given generality. And because the problem can be shifted around, this could be either one of the two, as has been argued in some of the earlier critical work (Kalbfleisch, Evans et al…).

This is probably a key thing to understand. There is a problem in Birnbaum’s argument, but one can formalise things in various ways, and depending on how this is done, the problem could show up in different places, allowing people who don’t want to see it to claim that it is not visible at any specific point where you try to nail it down. Looking at the overall picture, however, one should find it *somewhere*.

Christian H: No, you don’t get my point. Each of the ways I listed of criticizing an unsound argument is always available (we give our logic students the choice), and they are logically equivalent. One man’s modus ponens is another woman’s modus tollens, and some like reductio ad absurdum. There’s no way to argue with a counterexample.

You wrote in an earlier comment, “The mathematical theorem still is not empty, because objects can be found for which all assumptions and the conclusions hold (namely, boringly, a constant function, but also all objects fulfilling the SLP).” Yes, that’s the variation on the disproof that shows it to be circular. Circular arguments are valid: A implies A is a theorem, but it doesn’t let you detach A. And if allowed to reason as if it did, I can also prove not-A (by assuming not-A).

As circular as all mathematical equivalence statements are. Still they are not necessarily obvious. What Birnbaum’s theorem (rough mathematical version) assumed was not obviously already the SLP; it required a theorem to show that it was.

Christian H: No, you’re confusing two things. It’s one thing to show A equiv B, and quite another to detach B. Birnbaum alleges (SP + WCP) entails SLP, and that’s what the counterexamples show to be incorrect. That is, he claims we detach SLP once SP and WCP are accepted. Recall he claims to show this for non-Bayesian, frequentist accounts. See the Birnbaum citation in the footnote in my post from yesterday.

I can only come back to my earlier post; one can shift the problem around. One can say that he indeed shows that (SP+WCP) entail SLP (and that they are actually equivalent) in what I’d call a specific and (too) rough mathematical formalisation of SP and WCP. And then your argument does not imply that this reasoning is wrong, but rather that the rough formalisation of SP and WCP required for the theorem doesn’t properly formalise what their content should be when they are taken as sensible principles (and as implying what he claims they imply).

Of course I see that in your formalisation WCP and SP do not entail SLP, and I agree that this is a sensible formalisation. But the original one was rougher and therefore it is legitimate to say that the problem is not in the mathematical statement and proof but in the formalisation of the principles (and therefore in the *interpretation* of the theorem).

By the way, one reason why I bring this up again is that I think it is a pity that some people don’t engage with the valid point of your argument because they think: “The mathematical proof is fine, so why bother?” And I’d grant them that the proof is fine for proving *something* that is mathematically correct, but it doesn’t have the implications they believe it has.

Christian H: I think it’s a pity as well. So I’m very grateful for this, thanks.

Christian: It can be any parametric inference implication in any school. That is how Birnbaum understood it, and obviously what needs to be the subject for an argument purporting to hold for arbitrary parametric inferences. It couldn’t be, say, a recipe for baking a cherry pie. He gives examples such as a p-value, an estimate, a posterior probability. In short, the subject matter concerns parametric inference from an outcome z from experiment E within a given statistical model. That is what Birnbaum says.

Now in addition we only need non-contradiction. If the same term is used in different ways in an argument with identity, it’s easy to get anything equals anything, and thus contradictions, 1 = 1, and 1 = 4, say. Further, anything follows from a contradiction X and not-X, since it is a tautology that:

If X then (not-X then C).

Two applications of modus ponens yield C. We can just as easily derive not-C.

I don’t think the famous “breakthrough” was intending to give us an argument from a contradiction.
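The tautology “If X then (not-X then C)”, and the fact that both C and not-C follow from X together with not-X, can be checked mechanically by truth table. A minimal Python sketch, assuming classical two-valued logic with the material conditional; the helper names `implies` and `valid` are hypothetical.

```python
from itertools import product

ASSIGNMENTS = list(product([True, False], repeat=2))  # all values of (X, C)


def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


def valid(premises, conclusion):
    """Valid iff no assignment makes every premise true and the conclusion false."""
    return all(
        implies(all(p(x, c) for p in premises), conclusion(x, c))
        for x, c in ASSIGNMENTS
    )


# "If X then (not-X then C)" is true under every assignment, i.e. a tautology.
print(all(implies(x, implies(not x, c)) for x, c in ASSIGNMENTS))  # True

contradiction = [lambda x, c: x, lambda x, c: not x]
print(valid(contradiction, lambda x, c: c))      # True: C follows
print(valid(contradiction, lambda x, c: not c))  # True: so does not-C
print(valid([lambda x, c: x], lambda x, c: c))   # False: X alone does not give C
```

The last line is the useful contrast: with consistent premises the arbitrary conclusion C no longer follows.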

I’ve made some corrections and was forced to condense the slides to make the presentation shorter. Christian Robert made some remarks on his blog. http://xianblog.wordpress.com/2013/07/31/deborah-mayos-talk-in-montreal-jsm-2013/#respond

Here’s a response, but I’d be grateful for additional ways to clarify the issue to him; we’ve discussed this before on this blog.

Christian: Something you say suggests a possible way to get to the bottom of things. You say “it is possible to choose the experiment index at random, 1 versus 2, and then, if y* is observed, to report (E1,x*) as a sufficient statistic”. I grant that the “performance” of Birnbaumization is doable in my paper. The question remains why, in interpreting the inference implication from y*, y* ought to be reported as x* and the unconditional inference computed. Imagine y* was the statistically significant outcome from the optional stopping experiment. To report it as x* is to report it came from the fixed sample size SLP pair. What justifies doing this? That is the question for you.

Sampling theorists deny that the information about the stopping rule is irrelevant for the proper inference from y*. So why should she report y* as x*? You say x* is sufficient, but that is to assume the stopping rule is irrelevant, that likelihoods alone matter (thereby begging the question). Sufficiency (weak likelihood) refers to a single experiment; the SLP refers to two (with different sample spaces). Now T-B is a sufficient statistic within experiment E-B (Birnbaum’s “mathematical mixture”), which is why the inference implication from E-B differs from the inference implications from E1 as well as from E2. That blocks the SLP.

You’re right that the WCP tells us to use the known experiment (E2) in reaching the parametric inference from y*. That will differ from the inference within E-B (except of course where the unconditional happens to equal the conditional and “applying” WCP doesn’t change anything).
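The contrast between the inference within E-B and the inference the WCP licenses can be put in numbers. In place of the optional-stopping example, the minimal Python sketch below swaps in a binomial (E1, x*) versus negative binomial (E2, y*) pair with 9 successes and 3 failures, because its tail probabilities are easy to compute exactly. It further assumes a fair coin chooses the experiment, and that T_B relabels (E2, y*) as (E1, x*) while leaving each component's ordering of outcomes otherwise intact, so that the p-value in E-B is the coin-weighted mixture of the two component tail probabilities. The names `p_value_in_eb` and `p_value_wcp` are hypothetical.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)
S, R = 9, 3  # assumed SLP pair: 9 successes, 3 failures
N = S + R

# One-sided p-values for theta = 1/2 vs theta > 1/2 within each component experiment.
P1 = sum(comb(N, k) for k in range(S, N + 1)) * HALF**N        # E1: n = 12 fixed
P2 = sum(comb(N - 1, j) for j in range(R)) * HALF**(N - 1)     # E2: stop at 3rd failure


def p_value_in_eb(prob_e1=HALF):
    """Unconditional p-value of the reported (E1, x*) within the mixture E-B.

    Assumes the coin picks E1 with probability prob_e1 and that the tail event
    in E-B is the union of the two component tail events, so its probability
    is the mixture of the two component p-values.
    """
    return prob_e1 * P1 + (1 - prob_e1) * P2


def p_value_wcp(performed):
    """WCP: once it is known which experiment was performed, use that component."""
    return {"E1": P1, "E2": P2}[performed]


print(P1, P2)             # 299/4096 67/2048
print(p_value_in_eb())    # 433/8192, about 0.053
print(p_value_wcp("E2"))  # 67/2048: the WCP answer when y* came from E2
```

Under these assumptions the E-B value (about 0.053) matches neither the E1 value (about 0.073) nor the E2 value (about 0.033), which is the gap the argument turns on.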