Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a **video** we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

**(A)** “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization.

**(B)** A reader wrote in that he awaits approval of my argument by either Sir David Cox or Christian Robert; I cannot vouch for Robert, unless he has revised his first impression in his October 6, 2011 blog (as I hope he has). For in that blog post Robert says

“If Mayo’s frequentist stance leads her to take the sampling distribution into account at all times, this is fine within her framework. But I do not see how this argument contributes to invalidate Birnbaum’s proof.”

[See UPDATE BELOW]

I am taking sampling distributions into account because Birnbaum’s “proof” is supposed to be relevant for a sampling theorist! If it is not relevant for a sampling theorist (my error statistician) then there is no “breakthrough” and there is no special interest in the result (given that Bayesians already have the LP, as do the likelihoodists).[ii] It is only because principles that are already part of the sampling theorist’s steady diet are alleged to entail the LP (in Birnbaum’s argument) that Savage declared that, once made aware of Birnbaum’s result, he doubted people would stop at the LP appetizer, but would instead go all the way to consuming the full Bayesian omelet! (For the Savage reference, see my new **PAPER** or the “Breaking through the Breakthrough” posts of **Dec. 6 & 7, 2011**.)

Robert’s remark is just one example that reveals a deep misunderstanding of sampling theory. (Although I prefer “error statistics,” I will use “sampling theory” for this post.) Even if Robert has corrected himself, as I very much hope he has, other readers may be under the same illusion. I had paused to clarify this point in my October 20, 2011 post.

**(C)** Likelihood Principle Violations

My Oct. 20 post was devoted to arguing that it is impossible to understand the whole issue without understanding how it is that frequentist sampling theory violates the LP. That it does so is not a point of controversy, so far as I know:

As Lindley (1971) stresses:

“… sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space” (Lindley p. 436).

He means, once the data are known the sample space is irrelevant for appraisal. (The LP already assumes the statistical model underlying the likelihood is given or not in question.) Or, more recently, take Kadane 2011:

“Significance testing violates the Likelihood Principle, which states that, having observed the data, inference must rely only on what happened, and not on what might have happened but did not. The Bayesian methods explored in this book obey this principle” (Kadane, 439).

“Like their testing cousins, confidence intervals and sets violate the likelihood principle” (ibid. 441).

So it’s hard to see how Robert can really mean to say that sampling distribution considerations are irrelevant, when they are the heart and centerpiece of the Birnbaum argument. Far from being irrelevant, Birnbaum’s result is all about sampling distributions (even if addressed by someone who is not herself a sampling theorist!).

**(D)** Now to consider what Robert says in his OCT. 2011 post, with my remarks following:

**Robert**: “The core of Birnbaum’s proof is relatively simple: given two experiments *E’* and *E”* about the same parameter *θ* with different sampling distributions *f¹* and *f²*, such that there exists a pair of outcomes *(y’, y”)* from those experiments with proportional likelihoods, one considers the mixture experiment where *E’* and *E”* are each chosen with probability ½.

Then it is possible to build a sufficient statistic *T* that is equal to the data *(j,z)*, except when *j=2* and *z=y”*, in which case *T(j,z)=(1,y’)*.”

**Mayo**: Put more informally, if (y’, y”) is any LP violation pair (i.e., the two would yield different inferences/assessments of the evidence due to the difference in sampling distributions), then it is possible to “build” a statistic T for interpreting them such that y” (from E”) is always reported as y’ from E’.[iii] I called this Birnbaum’s statistic T-BB.[iv] It is possible, in short, to Birnbaumize the result (E’, y’) whenever there is an experiment E”, not performed, that could have resulted in y” with a proportional likelihood (with the same parameter under investigation and the model assumptions granted).
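To make the construction concrete, here is a minimal sketch in Python (my own illustration, not code from any of the papers discussed), using the textbook SLP pair: E’ fixes n = 12 Bernoulli trials and counts successes, while E” samples until r = 3 successes and counts trials. The outcomes y’ = 3 successes and y” = 12 trials have proportional likelihoods, and T-BB maps (2, y”) to (1, y’):

```python
from math import comb

def lik_binomial(theta, n=12, k=3):
    """Likelihood of k successes in n Bernoulli(theta) trials (E')."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def lik_negbinomial(theta, r=3, n=12):
    """Likelihood of needing n trials to reach r successes (E'')."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# Proportional likelihoods: the ratio C(12,3)/C(11,2) = 4 is constant in theta.
ratios = [lik_binomial(t) / lik_negbinomial(t) for t in (0.1, 0.25, 0.5, 0.9)]
print(ratios)

def T_BB(j, z):
    """Birnbaum's 'built' statistic: report (2, y'') as if it were (1, y')."""
    if j == 2 and z == 12:   # (E'', y'') with y'' = 12 trials
        return (1, 3)        # reported as (E', y') with y' = 3 successes
    return (j, z)

print(T_BB(1, 3), T_BB(2, 12))  # both report (1, 3)
```

The whole force of the construction is in that last mapping: any outcome with an LP violation partner gets folded into a single report.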

**Robert**: “This statistic [T-BB] is sufficient”.

**Mayo**: Yes, T-BB is sufficient for an experiment that will report its inference based on the rules of Birnbaumization: The sampling distribution of T-BB is to be the convex combination of the sampling distributions of E’ and E” whenever confronted with an outcome that has an LP violation pair (for more details see posts from Dec. 6, 7, and references within).[v] Cox rightly questions even this first step, but I’m prepared to play along since the “proof” breaks down anyway.[vi]

It should be emphasized that in carrying out this Birnbaumization, one is not free from considering the accompanying sampling distribution (corresponding to the statistic T-BB just “built”): the Birnbaumization move *depends* on having a single sampling distribution (otherwise sufficiency would not apply)[vii].
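To see the point numerically (my own sketch, using a hypothetical SLP pair: binomial with n = 12 fixed vs. negative binomial sampling until r = 3 successes), the probability that T-BB issues the report (1, y’) is the convex combination of the two component sampling distributions, and it equals neither one:

```python
from math import comb

theta = 0.5  # an assumed parameter value, purely for illustration

def f1(k, n=12):
    """Sampling distribution of E': P(k successes in n trials | theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def f2(n, r=3):
    """Sampling distribution of E'': P(n trials needed for r successes | theta)."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# Under Birnbaumization, T-BB reports (2, y'') as (1, y'), with y' = 3, y'' = 12.
# Its sampling distribution at that report is the convex combination,
# averaging over the coin flip that chooses E' or E''.
p_TBB = 0.5 * f1(3) + 0.5 * f2(12)

# p_TBB agrees with neither component: the "built" statistic carries
# its own, merged, sampling distribution.
print(f1(3), f2(12), p_TBB)
```

So reporting via T-BB is not a free move: it trades each experiment’s own sampling distribution for the merged one.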

While Robert switches our Infr_{E}(z) notation (Cox and Mayo 2010) to Birnbaum’s Ev(E, z), I will go ahead and leave it as Ev. Infr_{E} was deliberately designed to be clearer, easier to read, and less likely to hide the very equivocation that is overlooked in this example.

Robert observes:

Whether j = 1 or j = 2, Ev(E-BB, (j, z)) = Ev(E-BB, T(j,z))

This corresponds to my premise (1):

(1) Infr_{E-BB}(E’, y’) = Infr_{E-BB}(E”, y”)

In the relevant case, y’ and y” are LP violation pairs, since only those pose the threat to obeying the LP. So we can focus just on those in this note. In Mayo 2010 I used the * to indicate an outcome is part of an LP violation pair.

**(E)** Next Robert gives premise (2), though he switches the order: this corresponds to two applications of weak conditionality (WCP) [combining my 2a and 2b]:

(2) Whether j = 1 or j = 2, Ev(E-BB, (j, z)) = Ev(E^{j}, z)

The key issue concerns a quote from me (with Robert’s substitutions of Ev for Infr). Note, by the way, that Robert is alluding to my chapter in Mayo 2010, not the short version that I posted on this blog (Dec. 6 & 7).

**Robert**: “Now, Mayo argues this is wrong because [it asserts that] ‘[the mixed experiment E-BB] is appropriately identified with an inference from outcome y^{j} based on the sampling distribution of E^{j}, which is clearly false’” (p. 310).

*(continuing Robert’s quote of me):*

“‘The sampling distribution to arrive at Ev(E-BB, (j, y^{j})) would be the convex combination averaged over the two ways that y^{j} could have occurred. This differs from the sampling distributions of both Ev(E’, y’) and Ev(E”, y”)’. This sounds to me like a direct rejection of the conditionality principle, so I do not understand the point.” (Robert, Oct. 6, 2011 post, p. 310)

**Mayo**: I am not at all rejecting the WCP. The passage Robert quotes merely states the obvious: the assertion that the inference computed using the sampling distribution of E-BB is identical to the inference using the sampling distribution of E’ by itself (or of E” by itself) is false! If we are playing Birnbaumization, then the appropriate sampling distribution is the convex combination. (In the section from which Robert is quoting, a reader will note, I have put Birnbaum’s argument in valid form.)

But wait a minute: just a few lines later it turns out Robert does *not* deny my claim! He repeats it as obviously true, but suddenly it has become irrelevant.

**Robert**: “Indeed, and rather obviously, the sampling distribution of the evidence *Ev(E^{*}, z^{*})* will differ depending on the experiment. But this is not what is stated by the likelihood principle, which is that the inference itself should be the same for *y’* and *y”*, not the [sampling?] distribution of this inference” (Robert, p. 310).

**Mayo**: Huh? This makes no sense. For a sampling theorist there is no inference apart from the sampling distribution; one cannot assume there is somehow an inference apart from it. Sampling theory has simply not been understood. Robert’s own rendition of the argument [my Premise 1] depends on a merged sampling distribution, thanks to Birnbaumization; it certainly does not ignore sampling distributions. So I’m afraid I don’t know what Robert is talking about here. (This same point arose in the discussion by Aris Spanos when Robert’s post first appeared.)

Robert seems to go on to deny there are any LP counterexamples, because they all turn on pointing up the difference in sampling distributions! All I can do at this point is go back to where I began: listen to Birnbaum, Kadane, Lindley, Savage and everyone else who has discussed the (uncontroversial) fact that error statistics violates the LP! No one would be claiming sampling theory was incoherent were it not that it is prepared to reach different inferences from y’, y” despite their having proportional likelihoods (i.e., despite the conditions for the LP being met), and it does so solely because of a difference in sampling distributions.[viii] [ix]
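For readers who want a number attached to this uncontroversial fact, here is the standard textbook illustration (my own sketch, not from the post): 3 successes in 12 Bernoulli trials gives the same likelihood function whether n = 12 was fixed in advance (binomial) or sampling continued until r = 3 successes (negative binomial), yet the one-sided p-values for H0: θ = 0.5 vs. θ < 0.5 differ, landing on opposite sides of 0.05:

```python
from math import comb

theta0 = 0.5  # null value

# Binomial stopping rule (n = 12 fixed):
# p-value = P(3 or fewer successes in 12 trials | theta0)
p_binom = sum(comb(12, k) * theta0**k * (1 - theta0)**(12 - k) for k in range(4))

# Negative binomial stopping rule (sample until r = 3 successes):
# p-value = P(12 or more trials needed | theta0)
p_negbin = 1 - sum(comb(n - 1, 2) * theta0**3 * (1 - theta0)**(n - 3)
                   for n in range(3, 12))

print(round(p_binom, 4), round(p_negbin, 4))  # about 0.073 vs. about 0.0327
```

Identical likelihoods, different inferences: the stopping rule changes the sampling distribution, and with it the error probabilities.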

Kadane, J. (2011), *Principles of Uncertainty*, CRC Press.

Mayo: 10/20/2011 Post: blogging-likelihood-principle-2

* The title is a distant analogue to that song “Don’t Bogart that chalk my friend, pass it on to me”.

**UPDATE**: DECEMBER 8, 2012: Christian Robert writes, on his Nov. 30 blog, that “[a]fter reading again Birnbaum’s proof, while sitting down in a quiet room….I do not see any reason to doubt it.” His confusion, he says, “was caused by mixing *sufficiency* in the sense of Birnbaum’s mixed experiment with *sufficiency* in the sense” of his Bayesian model selection method. The point, I take it, is that Birnbaum’s proof doesn’t go through for this method and thus it needn’t obey the SLP.

**But it doesn’t go through for sufficiency in the sense of sampling theory either!** (At least not together with the additional premise needed to detach the SLP.) In fact, I argue, it would only hold for a sense of sufficiency that assumes “SLP pairs” are evidentially equivalent for informative inference (for definitions see several previous discussions). That is just to make its appeal in a “proof” of the SLP entirely circular, as it is.

I haven’t yet seen the book to which Robert is alluding (*Paradoxes in Scientific Inference*); I tried to Kindle it, but it didn’t work. Still, I’ve no reason to doubt his claim that the author has really mixed things up[i]. In Robert’s Nov. 23, 2012 post (reviewing *Paradoxes*) he hints that he came up with “another interpretation of Mayo’s argument that could prove her right!” He directs the reader to the later, Nov. 30, post, which, if I’m understanding it, appears to go back to his initial belief in Birnbaum(?)

“The chapter on statistical controversies actually focus on the opposition between frequentist, likelihood, and Bayesian paradigms. The author seems to have studied Mayo and Spanos’ *Error and Inference* to great lengths. (As I did, as I did!) He spends around twenty pages in Chapter 3 on this opposition and on the conditionality, sufficiency, and likelihood principles that were reunited by Birnbaum and recently deconstructed by Mayo. In my opinion, Chang makes a mess of describing the issues at stake in this debate and leaves the reader more bemused at the end than at the beginning of the chapter. For instance, the conditionality principle is confused with the p-value being computed conditional on the null (hypothesis) model (p. 110).”

Chang, M. (2012), *Paradoxes in Scientific Inference*, Chapman and Hall.

In representing the inference from an outcome **y** in a sampling theory experiment E by means of the abbreviation Infr_{E}(**y**), we assume, for simplicity, that packed into E would be the probability model, parameters, and the sampling distribution corresponding to the inference in question. We prefer it because it underscores the need to consider the associated methodology and context. Birnbaum construes Ev(E, **x**) as “the evidence about the parameter arising from experiment E and result **x**” and allows it to range over the inference, conclusion, or report, including p-values, confidence intervals and levels, and posteriors. So our notation accomplishes the same, but with (hopefully) less chance of equivocations.

Great post.

I wonder if the title will be lost on younger readers.

Was it “Little Feat” or “Fraternity of Man”?

Normaldeviate: Thanks so much! I really appreciate it.* Yes, I was going to say that younger people might not know about that song (“don’t bogart that ‘chalk’ my friend”), so I included the video. Can you see the video? It’s only the second or third time we’ve tried to embed one in a blog!

Of course, now with the new laws, everything old is new again—for some people, anyway.

*And I’m very glad to see a reference to the LP in your current post, which I’ll read more carefully tomorrow (5 hrs later in London).

Indeed, I was a little bit lost with the title – never heard about or seen the song (“don’t bogart that ‘chalk’ my friend”) until I saw this post. Now I don’t feel as embarrassed.

Nicole: I’m just glad you were crystal clear on the details of Birnbaum’s arguments and able to grasp some of Robert’s ponderings, this way and that, as to whether or not we should accept BB’s moves (even though Birnbaum abandoned his own arguments).

Being unclear about the lyrics and title of the song, and what it might or might not have meant in the 60s (when Birnbaum was writing) as regards sharing or hogging chalk, or stealing SLP pairs and taking away their names and leaving them with only a likelihood, when in fact we should pass on the known experimental chalk, and condition on the electric guitar that is known to have been the one performed in the band, quite aside from whatever bets were first taken as to who would get the first piece of chalk, or whether a precise or imprecise musical instrument would be determined by an irrelevant randomization, flip of the guitar pick or whatnot, or that the Country Fish might have signed with a different manager far less reliable than their known manager. Once it is known the joint they’re actually playing in, with what instruments, and what they’re being paid, other joints they might have imbibed in, be they Mary Janes, hookahs or hashtags, is really quite secondary. Agree?

Oh and watch the video!

Yes, I agree with you! Thank you so much for explaining the analogy with the song – I appreciate it. Yes, I watched the video. Never have seen anything quite like it until now.

Just noticed the video.

It played no problem

NormalDeviate: I always thought it was Country Joe and the Fish, but you might be right, never heard of those other guys before.

How can anyone miss the importance of the sampling distribution in the postulate of sufficiency that plays a major role in frequentist inference? It is very clear in the definition [in Cox and Mayo (2010)]:

Sufficiency Principle (in sampling theory): If a random sample X in experiment E arises from f(x; θ), and the assumptions of the model are valid, then all the information about θ contained in the data may be obtained from considering its minimal sufficient statistic T and its sampling distribution f(t; θ) (in experiment E).

There is one experimental model with one sampling distribution corresponding to the model; it doesn’t matter if E is a mixture.
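A quick numeric check of the sufficiency idea in this comment (my own sketch, assuming the familiar Bernoulli case): for n Bernoulli(θ) trials, T = the success count is minimal sufficient, and the conditional probability of the full data sequence given T = t is 1/C(n, t), free of θ:

```python
from math import comb

def cond_prob(data, theta):
    """P(this exact 0/1 sequence | its success count), under Bernoulli(theta)."""
    n, t = len(data), sum(data)
    p_data = theta**t * (1 - theta)**(n - t)            # P(sequence)
    p_T = comb(n, t) * theta**t * (1 - theta)**(n - t)  # P(T = t)
    return p_data / p_T

data = (1, 0, 1, 0, 0)
# The conditional probability is 1/C(5, 2) = 0.1 for every theta:
print([round(cond_prob(data, th), 6) for th in (0.2, 0.5, 0.8)])
```

The θ-free conditional distribution is exactly what licenses reporting via T and its sampling distribution alone.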

Anon: Yes, it’s hard to understand, except as denying the frequentist’s notion of sufficiency.