strong likelihood principle

Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert

Breaking through “the breakthrough”

Christian Robert’s reply grows out of my last blogpost. On Xi’an’s Og:

A quick reply from my own Elba, in the Dolomiti: your arguments (about the sad consequences of the SLP) are not convincing wrt the derivation of SLP=WCP+SP. If I built a procedure that reports (E1,x*) whenever I observe (E1,x*) or (E2,y*), this obeys the sufficiency principle; doesn’t it? (Sorry to miss your talk!)

Mayo’s response to Xi’an on the “sad consequences of the SLP.”[i]

This is a useful reply (so to me it’s actually not ‘flogging’ the SLP[ii]), and, in fact, I think Xi’an will now see why my arguments are convincing! Let’s use Xi’an’s procedure to make a parametric inference about θ. Getting the report x* from Xi’an’s procedure, we know it could have come from E1 or E2. In that case, the WCP forbids us from using either individual experiment to compute the inference implication. We use the sampling distribution of TB.

Birnbaum’s statistic TB is a technically sufficient statistic for Birnbaum’s experiment EB (the conditional distribution of Z given TB is independent of θ). The question of whether this is the relevant or legitimate way to compute the inference when it is given that y* came from E2 is the big question. The WCP says it is not. Now you are free to use Xi’an’s procedure (free to Birnbaumize) but that does not yield the SLP. Nor did Birnbaum think it did. That’s why he goes on to say: “Never mind. Don’t use Xi’an’s procedure. Compute the inference using E2 just as the WCP tells you to. You know it came from E2. Isn’t that what David Cox taught us in 1958?”

Fine. But still no SLP! Note it’s not that SP and WCP conflict; it’s WCP and Birnbaumization that conflict. The application of a principle will always be relative to the associated model used to frame the question.[iii]

These points are all spelled out clearly in my paper: [I can’t get double subscripts here. EB is the same as E-B][iv]

Given y*, the WCP says do not Birnbaumize. One is free to do so, but not to simultaneously claim to hold the WCP in relation to the given y*, on pain of logical contradiction. If one does choose to Birnbaumize, and to construct TB, admittedly, the known outcome y* yields the same value of TB as would x*. Using the sample space of EB yields: (B): InfrE-B[x*] = InfrE-B[y*]. This is based on the convex combination of the two experiments, and differs from both InfrE1[x*] and InfrE2[y*]. So again, any SLP violation remains. Granted, if only the value of TB is given, using InfrE-B may be appropriate. For then we are given only the disjunction: Either (E1, x*) or (E2, y*). In that case one is barred from using the implication from either individual Ei. A holder of WCP might put it this way: once (E,z) is given, whether E arose from a θ-irrelevant mixture, or was fixed all along, should not matter to the inference; but whether a result was Birnbaumized or not should, and does, matter.

There is no logical contradiction in holding that if data are analyzed one way (using the convex combination in EB), a given answer results, and if analyzed another way (via WCP) one gets quite a different result. One may consistently apply both the E-B and the WCP directives to the same result, in the same experimental model, only in cases where WCP makes no difference. To claim the WCP never makes a difference, however, would entail that there can be no SLP violations, which would make the argument circular. Another possibility would be to hold, as Birnbaum ultimately did, that the SLP is “clearly plausible” (Birnbaum 1968, 301) only in “the severely restricted case of a parameter space of just two points” where these are predesignated (Birnbaum 1969, 128). But SLP violations remain.
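Since the subscripts do not survive on the blog, here is a rough LaTeX sketch of the three claims in play. It is only a summary under stated assumptions: the "otherwise" branch of T_B is my gloss on Xi’an’s procedure (the text above only specifies what is reported for the two starred outcomes), and the fragment assumes the amsmath package for the cases environment.

```latex
% Sketch only; E_B here is what the text writes as E-B.
% Birnbaum's statistic: both starred outcomes get the same report.
% The "otherwise" branch is an assumption added for completeness.
\[
T_B(E_i, z_i) =
\begin{cases}
(E_1, x^*) & \text{if } (E_i, z_i) = (E_1, x^*) \text{ or } (E_2, y^*),\\
(E_i, z_i) & \text{otherwise.}
\end{cases}
\]

% (B): what Birnbaumizing delivers, using the sample space of E_B.
\[
\mathrm{Infr}_{E_B}[x^*] = \mathrm{Infr}_{E_B}[y^*]
\]

% WCP: once it is known that y^* came from E_2, use E_2.
\[
(E_2, y^*) \text{ known} \;\Rightarrow\; \text{report } \mathrm{Infr}_{E_2}[y^*],
\text{ not } \mathrm{Infr}_{E_B}[y^*]
\]

% What the SLP would need, and what neither (B) nor WCP supplies:
\[
\mathrm{Infr}_{E_1}[x^*] = \mathrm{Infr}_{E_2}[y^*]
\]
```

Read this way, (B) equates two inferences computed inside EB, while the WCP directs us outside EB to the individual Ei; neither step yields the last line.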

Note: The final draft of my paper uses equations that do not transfer directly to this blog. Hence, these sections are from a draft of my paper.

[i] Although I didn’t call them “sad,” I think it would be too bad to accept the SLP’s consequences. Listen to Birnbaum:

The likelihood principle is incompatible with the main body of modern statistical theory and practice, notably the Neyman-Pearson theory of hypothesis testing and of confidence intervals, and incompatible in general even with such well-known concepts as standard error of an estimate and significance level. (Birnbaum 1968, 300)

That is why Savage called it “a breakthrough” result. In the end, however, Birnbaum could not give up on control of error probabilities. He held the SLP only for the trivial case of predesignated simple hypotheses. (Or, perhaps he spied the gap in his argument? I suspect, from his writings, that he realized his argument went through only for such cases that do not violate the SLP.)

[ii] Readers may feel differently.

[iii] Excerpt from a draft of my paper:
Model checking. An essential part of the statements of the principles SP, WCP, and SLP is that the validity of the model is granted as adequately representing the experimental conditions at hand (Birnbaum 1962, 491). Thus, accounts that adhere to the SLP are not thereby prevented from analyzing features of the data such as residuals, which are relevant to questions of checking the statistical model itself. There is some ambiguity on this point in Casella and R. Berger (2002):

Most model checking is, necessarily, based on statistics other than a sufficient statistic. For example, it is common practice to examine residuals from a model.  . . Such a practice immediately violates the Sufficiency Principle, since the residuals are not based on sufficient statistics. (Of course such a practice directly violates the [strong] LP also.) (Casella and R. Berger 2002, 295-6)

They warn that before considering the SLP and WCP, “we must be comfortable with the model” (296). It seems to us more accurate to regard the principles as inapplicable, rather than violated, when the adequacy of the relevant model is lacking.

Birnbaum, A. 1968. “Likelihood.” In International Encyclopedia of the Social Sciences, 9:299–301. New York: Macmillan and the Free Press.

———. 1969. “Concepts of Statistical Evidence.” In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel, edited by S. Morgenbesser, P. Suppes, and M. G. White, 112–143. New York: St. Martin’s Press.

Casella, G., and R. L. Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Press.

Mayo 2013, (

Categories: Birnbaum Brakes, Statistics, strong likelihood principle | 9 Comments

New Version: On the Birnbaum argument for the SLP: Slides for my JSM talk

In my latest formulation of the controversial Birnbaum argument for the strong likelihood principle (SLP), I introduce a new symbol ⇒ to represent a function from a given experiment-outcome pair, (E,z), to a generic inference implication. This should clarify my argument (see my new paper).

(E,z) ⇒ InfrE(z) is to be read “the inference implication from outcome z in experiment E” (according to whatever inference type/school is being discussed).

A draft of my slides for the Joint Statistical Meetings (JSM) in Montreal next week is right after the abstract. Comments are very welcome.

Interested readers may search this blog for quite a lot of discussion of the SLP (e.g., here and here) including links to the central papers, “U-Phils” by others (e.g., here, here, and here), and amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum).

On the Birnbaum Argument for the Strong Likelihood Principle


An essential component of inference based on familiar frequentist notions, such as p-values and significance and confidence levels, is the relevant sampling distribution (hence the term sampling theory). This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x* and y* from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1, f2, then even though f1(x*; θ) = c f2(y*; θ) for all θ, outcomes x* and y* may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue that this does not require us to consider experiments other than the one performed to produce the data. David Cox (1958) proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of the particular Ei.

The surprising upshot of Allan Birnbaum’s (1962) argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases directly refute [WCP entails SLP].
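For readers who want to see an SLP violation in numbers, here is a minimal Python sketch using the familiar binomial versus negative binomial pair. The specific values (12 trials, 3 successes, null value 0.5, one-sided test) and all function names are my own illustrative assumptions; they are not taken from the paper or the abstract.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """P(k successes in n fixed Bernoulli(theta) trials)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def negative_binomial_pmf(n, r, theta):
    """P(the r-th success occurs exactly on trial n)."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta) ** (n - r)


# Hypothetical SLP pair (illustrative numbers, not from the paper):
# E1: fix n = 12 trials, observe x* = 3 successes.
# E2: sample until r = 3 successes, observe y* = 12 trials.
N_FIXED = 12
R_FIXED = 3
x_star = 3
y_star = 12

# The likelihoods are proportional: the ratio is the same constant for every theta.
thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
ratios = [
    binomial_pmf(x_star, N_FIXED, t) / negative_binomial_pmf(y_star, R_FIXED, t)
    for t in thetas
]

# One-sided p-values for H0: theta = 0.5 against theta < 0.5.
THETA_0 = 0.5
# E1: outcomes at least as extreme are 3 or fewer successes in 12 trials.
p_binomial = sum(binomial_pmf(k, N_FIXED, THETA_0) for k in range(x_star + 1))
# E2: needing 12 or more trials means at most 2 successes in the first 11 trials.
p_negative_binomial = sum(
    binomial_pmf(k, y_star - 1, THETA_0) for k in range(R_FIXED)
)

print("likelihood ratios:", ratios)
print("p-value under E1 (binomial):", p_binomial)
print("p-value under E2 (negative binomial):", p_negative_binomial)
```

The ratio list is constant in θ, so the SLP would demand the same inference from both outcomes, yet the two sampling distributions give different p-values (one above .05, one below). That difference is what is meant by an SLP violation.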

Comments, questions, errors are welcome.

Full paper can be found here:

Categories: Error Statistics, Statistics, strong likelihood principle | 20 Comments

Mark Chang (now) gets it right about circularity

Mark Chang wrote a comment this evening, but it is buried back on my Nov. 31 post in relation to the current U-Phil. Given all he has written on my attempt to “break through the breakthrough”, I thought to bring it up to the top. Chang ends off his comment with the sagacious, and entirely correct claim that so many people have missed:

“What Birnbaum actually did was use the SLP to prove the SLP – as simple as that!” (Mark Chang)

It is just too bad that readers of his (2013) book will not have been told this*!  Mark: Can you issue a correction?  I definitely think you should!  If only you’d written to me, I could have pointed this out pre-pub.

That Birnbaum’s argument assumes what it claims to prove is just what I have been arguing all along. It is called a begging-the-question fallacy: An argument that boils down to:

A/therefore A

Such an argument is logically valid, and that is why formal validity does not mean much for getting conclusions accepted. Why? Well, even though such circular arguments are usually dressed up so that the premises do not so obviously repeat the conclusion, they are similarly fallacious: the truth of the premises already assumes the truth of the conclusion. If we are allowed to argue that way, you can argue anything you like! To not-A as well. That is not what the Great “Breakthrough” was supposed to be doing.

Chang’s comment (which is the same one he posted on Xi’an’s Og here) also includes his other points, but fortunately, Jean Miller has recently gone through those in depth. In neither of my (generous) construals of Birnbaum do I claim his premises are inconsistent, by the way.

*But instead his readers are led to believe my criticism is flawed because of something about sufficiency having to do with a FAMILY of distributions (his caps on “family”, p. 138). This all came up as well in Xi’an’s Og.

Chang, M. (2013) Paradoxes in Scientific Inference.


Categories: strong likelihood principle, U-Phil | 2 Comments

U-Phil: Ton o’ Bricks

by Deborah Mayo

Birnbaum’s argument for the SLP involves some equivocations that are at once subtle and blatant. The subtlety makes it hard to translate into symbolic logic (I only partially translated it). Philosophers should have a field day with this, and I should be hearing more reports that it has suddenly hit them between the eyes like a ton of bricks, to use a mixture metaphor. Here are the key bricks. References can be found here; background to the U-Phil is here.

Famous (mixture) weighing machine example and the WCP

The main principle of evidence on which Birnbaum’s argument rests is the weak conditionality principle (WCP).  This principle, Birnbaum notes, follows not from mathematics alone but from intuitively plausible views of “evidential meaning.” To understand the interpretation of the WCP that gives it its plausible ring, we consider its development in “what is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992).

The basis for the WCP 

Example 3. Two measuring instruments of different precisions. We flip a fair coin to decide which of two instruments, E’ or E”, to use in observing a normally distributed random sample X to make inferences about mean θ. E’ has a known variance of 10^-4, while that of E” is known to be 10^4. The experiment is a mixture: E-mix. The fair coin or other randomizer may be characterized as observing an indicator statistic J, taking values 1 or 2 with probabilities .5, independent of the process under investigation. The full data indicates first the result of the coin toss, and then the measurement: (Ej, xj).[i]

The sample space of E-mix with components Ej, j = 1, 2, consists of the union of

{(j, x’): j = 1, possible values of X’} and {(j, x”): j = 2, possible values of X”}.

In testing a null hypothesis such as θ = 0, the same x measurement would correspond to a much smaller p-value were it to have come from E′ than if it had come from E”: denote them as p′(x) and p′′(x), respectively. However, the overall significance level of the mixture, the convex combination of the p-values, [p′(x) + p′′(x)]/2, would give a misleading report of the precision or severity of the actual experimental measurement (see Cox and Mayo 2010, 296).
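To make the contrast concrete, here is a small Python sketch of the two conditional p-values and their convex combination. It is a simplification under assumptions of my own: a single measurement rather than a sample, a two-sided test of θ = 0, and a made-up measurement x = 0.03. Only the two variances come from the example.

```python
from math import erfc, sqrt


def two_sided_p_value(x, variance, theta0=0.0):
    """Two-sided p-value for H0: theta = theta0 from one Normal(theta, variance) observation."""
    z = abs(x - theta0) / sqrt(variance)
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2.0))


VAR_PRECISE = 1e-4   # variance of E' (from the example)
VAR_IMPRECISE = 1e4  # variance of E'' (from the example)

x = 0.03  # hypothetical measurement, chosen only for illustration

p_precise = two_sided_p_value(x, VAR_PRECISE)      # p'(x): conditional on E'
p_imprecise = two_sided_p_value(x, VAR_IMPRECISE)  # p''(x): conditional on E''
p_mixture = 0.5 * (p_precise + p_imprecise)        # unconditional convex combination

print("p'(x)  =", p_precise)
print("p''(x) =", p_imprecise)
print("[p'(x) + p''(x)]/2 =", p_mixture)
```

With these numbers x sits three standard deviations from 0 under E′ but is practically indistinguishable from 0 under E”, so p′(x) is small, p′′(x) is close to 1, and the average lands near one half, which describes neither instrument.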

Suppose that we know we have observed a measurement from E” with its much larger variance:

The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution [with the larger variance] (Cox 1958, 361).

In effect, an individual unlucky enough to use the imprecise tool gains a more informative assessment because he might have been lucky enough to use the more precise tool! (Birnbaum 1962, 491; Cox and Mayo 2010, 296). Once it is known whether E′ or E′′ has produced x, the p-value or other inferential assessment should be made conditional on the experiment actually run.

Weak Conditionality Principle (WCP): If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’, x’) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

Understanding the WCP

The WCP includes a prescription and a proscription for the proper evidential interpretation of x’, once it is known to have come from E’:

The evidential meaning of any outcome (E’, x’) of any experiment E having a mixture structure is the same as: the evidential meaning of the corresponding outcome x’ of the corresponding component experiment E’, ignoring otherwise the over-all structure of the original experiment E (Birnbaum 1962, 489; Eh and xh replaced with E’ and x’ for consistency).

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we spell out its three assertions.

First, it applies once we know which component of the mixture has been observed, and what the outcome was (Ej, xj). (Birnbaum considers mixtures with just two components.)

Second, there is the prescription about evidential equivalence. Once it is known that Ej has generated the data, given that our inference is about a parameter of Ej, inferences are appropriately drawn in terms of the distribution in Ej —the experiment known to have been performed.

Third, there is the proscription. In the case of informative inferences about the parameter of Ej our inference should not be influenced by whether the decision to perform Ej was determined by a coin flip or fixed all along. Misleading informative inferences might result from averaging over the convex combination of Ej and an experiment known not to have given rise to the data. The latter may be called the unconditional (sampling) distribution. ….


One crucial equivocation:

 Casella and R. Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data x, the information about θ depends only on the experiment performed. . . . The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of θ. (p. 293, emphasis added)

I have emphasized the last line in order to underscore a possible equivocation. Casella and Berger’s intended meaning is the correct claim:

(i) Given that it is known that measurement x’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or had been fixed all along.

Of course we do not know what measurement would have resulted had the unperformed measuring tool been used.

Compare (i) to a false and unintended reading:

(ii) If some measurement x is observed, then it does not matter (and it need not be reported) whether it came from a precise tool E’ or imprecise tool E”.

The idea of detaching x, and reporting that “x came from somewhere I know not where,” will not do. For one thing, we need to know the experiment in order to compute the sampling inference. For another, E’ and E” may be like our weighing procedures with very different precisions. It is analogous to being given the likelihood of the result in Example 1 (here), withholding whether it came from a negative binomial or a binomial.

Claim (i), by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when the WCP applies. The essential difference in claim (i) is that it is known that (E’, x’), enabling its inferential import to be determined.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.

Now go back and skim 3 short pages of notes here, pp 11-14, and it should hit you like a ton of bricks!  If so, reward yourself with a double Elba Grease, else try again. Report your results in the comments.

Categories: Birnbaum Brakes, Statistics, strong likelihood principle, U-Phil | 7 Comments

U-Phil: J. A. Miller: Blogging the SLP

Jean Miller

Jean A. Miller, PhD
Department of Philosophy
Virginia Tech


Mayo in her “rejected” post (12/27/12) briefly points out how Mark Chang, in his book Paradoxes in Scientific Inference (2012, pp. 137-139), took pieces from the two distinct variations she gives of Birnbaum’s arguments, either of which shows the unsoundness of Birnbaum’s purported proof, and illegitimately combined them. He then mistakenly maintains that it is Mayo’s conclusions that are “faulty” rather than Birnbaum’s argument. In this note, I just want to fill in some of the missing pieces of what is going on here, so that others will not be misled. I put together some screen shots so you can read exactly what he wrote on pp. 137-139. (See also Mayo’s note to Chang on Xi’an’s blog here.)

Categories: Statistics, strong likelihood principle, U-Phil | 5 Comments

U-Phil: S. Fletcher & N.Jinn

Samuel Fletcher

“Model Verification and the Likelihood Principle” by Samuel C. Fletcher
Department of Logic & Philosophy of Science (PhD Student)
University of California, Irvine

I’d like to sketch an idea concerning the applicability of the Likelihood Principle (LP) to non-trivial statistical problems.  What I mean by “non-trivial statistical problems” are those involving substantive modeling assumptions, where there could be any doubt that the probability model faithfully represents the mechanism generating the data.  (Understanding exactly how scientific models represent phenomena is subtle and important, but it will not be my focus here.  For more, see In such cases, it is crucial for the modeler to verify, inasmuch as it is possible, the sufficient faithfulness of those assumptions.

But the techniques used to verify these statistical assumptions are themselves statistical. One can then ask: do techniques of model verification fall under the purview of the LP?  That is: are such techniques a part of the inferential procedure constrained by the LP?  I will argue the following:

(1) If they are—what I’ll call the inferential view of model verification—then there will be in general no inferential procedures that satisfy the LP.

(2) If they are not—what I’ll call the non-inferential view—then there are aspects of any evidential evaluation that inferential techniques bound by the LP do not capture.

Categories: Statistics, strong likelihood principle, U-Phil | 17 Comments

Midnight With Birnbaum-reblog

 Reblogging Dec. 31, 2011:

You know how in that recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2011-2012) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i]

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WCP.[ii] Sorry,…I know it’s famous…

Categories: Birnbaum Brakes, strong likelihood principle | 2 Comments
