Birnbaum Brakes

Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert

Breaking through “the breakthrough”

Christian Robert’s reply grows out of my last blogpost. On Xi’an’s Og:

A quick reply from my own Elba, in the Dolomiti: your arguments (about the sad consequences of the SLP) are not convincing wrt the derivation of SLP=WCP+SP. If I built a procedure that reports (E1,x*) whenever I observe (E1,x*) or (E2,y*), this obeys the sufficiency principle; doesn’t it? (Sorry to miss your talk!)

Mayo’s response to Xi’an on the “sad consequences of the SLP.”[i]

This is a useful reply (so to me it’s actually not ‘flogging’ the SLP[ii]), and, in fact, I think Xi’an will now see why my arguments are convincing! Let’s use Xi’an’s procedure to make a parametric inference about θ. Getting the report x* from Xi’an’s procedure, we know it could have come from E1 or E2. In that case, the WCP forbids us from using either individual experiment to compute the inference implication. We use the sampling distribution of TB.

Birnbaum’s statistic TB is a technically sufficient statistic for Birnbaum’s experiment EB (the conditional distribution of Z given TB is independent of θ). The question of whether this is the relevant or legitimate way to compute the inference when it is given that y* came from E2 is the big question. The WCP says it is not. Now you are free to use Xi’an’s procedure (free to Birnbaumize) but that does not yield the SLP. Nor did Birnbaum think it did. That’s why he goes on to say: “Never mind. Don’t use Xi’an’s procedure. Compute the inference using E2 just as the WCP tells you to. You know it came from E2. Isn’t that what David Cox taught us in 1958?”

Fine. But still no SLP!  Note it’s not that SP and WCP conflict, it’s WCP and Birnbaumization that conflict. The application of a principle will always be relative to the associated model used to frame the question.[iii]
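For readers who want the sufficiency claim about TB written out, here is a minimal sketch in LaTeX. It assumes that EB is a fair (1/2, 1/2) mixture of E1 and E2, that f_1 and f_2 denote the sampling densities in E1 and E2, and that (x*, y*) is an SLP pair with proportionality constant c. The symbols f_1, f_2 and c are my notation for this sketch, not taken from the text above.

```latex
% Assumptions: E_B is a fair mixture of E_1 and E_2, and (x^*, y^*) is an SLP pair,
% i.e. f_1(x^*;\theta) = c\, f_2(y^*;\theta) for all \theta, with c > 0 a constant.
\[
T_B(E_j, z) =
\begin{cases}
(E_1, x^*) & \text{if } (E_j, z) = (E_1, x^*) \text{ or } (E_j, z) = (E_2, y^*),\\
(E_j, z)   & \text{otherwise.}
\end{cases}
\]
% Conditional distribution of the full outcome Z given the reported value of T_B:
\[
P\bigl(Z = (E_1, x^*) \mid T_B = (E_1, x^*)\bigr)
= \frac{\tfrac12 f_1(x^*;\theta)}{\tfrac12 f_1(x^*;\theta) + \tfrac12 f_2(y^*;\theta)}
= \frac{c}{1 + c}.
\]
```

The conditional probability is free of the parameter, which is all that technical sufficiency within EB requires. It says nothing about whether the sampling distribution of EB is the relevant one once it is known that y* came from E2.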

These points are all spelled out clearly in my paper: [I can’t get double subscripts here. EB is the same as E-B][iv]

Given y*, the WCP says do not Birnbaumize. One is free to do so, but not to simultaneously claim to hold the WCP in relation to the given y*, on pain of logical contradiction. If one does choose to Birnbaumize, and to construct TB, admittedly, the known outcome y* yields the same value of TB as would x*. Using the sample space of EB yields: (B): InfrE-B[x*] = InfrE-B[y*]. This is based on the convex combination of the two experiments, and differs from both InfrE1[x*] and InfrE2[y*]. So again, any SLP violation remains. Granted, if only the value of TB is given, using InfrE-B may be appropriate. For then we are given only the disjunction: Either (E1, x*) or (E2, y*). In that case one is barred from using the implication from either individual Ei. A holder of WCP might put it this way: once (E,z) is given, whether E arose from a θ-irrelevant mixture, or was fixed all along, should not matter to the inference; but whether a result was Birnbaumized or not should, and does, matter.

There is no logical contradiction in holding that if data are analyzed one way (using the convex combination in EB), a given answer results, and if analyzed another way (via WCP) one gets quite a different result. One may consistently apply both the E-B and the WCP directives to the same result, in the same experimental model, only in cases where WCP makes no difference. To claim the WCP never makes a difference, however, would entail that there can be no SLP violations, which would make the argument circular. Another possibility would be to hold, as Birnbaum ultimately did, that the SLP is “clearly plausible” (Birnbaum 1968, 301) only in “the severely restricted case of a parameter space of just two points” where these are predesignated (Birnbaum 1969, 128). But SLP violations remain.

Note: The final draft of my paper uses equations that do not transfer directly to this blog. Hence, these sections are from a draft of my paper.

[i] Although I didn’t call them “sad,” I think it would be too bad to accept the SLP’s consequences. Listen to Birnbaum:

The likelihood principle is incompatible with the main body of modern statistical theory and practice, notably the Neyman-Pearson theory of hypothesis testing and of confidence intervals, and incompatible in general even with such well-known concepts as standard error of an estimate and significance level. (Birnbaum 1968, 300)

That is why Savage called it “a breakthrough” result. In the end, however, Birnbaum could not give up on control of error probabilities. He held the SLP only for the trivial case of predesignated simple hypotheses. (Or, perhaps he spied the gap in his argument? I suspect, from his writings, that he realized his argument went through only for such cases that do not violate the SLP.)

[ii] Readers may feel differently.

[iii] Excerpt from a draft of my paper:
Model checking. An essential part of the statements of the principles SP, WCP, and SLP is that the validity of the model is granted as adequately representing the experimental conditions at hand (Birnbaum 1962, 491). Thus, accounts that adhere to the SLP are not thereby prevented from analyzing features of the data such as residuals, which are relevant to questions of checking the statistical model itself. There is some ambiguity on this point in Casella and R. Berger (2002):

Most model checking is, necessarily, based on statistics other than a sufficient statistic. For example, it is common practice to examine residuals from a model.  . . Such a practice immediately violates the Sufficiency Principle, since the residuals are not based on sufficient statistics. (Of course such a practice directly violates the [strong] LP also.) (Casella and R. Berger 2002, 295-6)

They warn that before considering the SLP and WCP, “we must be comfortable with the model” (296). It seems to us more accurate to regard the principles as inapplicable, rather than violated, when the adequacy of the relevant model is lacking.

Birnbaum, A. 1968. “Likelihood.” In International Encyclopedia of the Social Sciences, 9:299–301. New York: Macmillan and the Free Press.

———. 1969. “Concepts of Statistical Evidence.” In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel, edited by S. Morgenbesser, P. Suppes, and M. G. White, 112–143. New York: St. Martin’s Press.

Casella, G., and R. L. Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Press.

Mayo 2013, (

Categories: Birnbaum Brakes, Statistics, strong likelihood principle

U-Phil: Mayo’s response to Hennig and Gandenberger

brakes on the ‘breakthrough’

“This will be my last post on the (irksome) Birnbaum argument!” she says with her fingers (or perhaps toes) crossed. But really, really it is (at least until midnight 2013). In fact the following brief remarks are all said, more clearly, in my (old) PAPER, new paper, Mayo 2010, Cox & Mayo 2011 (appendix), and in posts connected to this U-Phil: Blogging the likelihood principle, new summary 10/31/12*.

What’s the catch?

In my recent “Ton o’ Bricks” post, many readers were struck by the implausibility of letting the evidential interpretation of x’* be influenced by the properties of experiments known not to have produced x’*. Yet it is altogether common to be told that, should a sampling theorist try to block this, “unfortunately there is a catch” (Ghosh, Delampady, and Samanta 2006, 38): we would be forced to embrace the strong likelihood principle (SLP, or LP, for short), at least according to an infamous argument by Allan Birnbaum (who himself rejected the LP [i]).

It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin, or else to embrace the strong likelihood principle, which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma. . . . The “dilemma” argument is therefore an illusion. (Cox and Mayo 2010, 298)

In my many detailed expositions, I have explained the source of the illusion and sleight of hand from a number of perspectives (I will not repeat references here). While I appreciate the care that Hennig and Gandenberger have taken in their U-Phils (and wish them all the luck in published outgrowths), it is clear to me that they are not hearing (or are unwittingly blocking) the scre-e-e-e-ching of the brakes!

No revolution, no breakthrough!

Berger and Wolpert, in their famous monograph The Likelihood Principle, identify the core issue:

The philosophical incompatibility of the LP and the frequentist viewpoint is clear, since the LP deals only with the observed x, while frequentist analyses involve averages over possible observations. . . . Enough direct conflicts have been . . . seen to justify viewing the LP as revolutionary from a frequentist perspective. (Berger and Wolpert 1988, 65-66)[ii]

If Birnbaum’s proof does not apply to a frequentist sampling theorist, then there is neither a revolution nor a breakthrough (as Savage called it). The SLP holds just for methodologies in which it holds . . . We are going in circles.

Block my counterexamples, please!

Since Birnbaum’s argument has stood for over fifty years, I’ve given it the maximal run for its money, and haven’t tried to block its premises, however questionable its key moves may appear. Despite such latitude, I’ve shown that the “proof” to the SLP conclusion will not wash, and I’m just a wee bit disappointed that Hennig and Gandenberger haven’t wrestled with my specific argument, or shown just where they think my debunking fails. What would this require?

Since the SLP is a universal generalization, it requires only a single counterexample to falsify it. In fact, every violation of the SLP within frequentist sampling theory, I show, is a counterexample to it! In other words, using the language from the definition of the SLP, the onus is on Birnbaum to show that, for any x’* that is a member of an SLP pair (E’, E”) with given, different probability models f’, f”, the outcomes x’* and x”* should have the identical evidential import for an inference concerning parameter θ, on pain of facing “the catch” above, i.e., being forced to allow the import of data known to have come from E’ to be altered by unperformed experiments known not to have produced x’*.

If one is to release the brakes from my screeching halt, defenders of Birnbaum might try to show that the SLP counterexamples lead me to “the catch” as alleged. I have considered two well-known violations of the SLP. Can it be shown that a contradiction with the WCP or SP follows? I say no. Neither Hennig[ii] nor Gandenberger shows otherwise.

In my tracing out of Birnbaum’s arguments, I strived to assume that he would not be giving us circular arguments. To say that “I can prove that your methodology must obey the SLP,” and then to set out to do so by declaring “Hey Presto! Assume sampling distributions are irrelevant (once the data are in hand),” is a neat trick, but it assumes what it purports to prove. All other interpretations are shown to be unsound.


[i] Birnbaum himself, soon after presenting his result, rejected the SLP. As Birnbaum puts it, “the likelihood concept cannot be construed so as to allow useful appraisal, and thereby possible control, of probabilities of erroneous interpretations” (Birnbaum 1969, p. 128).

(We use LP and SLP synonymously here.)

[ii] Hennig initially concurred with me, but says a person convinced him to get back on the Birnbaum bus (even though Birnbaum got off it [i]).

Some other, related, posted discussions: Brakes on Breakthrough Part 1 (12/06/11) & Part 2 (12/07/11); Don’t Birnbaumize that experiment (12/08/12); Midnight with Birnbaum re-blog (12/31/12). The initial call to this U-Phil, the extension, details here, the post from my 28 Nov. seminar (LSE), and the original post by Gandenberger.


Birnbaum, A. (1962), “On the Foundations of Statistical Inference”, Journal of the American Statistical Association 57 (298), 269-306.

Savage, L. J., Barnard, G., Cornfield, J., Bross, I, Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O, and Birnbaum, A. (1962). On the foundations of statistical inference: “Discussion (of Birnbaum 1962)”,  Journal of the American Statistical Association 57 (298), 307-326.

Birnbaum, A. (1970). Statistical Methods in Scientific Inference (letter to the editor). Nature 225, 1033.

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos eds.), CUP 276-304.

…and if that’s not enough, search this blog.


Categories: Birnbaum Brakes, Likelihood Principle, Statistics

U-Phil: Ton o’ Bricks

by Deborah Mayo

Birnbaum’s argument for the SLP involves some equivocations that are at once subtle and blatant. The subtlety makes it hard to translate into symbolic logic (I only partially translated it). Philosophers should have a field day with this, and I should be hearing more reports that it has suddenly hit them between the eyes like a ton of bricks, to use a mixture metaphor. Here are the key bricks. References can be found in here, background to the U-Phil here.

Famous (mixture) weighing machine example and the WCP

The main principle of evidence on which Birnbaum’s argument rests is the weak conditionality principle (WCP).  This principle, Birnbaum notes, follows not from mathematics alone but from intuitively plausible views of “evidential meaning.” To understand the interpretation of the WCP that gives it its plausible ring, we consider its development in “what is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992).

The basis for the WCP 

Example 3. Two measuring instruments of different precisions. We flip a fair coin to decide which of two instruments, E’ or E”, to use in observing a normally distributed random sample X to make inferences about mean θ. E’ has a known variance of 10^−4, while that of E” is known to be 10^4. The experiment is a mixture: E-mix. The fair coin or other randomizer may be characterized as observing an indicator statistic J, taking values 1 or 2 with probabilities .5, independent of the process under investigation. The full data indicates first the result of the coin toss, and then the measurement: (Ej, xj).[i]

The sample space of E-mix with components Ej, j = 1, 2, consists of the union of

{(j, x’): j = 1, possible values of X’} and {(j, x”): j = 2, possible values of X”}.

In testing a null hypothesis such as θ = 0, the same x measurement would correspond to a much smaller p-value were it to have come from E′ than if it had come from E”: denote them as p′(x) and p′′(x), respectively. However, the overall significance level of the mixture, the convex combination of the p-values: [p′(x) + p′′(x)]/2, would give a misleading report of the precision or severity of the actual experimental measurement (see Cox and Mayo 2010, 296).
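To put rough numbers on this, here is a minimal sketch in Python. It assumes a single measurement rather than a full sample, a two-sided test of the null that the mean is 0, and a made-up observation x = 0.03. The function name two_sided_p_value and the variable names are mine, for illustration only.

```python
from statistics import NormalDist


def two_sided_p_value(x, variance):
    """Two-sided p-value for H0: mean = 0, from one N(mean, variance) measurement."""
    z = abs(x) / variance ** 0.5
    return 2 * (1 - NormalDist().cdf(z))


VAR_PRECISE = 1e-4    # variance of the precise instrument E' (from the example)
VAR_IMPRECISE = 1e4   # variance of the imprecise instrument E'' (from the example)

x = 0.03  # hypothetical observed measurement, chosen only for illustration

p_precise = two_sided_p_value(x, VAR_PRECISE)      # p'(x): conditional on E'
p_imprecise = two_sided_p_value(x, VAR_IMPRECISE)  # p''(x): conditional on E''
p_unconditional = (p_precise + p_imprecise) / 2    # convex combination over the coin flip

print(f"p'(x)  = {p_precise:.4f}")
print(f"p''(x) = {p_imprecise:.4f}")
print(f"[p'(x) + p''(x)]/2 = {p_unconditional:.4f}")
```

Under these assumptions the p-value from the precise instrument is roughly 0.003, the one from the imprecise instrument is close to 1, and their average is about 0.5, a figure that describes neither measurement actually made.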

Suppose that we know we have observed a measurement from E” with its much larger variance:

The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution [with the larger variance] (Cox 1958, 361).

In effect, an individual unlucky enough to use the imprecise tool gains a more informative assessment because he might have been lucky enough to use the more precise tool! (Birnbaum 1962, 491; Cox and Mayo 2010, 296). Once it is known whether E′ or E′′ has produced x, the p-value or other inferential assessment should be made conditional on the experiment actually run.

Weak Conditionality Principle (WCP): If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’, x’) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

Understanding the WCP

The WCP includes a prescription and a proscription for the proper evidential interpretation of x’, once it is known to have come from E’:

The evidential meaning of any outcome (E’, x’) of any experiment E having a mixture structure is the same as: the evidential meaning of the corresponding outcome x’ of the corresponding component experiment E’, ignoring otherwise the over-all structure of the original experiment E (Birnbaum 1962, 489; Eh and xh replaced with E’ and x’ for consistency).

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we spell out its three assertions.

First, it applies once we know which component of the mixture has been observed, and what the outcome was (Ej, xj). (Birnbaum considers mixtures with just two components.)

Second, there is the prescription about evidential equivalence. Once it is known that Ej has generated the data, given that our inference is about a parameter of Ej, inferences are appropriately drawn in terms of the distribution in Ej —the experiment known to have been performed.

Third, there is the proscription. In the case of informative inferences about the parameter of Ej our inference should not be influenced by whether the decision to perform Ej was determined by a coin flip or fixed all along. Misleading informative inferences might result from averaging over the convex combination of Ej and an experiment known not to have given rise to the data. The latter may be called the unconditional (sampling) distribution. ….


One crucial equivocation:

 Casella and R. Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data x, the information about θ depends only on the experiment performed. . . . The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of θ. (p. 293, emphasis added)

I have emphasized the last line in order to underscore a possible equivocation. Casella and Berger’s intended meaning is the correct claim:

(i) Given that it is known that measurement x’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or had been fixed all along.

Of course we do not know what measurement would have resulted had the unperformed measuring tool been used.

Compare (i) to a false and unintended reading:

(ii) If some measurement x is observed, then it does not matter (and it need not be reported) whether it came from a precise tool E’ or imprecise tool E”.

The idea of detaching x, and reporting that “x came from somewhere I know not where,” will not do. For one thing, we need to know the experiment in order to compute the sampling inference. For another, E’ and E” may be like our weighing procedures with very different precisions. It is analogous to being given the likelihood of the result in Example 1 (here), withholding whether it came from a negative binomial or a binomial.

Claim (i), by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when the WCP applies. The essential difference in claim (i) is that it is known that (E’, x’), enabling its inferential import to be determined.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.
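The binomial versus negative binomial case mentioned above can be made concrete with a small Python sketch. The numbers are hypothetical and are the familiar textbook ones (9 successes and 3 failures, testing a success probability of 0.5 against larger values); they are not necessarily those of Example 1. In the binomial experiment the number of trials is fixed at 12. In the negative binomial experiment sampling stops at the 3rd failure. All names in the code are mine.

```python
from math import comb

SUCCESSES, FAILURES = 9, 3          # hypothetical observed counts
TRIALS = SUCCESSES + FAILURES       # 12 trials in total


def binomial_likelihood(theta):
    """Likelihood when the number of trials (12) was fixed in advance."""
    return comb(TRIALS, SUCCESSES) * theta**SUCCESSES * (1 - theta)**FAILURES


def negative_binomial_likelihood(theta):
    """Likelihood when sampling stopped at the 3rd failure (the last trial is a failure)."""
    return comb(TRIALS - 1, FAILURES - 1) * theta**SUCCESSES * (1 - theta)**FAILURES


# The two likelihoods are proportional: their ratio does not depend on theta.
ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t)
          for t in (0.2, 0.5, 0.8)]

# One-sided p-values under theta = 0.5, with many successes counting against the null.
# Binomial: probability of 9 or more successes in 12 trials.
p_binomial = sum(comb(TRIALS, k) for k in range(SUCCESSES, TRIALS + 1)) / 2**TRIALS
# Negative binomial: probability that the 3rd failure arrives on trial 12 or later,
# i.e. at most 2 failures among the first 11 trials.
p_negative_binomial = sum(comb(TRIALS - 1, k) for k in range(FAILURES)) / 2**(TRIALS - 1)

print("likelihood ratios:", ratios)
print("binomial p-value:", p_binomial)
print("negative binomial p-value:", p_negative_binomial)
```

With these numbers the likelihood ratio is 4 at every value of the parameter, while the p-values are 299/4096 (about 0.073) and 67/2048 (about 0.033). Reporting only the likelihood, with the experiment withheld, leaves the sampling inference undetermined.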

Now go back and skim 3 short pages of notes here, pp 11-14, and it should hit you like a ton of bricks!  If so, reward yourself with a double Elba Grease, else try again. Report your results in the comments.

Categories: Birnbaum Brakes, Statistics, strong likelihood principle, U-Phil

Midnight With Birnbaum-reblog

 Reblogging Dec. 31, 2011:

You know how in that recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf?  He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2012) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i]

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WCP.[ii]  Sorry,…I know it’s famous… Continue reading

Categories: Birnbaum Brakes, strong likelihood principle

Don’t Birnbaumize that experiment my friend*–updated reblog

Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a video we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

(A)  “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained.  This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle, Statistics

Likelihood Links [for 28 Nov. Seminar and Current U-Phil]

Dear Reader: We just arrived in London[i][ii]. Jean Miller has put together some materials for Birnbaum LP aficionados in connection with my 28 November seminar. Great to have ready links to some of the early comments and replies by Birnbaum, Durbin, Kalbfleisch and others, possibly of interest to those planning contributions to the current “U-Phil”.  I will try to make some remarks on Birnbaum’s 1970 letter to the editor tomorrow.

November 28th reading

Categories: Birnbaum Brakes, Likelihood Principle, U-Phil

Midnight With Birnbaum

You know how in that recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf?  He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2011) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] Continue reading

Categories: Birnbaum Brakes

Part II: Breaking Through the Breakthrough* (please start with Dec 6 post)

This is a first draft of part II of the presentation begun in the December 6 blog post.  This completes the proposed presentation. I expect errors, and I will be grateful for feedback! (NOTE: I did not need to actually rip a cover of EGEK to obtain this effect!)


You have observed y”, the .05 significant result from E”, the optional stopping rule, ending at n = 100.

Birnbaum claims he can show that you, as a frequentist error statistician, must grant that it is equivalent to having fixed n = 100 at the start (i.e., experiment E’).
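Why would an error statistician resist treating the two as equivalent? A rough simulation sketch in Python may help. The setup is my assumption and is not spelled out above: observations are N(0, 1) with the null hypothesis true, the test is a two-sided z-test at the nominal .05 level, and under optional stopping the test is checked after every observation up to n = 100. The function names and the choice of 2000 simulated trials are hypothetical.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided .05 cutoff, about 1.96


def rejects_with_optional_stopping(rng, max_n=100):
    """Sample one observation at a time under H0; stop at the first nominally significant n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / n ** 0.5 >= Z_CRIT:
            return True
    return False


def rejects_with_fixed_n(rng, n=100):
    """Take exactly n observations under H0 and test once at the end."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total) / n ** 0.5 >= Z_CRIT


def rejection_rate(procedure, trials=2000, seed=0):
    """Relative frequency with which the procedure rejects a true null."""
    rng = random.Random(seed)
    return sum(procedure(rng) for _ in range(trials)) / trials


rate_optional = rejection_rate(rejects_with_optional_stopping)
rate_fixed = rejection_rate(rejects_with_fixed_n)

print("optional stopping, up to n = 100:", rate_optional)
print("fixed n = 100:", rate_fixed)
```

In this sketch the fixed-n procedure rejects a true null at close to the nominal rate, while the try-and-try-again procedure does so far more often. That difference in error probabilities is exactly what the sampling distribution registers and what the SLP declares irrelevant once the data are in hand.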


The (strong) Likelihood Principle (LP) is a universal conditional claim:

If two data sets y’ and y” from experiments E’ and E” respectively, have likelihood functions which are functions of the same parameter(s) µ and are proportional to each other, then y’ and y” should lead to identical inferential conclusions about µ. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle

Putting the Brakes on the Breakthrough Part I*

brakes on the ‘breakthrough’

I am going to post a FIRST draft (for a brief presentation next week in Madrid).  [I thank David Cox for the idea!] I expect errors, and I will be very grateful for feedback!  This is part I; part II will be posted tomorrow.  These posts may disappear once I’ve replaced them with a corrected draft.  I’ll then post the draft someplace.

If you wish to share queries/corrections please post as a comment or e-mail:  (ignore Greek symbols that are not showing correctly, I await fixes by Elbians.) Thanks much!

ONE: A Conversation between Sir David Cox and D. Mayo (June, 2011)

Toward the end of this exchange, the issue of the Likelihood Principle (LP)[1] arose:

COX: It is sometimes claimed that there are logical inconsistencies in frequentist theory, in particular surrounding the strong Likelihood Principle (LP). I know you have written about this, what is your view at the moment?

MAYO: What contradiction?
COX: Well, that frequentist theory does not obey the strong LP. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle
