2018 will mark 60 years since the famous chestnut from Sir David Cox (1958). The example “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. A simple version: if half the time I report my weight from a scale that’s always right, and half the time from a scale that gets it right with probability .5, would you say I’m right with probability ¾ (that is, ½ × 1 + ½ × ½)? Well, maybe. But suppose you *knew* that this measurement was made with the scale that’s right with probability .5. The overall error probability is scarcely relevant for giving the warrant of the particular measurement, *knowing* which scale was used. So what’s the earthquake? First, a bit more on the chestnut. Here’s an excerpt from Cox and Mayo (2010, 295-8):

*Binging the Likelihood Principle.* The earthquake grows from the fact that it has long been thought that the Weak Conditionality Principle (WCP) entails the (strong) Likelihood Principle (LP), based on a famous argument by Allan Birnbaum (1962). But the LP renders error probabilities irrelevant to parametric inference once the data are known. L. J. Savage calls Birnbaum’s argument “a landmark in statistics” (see [i]). I give a disproof of Birnbaum’s argument (via counterexample) in Mayo (2010), but later saw the need for a deeper argument, which I give in Mayo (2014) in *Statistical Science*.[ii] (There, among other subtleties, the WCP is put as a logical equivalence as intended.)

“It was the adoption of an unqualified equivalence formulation of conditionality, and related concepts, which led, in my 1962 paper, to the monster of the likelihood axiom,” (Birnbaum 1975, 263).

If you’re keen to binge on Birnbaum’s brainbuster, perhaps to break holiday/winter break doldrums, I’ve pasted most of the early historical sources below. The argument is simple; showing what’s wrong with it took a long time. You can also find quite a lot on the LP searching this blog (including posts by readers); it was a main topic for the first few years of this blog. You might start with a summary post (based on slides) here, or an intermediate paper Mayo (2013) I presented at the JSM. In it I ask:

Does it matter? On the face of it, current day uses of sampling theory statistics do not seem in need of going back 50 years to tackle a foundational argument. This may be so, but only if it is correct to assume that the Birnbaum argument must be flawed somewhere. (Mayo 2013, p. 441)

It is.

[i] Savage on Birnbaum: “This paper is a landmark in statistics. . . . I, myself, like other Bayesian statisticians, have been convinced of the truth of the likelihood principle for a long time. Its consequences for statistics are very great. . . . [T]his paper is really momentous in the history of statistics. It would be hard to point to even a handful of comparable events. …once the likelihood principle is widely recognized, people will not long stop at that halfway house but will go forward and accept the implications of personalistic probability for statistics” (Savage 1962, 307-308).

The argument purports to follow from principles frequentist error statisticians accept.

[ii] The link includes comments on my paper by Bjornstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu, and my rejoinder.

Birnbaum Papers:

- Birnbaum, A. (1962), “On the Foundations of Statistical Inference”, *Journal of the American Statistical Association* 57(298), 269-306.
- Savage, L. J., Barnard, G., Cornfield, J., Bross, I., Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O., and Birnbaum, A. (1962), “Discussion on Birnbaum’s On the Foundations of Statistical Inference”, *Journal of the American Statistical Association* 57(298), 307-326.
- Birnbaum, A. (1970), “Statistical Methods in Scientific Inference” (letter to the editor), *Nature* 225, 1033.
- Birnbaum, A. (1972), “More on Concepts of Statistical Evidence”, *Journal of the American Statistical Association* 67(340), 858-861.

Some additional early discussion papers:

Durbin:

- Durbin, J. (1970), “On Birnbaum’s Theorem on the Relation Between Sufficiency, Conditionality and Likelihood”, *Journal of the American Statistical Association*, Vol. 65, No. 329 (Mar., 1970), pp. 395-398.
- Savage, L. J. (1970), “Comments on a Weakened Principle of Conditionality”, *Journal of the American Statistical Association*, Vol. 65, No. 329 (Mar., 1970), pp. 399-401.
- Birnbaum, A. (1970), “On Durbin’s Modified Principle of Conditionality”, *Journal of the American Statistical Association*, Vol. 65, No. 329 (Mar., 1970), pp. 402-403.

Evans, Fraser, and Monette:

- Evans, M., Fraser, D. A., and Monette, G. (1986), “On Principles and Arguments to Likelihood”, *The Canadian Journal of Statistics* 14: 181-199.

Kalbfleisch:

- Kalbfleisch, J. D. (1975), “Sufficiency and Conditionality”, *Biometrika*, Vol. 62, No. 2 (Aug., 1975), pp. 251-259.
- Barnard, G. A. (1975), “Comments on Paper by J. D. Kalbfleisch”, *Biometrika*, Vol. 62, No. 2 (Aug., 1975), pp. 260-261.
- Barndorff-Nielsen, O. (1975), “Comments on Paper by J. D. Kalbfleisch”, *Biometrika*, Vol. 62, No. 2 (Aug., 1975), pp. 261-262.
- Birnbaum, A. (1975), “Comments on Paper by J. D. Kalbfleisch”, *Biometrika*, Vol. 62, No. 2 (Aug., 1975), pp. 262-264.
- Kalbfleisch, J. D. (1975), “Reply to Comments”, *Biometrika*, Vol. 62, No. 2 (Aug., 1975), p. 268.

**References for the Blogpost:**

Birnbaum, A. (1962). “On the Foundations of Statistical Inference”, *Journal of the American Statistical Association* 57(298), 269-306.

Birnbaum, A. (1975). “Comments on Paper by J. D. Kalbfleisch”, *Biometrika* 62(2), 262-264.

Cox, D. R. (1958). “Some Problems Connected with Statistical Inference”, *The Annals of Mathematical Statistics* 29, 357-372.

Cox, D. R. and Mayo, D. G. (2010). “Objectivity and Conditionality in Frequentist Inference” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 276-304.

Mayo, D. G. (2010). “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 305-14.

Mayo, D. G. (2013). “Presented Version: On the Birnbaum Argument for the Strong Likelihood Principle”, in *JSM Proceedings*, Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association: 440-453.

Mayo, D. G. (2014). “On the Birnbaum Argument for the Strong Likelihood Principle” (with discussion and rejoinder), *Statistical Science* 29(2), 227-239, 261-266.

Reid, N. (1992). “Introduction to Fraser (1966) Structural Probability and a Generalization”, in *Breakthroughs in Statistics* (S. Kotz and N. L. Johnson, eds.), Springer Series in Statistics. New York: Springer: 579-586.

Savage, L. J. (1962). Discussion on a paper by A. Birnbaum [On the foundations of statistical inference]. *Journal of the American Statistical Association* 57, 307-308.

Writers whose earlier editions of stat texts blithely declare that Birnbaum proved the LP should seriously consider changing this in future editions, or in errata. The simple proposed “proofs” are not sound.

I am always puzzled by this example, because it does not seem to me to be obvious what conditioning entails. Suppose that the weighing machine can be taken as an analogy for statistical testing: with probability 1/2 you will carry out a test based on either a small sample (case A) or a large one (case B), and you have a specific alternative in mind. (This, it has to be admitted, is most unrealistic, but let us accept it for argument’s sake.) Suppose, however, that you do accept that you must take account of the fact that you have either case A or case B. What does conditioning involve? This is where I have the problem. Here are two possible solutions:

1) Neyman-Pearson buff. You must control the type I error in each case. You set the type I error to the same value alpha in each case and your power will vary, being higher in case B than in case A. This of course means that the likelihood ratio for the critical value will change from case to case.

2) Likelihood fan. You must set the critical value of the likelihood ratio to the same value lambda, in each case. If worried about type I error rate, you can choose lambda so that the average value of the type I error rate over the two cases is alpha.
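To make the contrast between the two strategies concrete, here is a minimal Python sketch. It is only an illustration under assumptions that are not part of the comment: a one-sided test of a normal mean (H0: mu = 0 against a specific alternative mu = mu1, known unit variance), sample sizes 10 and 100 standing in for cases A and B, mu1 = 0.5, and alpha = 0.05. The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def np_strategy(n, mu1, alpha):
    """Strategy 1: fix the type I error at alpha for sample size n.

    Returns (critical sample mean, power, likelihood ratio at the critical value).
    """
    crit = Z.inv_cdf(1 - alpha) / sqrt(n)        # reject H0 when the sample mean exceeds crit
    power = 1 - Z.cdf((crit - mu1) * sqrt(n))    # rejection probability under mu = mu1
    lr = exp(n * mu1 * crit - n * mu1**2 / 2)    # f(mean | mu1) / f(mean | 0) at mean = crit
    return crit, power, lr


def lr_type1(n, mu1, log_lam):
    """Strategy 2: type I error of 'reject when LR >= lambda' for sample size n."""
    crit = log_lam / (n * mu1) + mu1 / 2         # sample mean at which LR equals lambda
    return 1 - Z.cdf(crit * sqrt(n))


def solve_log_lambda(ns, mu1, alpha, lo=-50.0, hi=50.0):
    """Bisect on log(lambda) so the type I error averaged over equally likely cases is alpha."""
    def avg(log_lam):
        return sum(lr_type1(n, mu1, log_lam) for n in ns) / len(ns)

    for _ in range(200):
        mid = (lo + hi) / 2
        if avg(mid) > alpha:   # average error too large: a stricter (larger) lambda is needed
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    N_A, N_B, MU1, ALPHA = 10, 100, 0.5, 0.05    # assumed illustrative values
    for label, n in (("A", N_A), ("B", N_B)):
        crit, power, lr = np_strategy(n, MU1, ALPHA)
        print(f"NP, case {label}: alpha={ALPHA}, power={power:.4f}, LR at cut-off={lr:.4f}")
    log_lam = solve_log_lambda([N_A, N_B], MU1, ALPHA)
    for label, n in (("A", N_A), ("B", N_B)):
        print(f"LR, case {label}: lambda={exp(log_lam):.4f}, "
              f"type I error={lr_type1(n, MU1, log_lam):.4f}")
```

Under these assumed numbers, strategy 1 holds alpha fixed while the likelihood ratio at the cut-off differs between cases A and B; strategy 2 holds lambda fixed while the per-case type I error rates differ and only their average equals alpha.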

Each of these strategies takes account of what is known but they are not the same. Which is right? It seems to me that depending on one’s statistical philosophy one could choose one or the other. Hence saying ‘I must obviously condition’ is only half the story. Yes, you must. Then what?

Yes, there are different ways to proceed, and as Lehmann says, there are circumstances where it makes sense to retain the unconditional formulation. If you take Birnbaum’s set-up, as Cox pointed out long ago, it appears you’d have to average over all possible likelihood principle pairs for any outcome. What mainly interests me is that point, and the slippery logical trickery in Birnbaum’s argument that has made it appear that he’s talking about a mixture of 2 weighing machines. It doesn’t go through.

OK, I must check what Lehmann says. Note, however, that I am arguing that either approach is conditional and that neither is marginal. Something is changed conditional on the information. The issue is whether the type I error rate is kept constant and the LR changes or vice versa.