
For those who want to binge read the (Strong) Likelihood Principle in 2025


David Cox’s famous “weighing machine” example from my last post is thought to have caused “a subtle earthquake” in the foundations of statistics. It’s been 11 years since I published my Statistical Science article on this, Mayo (2014), which includes several commentaries, but the issue is still mired in controversy. It’s generally dismissed as an annoying, mind-bending puzzle on which those in statistical foundations tend to hold absurdly strong opinions. Mostly it has been ignored. Yet I sense that 2026 is the year that people will return to it. It’s at least touched upon in Roderick Little’s new book (pic below). This post gives some background and collects the essential links you would need if you want to delve into it. Many readers know that each year I return to the issue on New Year’s Eve…. But that’s tomorrow.

By the way, this is not part of our leisurely tour of SIST. In fact, the argument is not even in SIST, although the SLP (or LP) arises often. But if you want to go off the beaten track with me to the SLP conundrum, here’s your opportunity.

What’s it all about? An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, since we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). Roughly stated, the SLP asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data.

SLP (We often drop the “strong” and just call it the LP. The “weak” LP just boils down to sufficiency)

For any two experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, if outcomes x* and y* (from E1 and E2 respectively) determine the same (i.e., proportional) likelihood function (f1(x*; θ) = cf2(y*; θ) for all θ), then x* and y* are inferentially equivalent (for an inference about θ).

(What differentiates the weak and the strong LP is that the weak refers to a single experiment.)
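To make the proportionality condition concrete, here is a standard textbook illustration (my own choice of numbers, not from the post): observing 9 heads in 12 fixed coin tosses (binomial), versus tossing until the 3rd tail, which happens to arrive on toss 12 (negative binomial). A minimal sketch checking that the two likelihood functions differ only by a constant factor in θ:

```python
from math import comb

# E1: binomial -- n = 12 tosses fixed in advance, x* = 9 heads observed
def binom_lik(theta):
    return comb(12, 9) * theta**9 * (1 - theta)**3

# E2: negative binomial -- toss until the 3rd tail, which lands on
# toss 12, so y* also shows 9 heads (the last toss must be a tail)
def negbinom_lik(theta):
    return comb(11, 9) * theta**9 * (1 - theta)**3

# The ratio is constant in theta, so the likelihoods are proportional
# and the SLP counts x* and y* as inferentially equivalent
for theta in (0.3, 0.5, 0.7):
    print(round(binom_lik(theta) / negbinom_lik(theta), 6))  # 4.0 each time
```

The constant is comb(12, 9) / comb(11, 9) = 220 / 55 = 4, independent of θ, which is exactly the condition f1(x*; θ) = cf2(y*; θ) in the SLP statement above.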

Violation of SLP:

A violation occurs whenever outcomes x* and y* from experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, satisfy f1(x*; θ) = cf2(y*; θ) for all θ, and yet have different implications for an inference about θ.

For an example of a SLP violation, E1 might be sampling from a Normal distribution with a fixed sample size n, and E2 the corresponding experiment that uses an optional stopping rule: keep sampling until you obtain a result 2 standard deviations away from a null hypothesis that θ = 0 (for simplicity, with a known standard deviation). When you do, stop and reject the point null (in 2-sided testing).

The SLP tells us  (in relation to the optional stopping rule) that once you have observed a 2-standard deviation result, there should be no evidential difference between its having arisen from experiment E1, where n was fixed, say, at 100, and experiment E2 where the stopping rule happens to stop at n = 100. For the error statistician, by contrast, there is a difference, and this constitutes a violation of the SLP.
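The error-statistical difference can be seen in a minimal simulation sketch (illustrative numbers of my own: σ = 1, a 2-standard-deviation cutoff, and a cap of 100 observations for the try-and-try-again rule). Under θ = 0, the fixed-n test rejects about 5% of the time, while checking after every observation inflates the rejection rate several-fold:

```python
import math
import random

random.seed(1)
SIGMA, THRESH, N_MAX, TRIALS = 1.0, 2.0, 100, 5_000

def fixed_n_rejects():
    # E1: n fixed at 100; reject iff |z| >= 2 at the end
    total = sum(random.gauss(0, SIGMA) for _ in range(N_MAX))
    return abs(total / (SIGMA * math.sqrt(N_MAX))) >= THRESH

def optional_stop_rejects():
    # E2: test after each new observation; stop and reject the first
    # time |z| >= 2, giving up (accepting) after N_MAX observations
    total = 0.0
    for n in range(1, N_MAX + 1):
        total += random.gauss(0, SIGMA)
        if abs(total / (SIGMA * math.sqrt(n))) >= THRESH:
            return True
    return False

p1 = sum(fixed_n_rejects() for _ in range(TRIALS)) / TRIALS
p2 = sum(optional_stop_rejects() for _ in range(TRIALS)) / TRIALS
print(p1, p2)  # p1 is near 0.05; p2 is several times larger
```

The two experiments can produce the very same 2-standard-deviation result with proportional likelihoods, yet their sampling distributions, and hence their error probabilities, differ dramatically. (Uncapped, the stopping rule is guaranteed to reject eventually, “sampling to a foregone conclusion”.)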

———————-

Now for the surprising part: In Cox’s weighing machine example, recall, a coin is flipped to decide which of two experiments to perform. David Cox (1958) proposed the Weak Conditionality Principle (WCP) to restrict the space of relevant repetitions for frequentist inference. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of that particular Ei. Nothing could be more obvious.
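To see why the WCP seems so obvious, here is a sketch in the spirit of Cox’s example, with illustrative numbers of my own (not Cox’s): a fair coin picks either a precise machine (σ = 1) or an imprecise one (σ = 10), and a single measurement of θ results. If the precise machine was in fact used, conditioning on that fact gives a very different p-value than averaging over the coin flip:

```python
import math

def two_sided_p(x, sigma):
    # P(|X| >= |x|) for X ~ N(0, sigma^2), i.e. a 2-sided test of theta = 0,
    # using Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(x) / (sigma * math.sqrt(2)))))

SIG_PRECISE, SIG_IMPRECISE = 1.0, 10.0
x = 2.5  # observed measurement; suppose the precise machine was used

# WCP: assess x by the properties of the machine that actually produced it
p_cond = two_sided_p(x, SIG_PRECISE)

# Unconditional assessment: average over the coin flip as well
p_uncond = 0.5 * two_sided_p(x, SIG_PRECISE) + 0.5 * two_sided_p(x, SIG_IMPRECISE)

print(round(p_cond, 4), round(p_uncond, 4))  # 0.0124 vs 0.4075
```

Conditionally, x = 2.5 is strong evidence against θ = 0; unconditionally it is no evidence at all. The WCP says the conditional assessment is the relevant one once we know which machine was used.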

The surprising upshot of Allan Birnbaum’s (1962) argument is that the SLP appears to follow from applying the WCP to mixture experiments, together with so uncontroversial a principle as sufficiency (SP), although even SP has been shown to be dispensable from the argument, strictly speaking. Were this true, it would preclude the use of sampling distributions. L. J. Savage called Birnbaum’s argument “a landmark in statistics” (see [i]).

Although his argument purports to show that [(WCP and SP) entails SLP], in fact data may violate the SLP while both the WCP and SP hold. Such cases also directly refute [WCP entails SLP].

Binge reading the Likelihood Principle.

If you’re keen to binge read the SLP–a way to break holiday/winter break doldrums–or if it comes up during 2025, I’ve pasted most of the early historical sources below. The argument is simple; showing what’s wrong with it took a long time.

My earliest treatment, via counterexample, is in Mayo (2010)–in an appendix to a paper I wrote with David Cox on objectivity and conditionality in frequentist inference.  But the treatment in the appendix doesn’t go far enough, so if you’re interested, it’s best to just check out Mayo (2014) in Statistical Science.[ii] An intermediate paper Mayo (2013) corresponds to a talk I presented at the JSM in 2013.

Interested readers may search this blog for quite a lot of discussion of the SLP, including “U-Phils” (discussions by readers) (e.g., here and here), and amusing notes (e.g., “Don’t Birnbaumize that experiment, my friend”).

This conundrum is relevant to the very notion of “evidence”, blithely taken for granted in both statistics and philosophy.[iii] There’s no statistics involved, just logic and language. My 2014 paper shows the logical problem, but I still think it will take an astute philosopher of language to adequately classify the linguistic fallacy being committed.

To have a list for binging, I’ve grouped some key readings below.

Classic Birnbaum Papers:

  • Birnbaum, A. (1962), “On the Foundations of Statistical Inference”, Journal of the American Statistical Association 57(298), 269–306.
  • Savage, L. J., Barnard, G., Cornfield, J., Bross, I., Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O., and Birnbaum, A. (1962), “Discussion on Birnbaum’s ‘On the Foundations of Statistical Inference’”, Journal of the American Statistical Association 57(298), 307–326.
  • Birnbaum, A. (1969), “Concepts of Statistical Evidence”, in Ernest Nagel, Sidney Morgenbesser, Patrick Suppes & Morton Gabriel White (eds.), Philosophy, Science, and Method. New York: St. Martin’s Press, pp. 112–143.
  • Birnbaum, A. (1970), “Statistical Methods in Scientific Inference” (letter to the editor), Nature 225, 1033.
  • Birnbaum, A. (1972), “More on Concepts of Statistical Evidence”, Journal of the American Statistical Association 67(340), 858–861.

Note to Reader: If you look at the (1962) “discussion”, you can already see Birnbaum backtracking a bit, in response to Pratt’s comments.

Some additional early discussion papers:

Durbin:

There’s also a good discussion in Cox and Hinkley 1974.

Evans, Fraser, and Monette:

Kalbfleisch:

My discussions (also noted above):


Categories: 11 years ago, Likelihood Principle
