Likelihood Principle

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)



1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*


x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H0: the coin is fair, gets a small likelihood, (.5)^k, given k tosses of a coin, while H1: the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.
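To see Barnard’s point numerically, here is a minimal Python sketch. The toss sequence and the function names are illustrative assumptions, not anything from Barnard or Hacking. It computes the likelihood of one particular sequence of k tosses under H0 (fair coin) and under the after-the-fact rival H1, on which each toss had to land as it did.

```python
from fractions import Fraction


def likelihood_fair(outcomes):
    """Pr(outcomes; H0), where H0 gives heads probability 1/2 on every toss."""
    return Fraction(1, 2) ** len(outcomes)


def likelihood_just_so(outcomes):
    """Pr(outcomes; H1), where H1 is built after the fact: it gives heads
    probability 1 on exactly the tosses that yielded heads and probability 0
    on the rest, so the observed sequence has probability 1 by construction."""
    return Fraction(1)


outcomes = "HTTHHTHTTH"  # hypothetical record of k = 10 tosses
lr = likelihood_just_so(outcomes) / likelihood_fair(outcomes)
print(lr)  # 1024
```

For k = 10 the ratio is 2^10 = 1024 in favor of H1, and it doubles with every further toss, whatever the sequence and even if the coin is in fact fair.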

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1 even if H0 is true?

What’s the probability of so small a likelihood for H0 compared to H1, even if H0 adequately describes the data generating procedure? As Pearson and Neyman put it:

“[I]n order to fix a limit between ‘small’ and ‘large’ values of LR we must know how often such values appear when we deal with a true hypothesis. That is to say we must have knowledge of the chance of obtaining [so small a likelihood ratio] in the case where the hypothesis tested [H0] is true” (Pearson and Neyman 1930, 106).

Looking at “how often such values appear” of course turns on the sampling distribution of the LR viewed as a statistic. That’s why frequentist error statistical accounts are called sampling theory accounts. This requires considering other values that could have occurred, not just the one you got.
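A rough sketch of what going to this second level involves, under assumptions chosen only for illustration: n = 25 observations from a Normal distribution with known standard deviation 1, H0: mean 0, a threshold of 8 for a “large” LR, and hypothetical function names. It simulates the sampling distribution of the LR under H0, once for an alternative fixed in advance (mean 0.4) and once for an alternative picked after the fact to equal the observed sample mean.

```python
import math
import random


def log_lr(xbar, n, mu1, mu0=0.0):
    """Log likelihood ratio of mu1 over mu0 for n Normal(mu, 1) observations
    whose sample mean is xbar."""
    return n * (mu1 - mu0) * xbar - n * (mu1 ** 2 - mu0 ** 2) / 2.0


def prob_lr_exceeds(threshold, n, mu1=None, reps=20000, seed=1):
    """Estimate Pr(LR > threshold; H0: mu = 0) by simulation.

    mu1=None means the alternative is chosen after seeing the data, as the
    maximally likely value mu1 = xbar. As a shortcut (an assumption of this
    sketch), the sample mean is drawn directly from Normal(0, 1/sqrt(n))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(0.0, 1.0 / math.sqrt(n))
        alternative = xbar if mu1 is None else mu1
        if log_lr(xbar, n, alternative) > math.log(threshold):
            hits += 1
    return hits / reps


p_fixed = prob_lr_exceeds(8, n=25, mu1=0.4)    # alternative set in advance
p_hunted = prob_lr_exceeds(8, n=25, mu1=None)  # alternative fitted to the data
print(p_fixed, p_hunted)
```

Both calls use the same simulated sample means, and the hunted alternative is the maximally likely one, so its LR is never below that of the fixed alternative. The estimated probability of so large an LR against a true H0 can therefore only go up when the rival is chosen to fit the data.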

But this breaks the law of likelihood and so is taboo for the likelihoodist! (Likewise for anyone holding the Likelihood Principle[i].)

Viewing the sampling distribution as taboo (once the data are given) is puzzling in the extreme[ii]. How can it be desirable to block out information about how the data were generated and the hypotheses specified? I fail to see how anyone can evaluate an inference from data x to a claim C without learning about the capabilities of the method, through the relevant sampling distribution. Readers of this blog know my favorite example to demonstrate the lack of error control if you look only at likelihoods: the case of optional stopping. (Keep sampling until you get a nominal p value of .05 against a 0 null hypothesis in two-sided Normal testing of the mean. You can be wrong with maximal probability.)
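The optional stopping case can be simulated along the following lines. This is only a sketch under stated assumptions: Normal data with known standard deviation 1, the null hypothesis of mean 0 true throughout, a nominal two-sided .05 cutoff of 1.96, and caps on the sample size (1, 10, 100, 1000) that are illustrative choices, as are the function names.

```python
import math
import random


def stops_with_nominal_significance(max_n, rng, z_crit=1.96):
    """Draw Normal(0, 1) observations one at a time (so H0: mean 0 is true)
    and report whether |z| = |sum| / sqrt(n) reaches z_crit at any n <= max_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / math.sqrt(n) >= z_crit:
            return True  # the experimenter stops here and reports p <= .05
    return False


def rejection_rate(max_n, trials=1000, seed=0):
    """Share of trials in which a true null is rejected at the nominal level.
    Trial t always uses the stream seeded with seed + t, so the same data
    are extended (not redrawn) when max_n is raised."""
    hits = sum(
        stops_with_nominal_significance(max_n, random.Random(seed + t))
        for t in range(trials)
    )
    return hits / trials


rates = {max_n: rejection_rate(max_n) for max_n in (1, 10, 100, 1000)}
for max_n, rate in rates.items():
    print(max_n, rate)
```

Because each trial extends the same data when the cap is raised, a larger cap can only add rejections. The estimated rate of reaching nominal significance under a true null climbs well above .05 as the cap grows, while the likelihood function at the stopping point carries no trace of how the stopping was arranged.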

Just such examples, where the alternative is not a point value, led Barnard to abandon (or greatly restrict) the Likelihood Principle. Interestingly, in raising these criticisms of likelihood, Barnard is reviewing Ian Hacking’s 1965 book, The Logic of Statistical Inference. Only thing is, by the time of this 1972 review, Hacking had given it up as well! In fact, in the pages immediately following Barnard’s review of Hacking is Hacking’s own review of A.W.F. Edwards’ book Likelihood (1972), wherein Hacking explains why he’s thrown his own likelihood rule of support overboard.

4. Hacking (also BJPS)

A classic case is the normal distribution and a single observation. Reluctantly we will grant Edwards that the observation x is the best supported estimate of the unknown mean. But the hypothesis about the variance, with highest likelihood, is the assumption that there is no variance, which strikes us as monstrous. ... we must concede that as prior information we take for granted the variance is at least w. But even this will not do, for the best supported view on the variance is then that it is exactly w.

For a less artificial example, take the ‘tram-car’ or ‘tank’ problem. We capture enemy tanks at random and note the serial numbers on their engines. We know the serial numbers start at 0001. We capture a tank number 2176. How many tanks did the enemy make? On the likelihood analysis, the best supported guess is: 2176. Now one can defend this remarkable result by saying that it does not follow that we should estimate the actual number as 2176, only that comparing individual numbers, 2176 is better supported than any larger figure. My worry is deeper. Let us compare the relative likelihood of the two hypotheses, 2176 and 3000. Now pass to a situation where we are measuring, say, widths of a grating, in which error has a normal distribution with known variance; we can devise data and a pair of hypotheses about the mean which will have the same log-likelihood ratio. I have no inclination to say that the relative support in the tank case is ‘exactly the same as’ that in the normal distribution case, even though the likelihood ratios are the same. Hence even on those increasingly rare days when I will rank hypotheses in order of their likelihoods, I cannot take the actual log-likelihood number as an objective measure of anything. (Hacking 1972, 136-137).
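Hacking’s comparison can be made concrete with a small sketch. It assumes a single captured tank whose serial number is equally likely to be any of 1 through N (so the likelihood of N is 1/N when N is at least 2176, and 0 otherwise) and, for the grating case, a single Normal measurement with standard deviation 1. The particular Normal hypotheses (mean d versus mean 0, with observation x = d) are devised here purely for illustration, as are the function names.

```python
import math
from fractions import Fraction


def tank_likelihood(total_tanks, observed_serial):
    """Pr(observed serial; N = total_tanks) for one tank captured at random."""
    if total_tanks < observed_serial:
        return Fraction(0)
    return Fraction(1, total_tanks)


def normal_log_lr(x, mu_a, mu_b, sigma=1.0):
    """Log likelihood ratio of mean mu_a over mean mu_b for one Normal
    observation x with known standard deviation sigma."""
    return ((x - mu_b) ** 2 - (x - mu_a) ** 2) / (2.0 * sigma ** 2)


# Tank case: N = 2176 against N = 3000, given serial number 2176.
tank_lr = tank_likelihood(2176, 2176) / tank_likelihood(3000, 2176)
tank_log_lr = math.log(float(tank_lr))

# Grating case: choose d so that (mean d vs. mean 0, x = d) gives the same value.
d = math.sqrt(2.0 * tank_log_lr)
grating_log_lr = normal_log_lr(x=d, mu_a=d, mu_b=0.0)

print(tank_lr, tank_log_lr, grating_log_lr)
```

The two log-likelihood ratios agree (about 0.32), which is just the situation in which Hacking declines to say that the relative support is ‘exactly the same’.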

Hacking appears even more concerned with the fact that likelihood ratios do not enjoy a stable evidential meaning or calibration than with the lack of error control in likelihoodist accounts. But Hacking was still assuming the latter must be cashed out in terms of long run error performance[iii] as opposed to stringency of test.

I say: a method that makes it easy to declare evidence against hypotheses erroneously gives an unwarranted inference each time; methods that fail to directly pick up on optional stopping, data dredging, cherry picking, multiple testing or any of the other gambits that alter the capabilities of tests to avoid mistaken inferences are poor methods, but not because of their behavior in the long run. They license unwarranted or questionable inferences in each and every application. This is so, I aver, even if we happen to know, through other means, that their inferred claim C is correct.

5. Three ways likelihoods arise in inference. Aug. 31 note at end of para.

Likelihoods are fundamental in all statistical inference accounts. One might separate how they arise in three groups (acknowledging divisions within each):

(1) likelihoods only (pure likelihoodist)

(2) likelihoods + priors (Bayesian)

(3) likelihoods + error probabilities based on sampling distributions (error statistics, sampling theory)

Only the error statistician (3) requires breaking the likelihood law. [See note.] You can feed us fit measures from (1) and (2), and we will do the same thing: ask about the probability of so good (or poor) a fit between data and some claim C, even if C is false (true). The answer will be based on the sampling distribution of the relevant statistic, computed under the falsity of C (or discrepancies from what C asserts).[iv]

Aug 31 note: 

If someone wanted to describe the addition of the priors under rubric (2) as tantamount to “breaking the likelihood law”, as opposed to merely requiring it to be supplemented, nothing whatever changes in the point of this post. (It would seem to introduce idiosyncrasies in the usual formulation–but these are not germane to my post.) My sentence, in fact, might well have been “Only the error statistician (3) requires breaking the likelihood law and the likelihood principle (by dint of requiring considerations of the sampling distribution to obtain the evidential import of the data).”




Installment (B): an ad hoc clarificatory note, prompted by comments from an anonymous fan

6. Of tests and comparative support measures

The statements of “the” law of likelihood, and likelihood support logics, are not all precisely identical. Some accounts are qualitative, merely indicating prima facie increased support; others will devise quantitative measures of support based on likelihoods. (There are at least 10 of them we covered in our recent seminar, maybe more.) Some will try out corresponding “tests”, others not. One needn’t have anything like a test or a “rejection rule” to be a likelihoodist. I mentioned the construal in terms of tests because it is in the sentence just before the one I quote from Barnard, and I wanted to be true to what he had just said about Hacking’s 1965 book.

Remember the topic of my post concerns criticisms of error statistical methods, and a principle (or “law”) of evidence used in those criticisms. (If you reject that principle, then presumably you wouldn’t use it to criticize error statistical methods, so we have no disagreement on this.) A clear rationale for connecting tests of hypotheses—be they Fisherian or N-P style—and logics of likelihood is to mount criticisms: to explain what’s wrong with those (Fisherian or N-P) tests, and how they may be cured of their problems.

Hacking lays out an impressive argument that all that is sensible in N-P likelihood ratio tests is captured by his conception of likelihood tests (the one he advanced back in 1965) while all the (apparently) counterintuitive parts are jettisoned. Now that I’ve access to my NYC library, I can quote the portion to which Barnard is alluding in his review of Hacking.

“Our theory of support leads directly to the theory of testing suggested in the last chapter [VI]. An hypothesis should be rejected if and only if there is some rival hypothesis much better supported than it is. Support has already been analysed in terms of ratios of likelihoods. But what shall serve as ‘much better supported’? For the present I leave this in abeyance, and speak merely of tests of different stringency. With each test will be associated a critical ratio. The greater the critical ratio, the more stringent the test. Roughly speaking hypothesis h will be rejected in favour of rival i at critical level alpha, just if the likelihood ratio of i to h exceeds alpha.” (Hacking 1965 p.89)

I don’t want to pursue this discussion of Hacking here. To repeat, my post concerns criticisms of error statistical methods. A foundational critique of a method of inference depends on holding another view or principle or method of inference. This post is an offshoot of the recent posts here and here (7/14/14 and 8/17/14).

Critiques in those posts are based on assuming that it is fair, reasonable, obvious or what have you, to criticize the way p-values arise in inference by means of a different view of inference. (I allude here to genuine or “audited” p-values, not mere nominal or computed p-values.) The p-value, it is reasoned, should be close to either a posterior probability (in the null hypothesis) or a likelihood ratio (or Bayes ratio). Ways to “fix” p-values are proposed to get them closer to these other measures. I don’t think there was anything controversial about this being the basic goal, not just of the particular papers we looked at, but mountains of papers that have been written and are being written this very moment.

I may continue with my intended follow-up (Part C).

*Note: I am not sure whether the powers that be are allowing us to say “data x is” nowadays–I read something about this, maybe it was by Pinker. Can somebody please ask Steven Pinker for me? Thanks.

[i] Please search this blog for quite a lot on the likelihood principle and the strong likelihood principle.

[ii] I would say this even if we knew the model was adequate. Likelihood principlers may regard using the sampling distribution to test the model as legitimate.

[iii] Perhaps he still is; I don’t mean to saddle him with my testing construal of error probabilities at all. (Some hints of a shift exist in his 1980 article in the Braithwaite volume.)

[iv] This delineation comes from Cox and Hinkley, but I don’t have it here.


Barnard, G. (1972). Review of ‘The Logic of Statistical Inference’ by I. Hacking. Brit. J. Phil. Sci., 23(2): 123-132.

Hacking, I. (1965). Logic of Statistical Inference. Cambridge: CUP.

Hacking, I. (1972). “Review of Likelihood. An Account of the Statistical Concept of Likelihood and Its Application to Scientific Inference by A. W. F. Edwards,” Brit. J. Phil. Sci., 23(2): 132-137.

Hacking, I. (1980). “The Theory of Probable Inference: Neyman, Peirce and Braithwaite.” In D. H. Mellor (ed.), Science, belief and behavior: Essays in honor of R.B. Braithwaite, 141-160. Cambridge: CUP.

Pearson, E.S. & Neyman, J. (1930). On the problem of two samples. Joint Statistical Papers by J. Neyman & E.S. Pearson, 99-115 (Berkeley: U. of Calif. Press). First published in Bul. Acad. Pol. Sci. 73-96.



Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976


Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in Breakthroughs in Statistics (volume I, 1993). I’ve a hunch that Birnbaum would have liked my rejoinder to discussants of my forthcoming paper (Statistical Science): Bjornstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu. I hadn’t realized until recently that all of this is up under “future papers” here [1]. You can find the rejoinder: STS1404-004RA0-2. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. Happy Birthday Birnbaum!


Barnard’s Birthday: background, likelihood principle, intentions

G.A. Barnard: 23 Sept. 1915 – 9 Aug. 2002

Reblog (year ago): G.A. Barnard’s birthday is today, so here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to some earlier issues: stopping rules, likelihood principle, and background information here and here (at least of one type). (A few other Barnard links on this blog are below*.) Happy Birthday George!

Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data. One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant.

In particular, suppose somebody sets out to demonstrate the existence of extrasensory perception and says ‘I am going to go on until I get a one in ten thousand significance level’. Knowing that this is what he is setting out to do would lead you to adopt a different test criterion. What you would look at would not be the ratio of successes obtained, but how long it took him to obtain it. And you would have a very simple test of significance which said if it took you so long to achieve this increase in the score above the chance fraction, this is not at all strong evidence for E.S.P., it is very weak evidence. And the reversing of the choice of test criteria would I think overcome the difficulty.

This is the answer to the point Professor Savage makes; he says why use one method when you have vague knowledge, when you would use a quite different method when you have precise knowledge. It seems to me the answer is that you would use one method when you have precisely determined alternatives, with which you want to compare a given hypothesis, and you use another method when you do not have these alternatives.

Savage: May I digress to say publicly that I learned the stopping-rule principle from Professor Barnard, in conversation in the summer of 1952. Frankly I then thought it a scandal that anyone in the profession could advance an idea so patently wrong, even as today I can scarcely believe that some people resist an idea so patently right. I am particularly surprised to hear Professor Barnard say today that the stopping rule is irrelevant in certain circumstances only, for the argument he first gave in favour of the principle seems quite unaffected by the distinctions just discussed. The argument then was this: The design of a sequential experiment is, in the last analysis, what the experimenter actually intended to do. His intention is locked up inside his head and cannot be known to those who have to judge the experiment. Never having been comfortable with that argument, I am not advancing it myself. But if Professor Barnard still accepts it, how can he conclude that the stopping-rule principle is only sometimes valid? (emphasis added)


Gandenberger: How to Do Philosophy That Matters (guest post)

Greg Gandenberger
Philosopher of Science
University of Pittsburgh

Genuine philosophical problems are always rooted in urgent problems outside philosophy,
and they die if these roots decay
Karl Popper (1963, 72)

My concern in this post is how we philosophers can use our skills to do work that matters to people both inside and outside of philosophy.

Philosophers are highly skilled at conceptual analysis, in which one takes an interesting but unclear concept and attempts to state precisely when it applies and when it doesn’t.

What is the point of this activity? In many cases, this question has no satisfactory answer. Conceptual analysis becomes an end in itself, and philosophical debates become fruitless arguments about words. The pleasure we philosophers take in such arguments hardly warrants scarce government and university resources. It does provide good training in critical thinking, but so do many other activities that are also immediately useful, such as doing science and programming computers.

Conceptual analysis does not have to be pointless. It is often prompted by a real-world problem. In Plato’s Euthyphro, for instance, the character Euthyphro thought that piety required him to prosecute his father for murder. His family thought on the contrary that for a son to prosecute his own father was the height of impiety. In this situation, the question “what is piety?” took on great urgency. It also had great urgency for Socrates, who was awaiting trial for corrupting the youth of Athens.

In general, conceptual analysis often begins as a response to some question about how we ought to regulate our beliefs or actions. It can be a fruitful activity as long as the questions that prompted it are kept in view. It tends to degenerate into merely verbal disputes when it becomes an end in itself.

The kind of goal-oriented view of conceptual analysis I aim to articulate and promote is not teleosemantics: it is a view about how philosophy should be done rather than a theory of meaning. It is consistent with Carnap’s notion of explication (one of the desiderata of which is fruitfulness) (Carnap 1963, 5), but in practice Carnapian explication seems to devolve into idle word games just as easily as conceptual analysis. Our overriding goal should not be fidelity to intuitions, precision, or systematicity, but usefulness.

How I Became Suspicious of Conceptual Analysis

When I began working on proofs of the Likelihood Principle, I assumed that following my intuitions about the concept of “evidential equivalence” would lead to insights about how science should be done. Birnbaum’s proof showed me that my intuitions entail the Likelihood Principle, which frequentist methods violate. Voila! Scientists shouldn’t use frequentist methods. All that remained to be done was to fortify Birnbaum’s proof, as I do in “A New Proof of the Likelihood Principle” by defending it against objections and buttressing it with an alternative proof. [Editor: For a number of related materials on this blog see Mayo’s JSM presentation, and note [i].]

After working on this topic for some time, I realized that I was making simplistic assumptions about the relationship between conceptual intuitions and methodological norms. At most, a proof of the Likelihood Principle can show you that frequentist methods run contrary to your intuitions about evidential equivalence. Even if those intuitions are true, it does not follow immediately that scientists should not use frequentist methods. The ultimate aim of science, presumably, is not to respect evidential equivalence but (roughly) to learn about the world and make it better. The demand that scientists use methods that respect evidential equivalence is warranted only insofar as it is conducive to achieving those ends. Birnbaum’s proof says nothing about that issue.

  • In general, a conceptual analysis–even of a normatively freighted term like “evidence”–is never enough by itself to justify a normative claim. The questions that ultimately matter are not about “what we mean” when we use particular words and phrases, but rather about what our aims are and how we can best achieve them.

How to Do Conceptual Analysis Teleologically

This is not to say that my work on the Likelihood Principle or conceptual analysis in general is without value. But it is nothing more than a kind of careful lexicography. This kind of work is potentially useful for clarifying normative claims with the aim of assessing and possibly implementing them. To do work that matters, philosophers engaged in conceptual analysis need to take enough interest in the assessment and implementation stages to do their conceptual analysis with the relevant normative claims in mind.

So what does this kind of teleological (goal-oriented) conceptual analysis look like?

It can involve personally following through on the process of assessing and implementing the relevant norms. For example, philosophers at Carnegie Mellon University working on causation have not only provided a kind of analysis of the concept of causation but also developed algorithms for causal discovery, proved theorems about those algorithms, and applied those algorithms to contemporary scientific problems (see e.g. Spirtes et al. 2000).

I have great respect for this work. But doing conceptual analysis does not have to mean going so far outside the traditional bounds of philosophy. A perfect example is James Woodward’s related work on causal explanation, which he describes as follows (2003, 7-8, original emphasis):

My project…makes recommendations about what one ought to mean by various causal and explanatory claims, rather than just attempting to describe how we use those claims. It recognizes that causal and explanatory claims sometimes are confused, unclear, and ambiguous and suggests how those limitations might be addressed…. we introduce concepts…and characterize them in certain ways…because we want to do things with them…. Concepts can be well or badly designed for such purposes, and we can evaluate them accordingly.

Woodward keeps his eye on what the notion of causation is for, namely distinguishing between relationships that do and relationships that do not remain invariant under interventions. This distinction is enormously important because only relationships that remain invariant under interventions provide “handles” we can use to change the world.

Here are some lessons about teleological conceptual analysis that we can take from Woodward’s work. (I’m sure this list could be expanded.)

  1. Teleological conceptual analysis puts us in charge. In his wonderful presidential address at the 2012 meeting of the Philosophy of Science Association, Woodward ended a litany of metaphysical arguments against regarding mental events as causes by asking “Who’s in charge here?” There is no ideal form of Causation to which we must answer. We are free to decide to use “causation” and related words in the ways that best serve our interests.
  2. Teleological conceptual analysis can be revisionary. If ordinary usage is not optimal, we can change it.
  3. The product of a teleological conceptual analysis need not be unique. Some philosophers reject Woodward’s account because they regard causation as a process rather than as a relationship among variables. But why do we need to choose? There could just be two different notions of causation. Woodward’s account captures one notion that is very important in science and everyday life. If it captures all of the causal notions that are important, then so much the better. But this kind of comprehensiveness is not essential.
  4. Teleological conceptual analysis can be non-reductive. Woodward characterizes causal relations as (roughly) correlation relations that are invariant under certain kinds of interventions. But the notion of an intervention is itself causal. Woodward’s account is not circular because it characterizes what it means for a causal relationship to hold between two variables in terms of different causal processes involving different sets of variables. But it is non-reductive in the sense that it does not allow us to replace causal claims with equivalent non-causal claims (as, e.g., counterfactual, regularity, probabilistic, and process theories purport to do). This fact is a problem if one’s primary concern is to reduce one’s ultimate metaphysical commitments, but it is not necessarily a problem if one’s primary concern is to improve our ability to assess and use causal claims.


Philosophers rarely succeed in capturing all of our intuitions about an important informal concept. Even if they did succeed, they would have more work to do in justifying any norms that invoke that concept. Conceptual analysis can be a first step toward doing philosophy that matters, but it needs to be undertaken with the relevant normative claims in mind.

Question: What are your best examples of philosophy that matters? What can we learn from them?


  • Birnbaum, Allan. “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57.298 (1962): 269-306.
  • Carnap, Rudolf. Logical Foundations of Probability. U of Chicago Press, 1963.
  • Gandenberger, Greg. “A New Proof of the Likelihood Principle.” The British Journal for the Philosophy of Science (forthcoming).
  • Plato. Euthyphro
  • Popper, Karl. Conjectures and Refutations. London: Routledge & Kegan Paul, 1963.
  • Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. Vol. 81. The MIT Press, 2000.
  • Woodward, James. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, 2003.

[i] Earlier posts are here and here. Some U-Phils are here, here, and here. For some amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum).

Some related papers:

  • Cox D. R. and Mayo. D. G. (2010). “Objectivity and Conditionality in Frequentist Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 276-304.

A. Birnbaum: Statistical Methods in Scientific Inference

Birnbaum: born May 27, 1923

Today is (statistician) Allan Birnbaum’s birthday. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to obtain what he called “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I would heartily endorse!  While known for attempts to argue that the (strong) Likelihood Principle followed from sufficiency and conditionality principles, a few years after publishing this result, he seems to have turned away from it, perhaps discovering gaps in his argument.

NATURE VOL. 225 MARCH 14, 1970 (1033)


Statistical methods in Scientific Inference

It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.

If there has been ‘one rock in a shifting scene’ of general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists. The confidence concept takes from the Neyman-Pearson approach techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data. (The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy.) The confidence concept also incorporates important but limited aspects of the likelihood concept: the sufficiency concept, expressed in the general refusal to use randomized tests and confidence limits when they are recommended by the Neyman-Pearson approach; and some applications of the conditionality concept. It is remarkable that this concept, an incompletely formalized synthesis of ingredients borrowed from mutually incompatible theoretical approaches, is evidently useful continuously in much critically informed statistical thinking and practice [emphasis mine].

While inferences of many sorts are evident everywhere in scientific work, the existence of precise, general and accurate schemas of scientific inference remains a problem. Mendelian examples like those of Edwards and my 1969 paper seem particularly appropriate as case-study material for clarifying issues and facilitating effective communication among interested statisticians, scientific workers and philosophers and historians of science.

Allan Birnbaum
New York University
Courant Institute of Mathematical Sciences,
251 Mercer Street,
New York, NY 10012

Birnbaum’s confidence concept, sometimes written (Conf), was his attempt to find in error statistical ideas a concept of statistical evidence–a term that he invented and popularized. In Birnbaum 1977 (24), he states it as follows:

(Conf): A concept of statistical evidence is not plausible unless it finds ‘strong evidence for J as against H’ with small probability (α) when H is true, and with much larger probability (1 – β) when J is true.
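To make (Conf) concrete, here is a minimal sketch under assumptions that are mine rather than Birnbaum's: normal data with known standard deviation, H: μ = 0, J: μ = 1, and "strong evidence for J as against H" read as the sample mean exceeding a cutoff. The function name and all numbers are hypothetical; the point is only to show the two probabilities that (Conf) asks us to compare.

```python
from statistics import NormalDist

def conf_error_probabilities(mu_h, mu_j, sigma, n, cutoff):
    """Return (alpha, power) for the rule: report 'strong evidence for J
    as against H' whenever the sample mean exceeds `cutoff`."""
    se = sigma / n ** 0.5                                  # standard error of the mean
    alpha = 1.0 - NormalDist(mu_h, se).cdf(cutoff)         # probability of the report when H is true
    power = 1.0 - NormalDist(mu_j, se).cdf(cutoff)         # probability of the report when J is true
    return alpha, power

# Hypothetical setup: H: mu = 0, J: mu = 1, sigma = 1, n = 25.
se = 1.0 / 25 ** 0.5
cutoff = NormalDist(0.0, se).inv_cdf(0.95)   # cutoff chosen so that alpha = 0.05
alpha, power = conf_error_probabilities(0.0, 1.0, 1.0, 25, cutoff)
print(f"alpha = {alpha:.3f}, 1 - beta = {power:.4f}")
```

On this reading, a rule satisfies (Conf) when the first number is small and the second is much larger; a rule that reported strong evidence for J equally often under H and under J would fail it.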

Birnbaum questioned whether Neyman-Pearson methods had “concepts of evidence” simply because Neyman talked of “inductive behavior” and Wald and others couched statistical methods in decision-theoretic terms. I have been urging that we consider instead how the tools may actually be used, and not be restricted by the statistical philosophies of founders (not to mention that so many of their statements are tied up with personality disputes, and problems of “anger management”). Recall, as well, E. Pearson’s insistence on an evidential construal of N-P methods, and the fact that Neyman, in practice, spoke of drawing inferences and reaching conclusions (e.g., Neyman’s nursery posts, links in [iii] below). Continue reading

Categories: Likelihood Principle, phil/history of stat, Statistics | 3 Comments

U-Phil: Mayo’s response to Hennig and Gandenberger

brakes on the ‘breakthrough’

“This will be my last post on the (irksome) Birnbaum argument!” she says with her fingers (or perhaps toes) crossed. But really, really it is (at least until midnight 2013). In fact the following brief remarks are all said, more clearly, in my (old) PAPER, new paper, Mayo 2010, Cox & Mayo 2011 (appendix), and in posts connected to this U-Phil: Blogging the likelihood principle, new summary 10/31/12*.

What’s the catch?

In my recent “Ton o’ Bricks” post, many readers were struck by the implausibility of letting the evidential interpretation of x’* be influenced by the properties of experiments known not to have produced x’*. Yet it is altogether common to be told that, should a sampling theorist try to block this, “unfortunately there is a catch” (Ghosh, Delampady, and Samanta 2006, 38): We would be forced to embrace the strong likelihood principle (SLP, or LP, for short), at least according to an infamous argument by Allan Birnbaum (who himself rejected the LP [i]).

It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin, or else to embrace the strong likelihood principle, which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma. . . . The “dilemma” argument is therefore an illusion. (Cox and Mayo 2010, 298)

In my many detailed expositions, I have explained the source of the illusion and sleight of hand from a number of perspectives (I will not repeat references here). While I appreciate the care that Hennig and Gandenberger have taken in their U-Phils (and wish them all the luck in published outgrowths), it is clear to me that they are not hearing (or are unwittingly blocking) the scre-e-e-e-ching of the brakes!

No revolution, no breakthrough!

Berger and Wolpert, in their famous monograph The Likelihood Principle, identify the core issue:

The philosophical incompatibility of the LP and the frequentist viewpoint is clear, since the LP deals only with the observed x, while frequentist analyses involve averages over possible observations. . . . Enough direct conflicts have been . . . seen to justify viewing the LP as revolutionary from a frequentist perspective. (Berger and Wolpert 1988, 65-66)[ii]

If Birnbaum’s proof does not apply to a frequentist sampling theorist, then there is neither a revolution nor a breakthrough (as Savage called it). The SLP holds just for methodologies in which it holds . . . We are going in circles.

Block my counterexamples, please!

Since Birnbaum’s argument has stood for over fifty years, I’ve given it the maximal run for its money, and haven’t tried to block its premises, however questionable its key moves may appear. Despite such latitude, I’ve shown that the “proof” to the SLP conclusion will not wash, and I’m just a wee bit disappointed that Hennig and Gandenberger haven’t wrestled with my specific argument, or shown just where they think my debunking fails. What would this require?

Since the SLP is a universal generalization, it requires only a single counterexample to falsify it. In fact, every violation of the SLP within frequentist sampling theory, I show, is a counterexample to it! In other words, using the language from the definition of the SLP, the onus is on Birnbaum to show that, for any x’* that is a member of an SLP pair (E’, E”) with given, different probability models f’, f”, x’* and x”* should have the identical evidential import for an inference concerning parameter θ, on pain of facing “the catch” above, i.e., being forced to allow the import of data known to have come from E’ to be altered by unperformed experiments known not to have produced x’*.
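A commonly cited illustration of an SLP pair may help here. The example and its numbers are my choice, not taken from the post, and the function names are hypothetical. E’ fixes 12 coin tosses in advance and E” tosses until the third tail; both happen to yield 9 heads and 3 tails. The sketch checks that the two likelihood functions are proportional, and then computes the one-sided p-values for H0: θ = 0.5 against θ > 0.5 under each sampling model.

```python
from math import comb

def binomial_likelihood(theta, heads=9, n=12):
    """E': toss exactly n times and observe `heads` heads."""
    return comb(n, heads) * theta**heads * (1 - theta)**(n - heads)

def negbinomial_likelihood(theta, heads=9, tails=3):
    """E'': toss until the `tails`-th tail, having seen `heads` heads."""
    n = heads + tails
    # The last toss is a tail; the other tails fall anywhere in the first n - 1 tosses.
    return comb(n - 1, tails - 1) * theta**heads * (1 - theta)**tails

# The ratio of the two likelihoods does not depend on theta (proportionality).
ratios = [binomial_likelihood(t) / negbinomial_likelihood(t) for t in (0.2, 0.5, 0.8)]

# One-sided p-values under H0: theta = 0.5, against theta > 0.5.
# E': probability of 9 or more heads in 12 tosses.
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2**12
# E'': probability that the third tail arrives on toss 12 or later,
# i.e. at most 2 tails among the first 11 tosses.
p_negbinomial = sum(comb(11, k) for k in range(0, 3)) / 2**11

print(ratios)
print(round(p_binomial, 4), round(p_negbinomial, 4))
```

The likelihood ratio is the same constant at every θ, so the pair satisfies the antecedent of the SLP, yet the sampling distributions, and hence the p-values, differ (one falls above .05 and the other below). Whether that difference should matter for evidential import is exactly what is in dispute.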

If one is to release the brakes from my screeching halt, defenders of Birnbaum might try to show that the SLP counterexamples lead me to “the catch” as alleged. I have considered two well-known violations of the SLP. Can it be shown that a contradiction with the WCP or SP follows? I say no. Neither Hennig[ii] nor Gandenberger shows otherwise.

In my tracing out of Birnbaum’s arguments, I strived to assume that he would not be giving us circular arguments. To say that “I can prove that your methodology must obey the SLP,” and then to set out to do so by declaring “Hey Presto! Assume sampling distributions are irrelevant (once the data are in hand),” is a neat trick, but it assumes what it purports to prove. All other interpretations are shown to be unsound.


[i] Birnbaum himself, soon after presenting his result, rejected the SLP. As Birnbaum puts it, “the likelihood concept cannot be construed so as to allow useful appraisal, and thereby possible control, of probabilities of erroneous interpretations” (Birnbaum 1969, p. 128).

(We use LP and SLP synonymously here.)

[ii] Hennig initially concurred with me, but says a person convinced him to get back on the Birnbaum bus (even though Birnbaum got off it [i]).

Some other, related, posted discussions: Brakes on Breakthrough Part 1 (12/06/11) & Part 2 (12/07/11); Don’t Birnbaumize that experiment (12/08/12); Midnight with Birnbaum re-blog (12/31/12). See also the initial call to this U-Phil, the extension, details here, the post from my 28 Nov. seminar (LSE), and the original post by Gandenberger.


Birnbaum, A. (1962). “On the Foundations of Statistical Inference”, Journal of the American Statistical Association 57 (298), 269-306.

Savage, L. J., Barnard, G., Cornfield, J., Bross, I., Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O., and Birnbaum, A. (1962). “On the Foundations of Statistical Inference: Discussion (of Birnbaum 1962)”, Journal of the American Statistical Association 57 (298), 307-326.

Birnbaum, A. (1970). “Statistical Methods in Scientific Inference” (letter to the editor), Nature 225, 1033.

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference”, in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos eds.), CUP, 276-304.

…and if that’s not enough, search this blog.


Categories: Birnbaum Brakes, Likelihood Principle, Statistics | 30 Comments

Coming up: December U-Phil Contributions….

Dear Reader: You were probably* wondering about the December U-Phils (blogging the strong likelihood principle (SLP)). They will be posted, singly or in pairs, over the next few blog entries. Here is the initial call, and the extension. The details of the specific U-Phil may be found here, but also look at the post from my 28 Nov. seminar at the London School of Economics (LSE), which was on the SLP. Posts were to be in relation to either the guest graduate student post by Gandenberger, and/or my discussion/argument and reactions to it. Earlier U-Phils may be found here; and more by searching this blog. “U-Phil” is short for “you ‘philosophize’”.

If you have ideas for future “U-Phils,” post them as comments to this blog or send them to

*This is how I see “probability” mainly used in ordinary English, namely as expressing something like “here’s a pure guess made without evidence or with little evidence,” be it sarcastic or quite genuine.


Categories: Announcement, Likelihood Principle, U-Phil | Leave a comment

Don’t Birnbaumize that experiment my friend*–updated reblog

Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a video we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

(A)  “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained.  This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle, Statistics | 9 Comments

Announcement: U-Phil Extension: Blogging the Likelihood Principle

U-Phil: I am extending to Dec. 19, 2012 the date for sending me responses to the “U-Phil” call, see initial call, given some requests for more time. The details of the specific U-Phil may be found here, but you might also look at the post relating to my 28 Nov. seminar at the LSE, which is directly on the topic: the infamous (strong) likelihood principle (SLP). “U-Phil,” which is short for “you ‘philosophize’”, is really just an opportunity to write something .5-1 notch above an ordinary comment (focussed on one or more specific posts/papers, as described in each call): it can be longer (~500-1000 words), and it appears in the regular blog area rather than as a comment. Your remarks can relate to the guest graduate student post by Gregory Gandenberger, and/or my discussion/argument. Graduate student posts (e.g., attendees of my 28 Nov. LSE seminar?) are especially welcome*. Earlier exemplars of U-Phils may be found here; and more by searching this blog.

Thanks to everyone who sent me names of vintage typewriter repair shops in London, after the airline damage: the “x” is fixed, but the “z” key is still misbehaving.

*Another post of possible relevance to graduate students comes up when searching this blog for  “sex”.

Categories: Announcement, Likelihood Principle, U-Phil | Leave a comment

Blogging Birnbaum: on Statistical Methods in Scientific Inference

I said I’d make some comments on Birnbaum’s letter (to Nature), (linked in my last post), which is relevant to today’s Seminar session (at the LSE*), as well as to (Normal Deviate’s) recent discussion of frequentist inference–in terms of constructing procedures with good long-run “coverage”. (Also to the current U-Phil).

NATURE VOL. 225 MARCH 14, 1970 (1033)


Statistical methods in Scientific Inference

It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised. Continue reading

Categories: Likelihood Principle, Statistics, U-Phil | 5 Comments

Likelihood Links [for 28 Nov. Seminar and Current U-Phil]

Dear Reader: We just arrived in London[i][ii]. Jean Miller has put together some materials for Birnbaum LP aficionados in connection with my 28 November seminar. Great to have ready links to some of the early comments and replies by Birnbaum, Durbin, Kalbfleisch and others, possibly of interest to those planning contributions to the current “U-Phil”. I will try to make some remarks on Birnbaum’s 1970 letter to the editor tomorrow.

November 28th reading

Categories: Birnbaum Brakes, Likelihood Principle, U-Phil | Leave a comment

Announcement: 28 November: My Seminar at the LSE (Contemporary PhilStat)

28 November: (10 – 12 noon):
Mayo: “On Birnbaum’s argument for the Likelihood Principle: A 50-year old error and its influence on statistical foundations”
PH500 Seminar, Room: Lak 2.06 (Lakatos building). 
London School of Economics and Political Science (LSE)

Background reading: PAPER

See general announcement here.

Background to the Discussion: Question: How did I get involved in disproving Birnbaum’s result in 2006?

Answer: Appealing to something called the “weak conditionality principle (WCP)” arose in avoiding a classic problem (arising from mixture tests) described by David Cox (1958), as discussed in our joint paper:

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference”, in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos eds.), CUP, 276-304. Continue reading

Categories: Announcement, Likelihood Principle, Statistics | 12 Comments

Irony and Bad Faith: Deconstructing Bayesians-reblog

 The recent post by Normal Deviate, and my comments on it, remind me of why/how I got back into the Bayesian-frequentist debates in 2006, as described in my first “deconstruction” (and “U-Phil”) on this blog (Dec 11, 2012):

Some time in 2006 (shortly after my ERROR06 conference), the trickle of irony and sometime flood of family feuds issuing from Bayesian forums drew me back into the Bayesian-frequentist debates.1 2  Suddenly sparks were flying, mostly kept shrouded within Bayesian walls, but nothing can long be kept secret even there. Spontaneous combustion is looming. The true-blue subjectivists were accusing the increasingly popular “objective” and “reference” Bayesians of practicing in bad faith; the new O-Bayesians (and frequentist-Bayesian unificationists) were taking pains to show they were not subjective; and some were calling the new Bayesian kids on the block “pseudo Bayesian.” Then there were the Bayesians somewhere in the middle (or perhaps out in left field) who, though they still use the Bayesian umbrella, were flatly denying the very idea that Bayesian updating fits anything they actually do in statistics.3 Obeisance to Bayesian reasoning remained, but on some kind of a priori philosophical grounds. Doesn’t the methodology used in practice really need a philosophy of its own? I say it does, and I want to provide this. Continue reading

Categories: Likelihood Principle, objective Bayesians, Statistics | 33 Comments

U-Phil: Blogging the Likelihood Principle: New Summary

U-Phil: I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader U-Phils, from December 6- 19 (< 1000 words) for posting on this blog (please see # at bottom of post).  Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid.  If construed as formally valid, I argue, the premises contradict each other. Who is right?  Gandenberger doesn’t wrestle with my critique of Birnbaum, but I invite you (and Greg!) to do so. I’m pasting a new summary of my argument below.

 The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec 7, 2011.

Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant?  And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?

Graduate students and others invited!


New Summary of Mayo Critique of Birnbaum’s Argument for the SLP
Deborah Mayo
See also a (draft) of the full PAPER corresponding to this summary; a later and more satisfactory draft is here. Yet other links on the Strong Likelihood Principle (SLP): Mayo 2010; Cox & Mayo 2011 (appendix).

Continue reading

Categories: Likelihood Principle, Statistics, U-Phil | 19 Comments

Guest Post: Greg Gandenberger, “Evidential Meaning and Methods of Inference”

Evidential Meaning and Methods of Inference

Greg Gandenberger
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh

Bayesian methods conform to the Likelihood Principle, while frequentist methods do not.  Thus, proofs of the Likelihood Principle* such as Birnbaum’s (1962) appear to be threats to frequentist positions.  Deborah Mayo has recently argued that Birnbaum’s proof is no threat to frequentist positions because it is invalid (Ch. 7(III) in Mayo and Spanos 2010).  In my view, Birnbaum’s proof is valid and his premises are intuitively compelling.  Nevertheless, I agree with Professor Mayo that the proof, properly understood, does not imply that frequentist methods should not be used.

There are actually at least two different Likelihood Principles: one, which I call the Evidential Likelihood Principle, says that the evidential meaning of an experimental outcome with respect to a set of hypotheses depends only on its likelihood function for those hypotheses (i.e., the function that maps each of those hypotheses to the probability it assigns to that outcome, defined up to a constant of proportionality); the other, which I call the Methodological Likelihood Principle, says that a statistical method should not be used if it can generate different conclusions from outcomes that have the same likelihood function, without a relevant difference in utilities or prior probabilities. Continue reading

Categories: Likelihood Principle | 17 Comments

Failing to Apply vs Violating the Likelihood Principle

In writing a new chapter on the Strong Likelihood Principle [i] the past few weeks, I noticed a passage in G. Casella and R. Berger (2002) that in turn recalled a puzzling remark noted in my Jan. 3, 2012 post. The post began:

A question arose from a Bayesian acquaintance:

“Although the Birnbaum result is of primary importance for sampling theorists, I’m still interested in it because many Bayesian statisticians think that model checking violates the (strong) likelihood principle (SLP), as if this principle is a fundamental axiom of Bayesian statistics”.

But this is puzzling for two reasons. First, if the LP does not preclude testing for assumptions (and he is right that it does not[ii]), then why not simply explain that rather than appeal to a disproof of something that actually never precluded model testing?   To take the disproof of the LP as grounds to announce: “So there! Now even Bayesians are free to test their models” would seem only to ingrain the original fallacy.

You can read the rest of the original post here.

The remark in G. Casella and R. Berger seems to me equivocal on this point: Continue reading

Categories: Likelihood Principle, Philosophy of Statistics, Statistics | 2 Comments

Two New Properties of Mathematical Likelihood

17 February 1890--29 July 1962

Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman and Pearson lemma in terms of power. I invite your deconstructions/comments.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

      To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, ex hypothesi, the sample was drawn.  Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability.  His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random.  Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.

Continue reading

Categories: Likelihood Principle, Statistics | 2 Comments

Part II: Breaking Through the Breakthrough* (please start with Dec 6 post)

This is a first draft of part II of the presentation begun in the December 6 blog post. This completes the proposed presentation. I expect errors, and I will be grateful for feedback! (NOTE: I did not need to actually rip a cover of EGEK to obtain this effect!)


You have observed y”, the .05 significant result from E”, the optional stopping rule, ending at n = 100.

Birnbaum claims he can show that you, as a frequentist error statistician, must grant that it is equivalent to having fixed n = 100 at the start (i.e., experiment E’).
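Why would an error statistician resist treating the two experiments as equivalent? A small simulation can suggest the reason. This is a sketch under my own assumptions, not the presentation's worked example: data are N(0, 1) with the null hypothesis μ = 0 true, the test is two-sided at the nominal .05 level, and the optional stopping rule checks after every observation, stopping at the first nominally significant result or at n = 100. The function names, seed, and trial count are hypothetical.

```python
import random

def rejects_with_optional_stopping(rng, max_n=100, z_crit=1.96):
    """E'': sample one observation at a time under H0 and stop at the first n
    (up to max_n) where the z statistic is nominally significant."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / n ** 0.5 >= z_crit:
            return True
    return False

def rejects_with_fixed_n(rng, n=100, z_crit=1.96):
    """E': fix n in advance and test once."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total) / n ** 0.5 >= z_crit

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 2000
optional_rate = sum(rejects_with_optional_stopping(rng) for _ in range(trials)) / trials
fixed_rate = sum(rejects_with_fixed_n(rng) for _ in range(trials)) / trials
print(f"optional stopping: {optional_rate:.3f}, fixed n: {fixed_rate:.3f}")
```

With the null true in every trial, the fixed-n rate should sit near the nominal .05, while the try-and-try-again rate comes out several times larger. The sampling distribution, which the LP declares irrelevant once the data are in hand, is what registers that difference.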


The (strong) Likelihood Principle (LP) is a universal conditional claim:

If two data sets y’ and y” from experiments E’ and E” respectively, have likelihood functions which are functions of the same parameter(s) µ

and are proportional to each other, then y’ and y” should lead to identical inferential conclusions about µ. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle | 2 Comments

Putting the Brakes on the Breakthrough Part I*

brakes on the ‘breakthrough’

I am going to post a FIRST draft (for a brief presentation next week in Madrid).  [I thank David Cox for the idea!] I expect errors, and I will be very grateful for feedback!  This is part I; part II will be posted tomorrow.  These posts may disappear once I’ve replaced them with a corrected draft.  I’ll then post the draft someplace.

If you wish to share queries/corrections please post as a comment or e-mail:  (ignore Greek symbols that are not showing correctly, I await fixes by Elbians.) Thanks much!

ONE: A Conversation between Sir David Cox and D. Mayo (June, 2011)

Toward the end of this exchange, the issue of the Likelihood Principle (LP)[1] arose:

COX: It is sometimes claimed that there are logical inconsistencies in frequentist theory, in particular surrounding the strong Likelihood Principle (LP). I know you have written about this; what is your view at the moment?

MAYO: What contradiction?
COX: Well, that frequentist theory does not obey the strong LP. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle | 5 Comments

ReBlogging the Likelihood Principle #2: Solitary Fishing: SLP Violations

Reblogging from a year ago. The Appendix of the “Cox/Mayo Conversation” (linked below [i]) is an attempt to quickly sketch Birnbaum’s argument for the strong likelihood principle (SLP), and its sins.  Couple of notes: Firstly, I am a philosopher (of science and statistics) not a statistician.  That means, my treatment will show all of the typical (and perhaps annoying) signs of being a trained philosopher-logician.  I’ve no doubt statisticians would want to use different language, which is welcome.  Second, this is just a blog (although perhaps my published version is still too informal for some). Continue reading

Categories: Likelihood Principle | 9 Comments
