# Likelihood Principle

## “Intentions” is the new code word for “error probabilities”: Allan Birnbaum’s Birthday

27 May 1923-1 July 1976

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference,” reprinted in Breakthroughs in Statistics (volume I, 1993), concerns a principle that remains at the heart of today’s controversies in statistics, even if it isn’t obvious at first: the Likelihood Principle (LP), also called the Strong Likelihood Principle (SLP) to distinguish it from the weak LP [1]. According to the LP/SLP, given the statistical model, the information from the data is fully contained in the likelihood ratio. Thus, properties of the sampling distribution of the test statistic vanish (as I put it in my slides from my last post)! But error probabilities are all properties of the sampling distribution. Thus, embracing the LP (SLP) blocks our error statistician’s direct ways of taking into account “biasing selection effects” (slide #10).

Intentions is a New Code Word: Where, then, is all the information regarding your trying and trying again, stopping when the data look good, cherry picking, barn hunting and data dredging? For likelihoodists and other probabilists who hold the LP/SLP, it is ephemeral information locked in your head reflecting your “intentions”! “Intentions” is a code word for “error probabilities” in foundational discussions, as in “who would want to take intentions into account?” (Replace “intentions” (or the “researcher’s intentions”) with “error probabilities” (or the “method’s error probabilities”) and you get a more accurate picture.) Keep this deciphering tool firmly in mind as you read criticisms of methods that take error probabilities into account [2]. For error statisticians, this information reflects real and crucial properties of your inference procedure.

## Statistical Science: The Likelihood Principle issue is out…!

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ) have different probability models $f_1(\cdot)$, $f_2(\cdot)$, then even though $f_1(x \mid \theta) = c\, f_2(y \mid \theta)$ for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].
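To make the abstract’s notion of an SLP pair concrete, here is a minimal Python sketch using a standard textbook pair that is not taken from the paper itself: E1 fixes n = 12 coin tosses and observes x = 9 heads (binomial), while E2 tosses until the third tail and observes y = 9 heads (negative binomial). The two likelihoods are proportional, with c = 220/55 = 4, yet the one-sided p-values for θ = 1/2 against θ > 1/2 differ, because each is computed from its own experiment’s sampling distribution. The function names are illustrative only.

```python
from math import comb

# Illustrative SLP pair (a common textbook example, assumed here for illustration):
# E1: toss a coin n = 12 times, observe x = 9 heads (binomial model f1).
# E2: toss until the r = 3rd tail, observe y = 9 heads (negative binomial model f2).

def binom_lik(theta, n=12, x=9):
    """f1(x | theta) for the fixed-n binomial experiment."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def negbin_lik(theta, r=3, y=9):
    """f2(y | theta): probability of y heads before the r-th tail."""
    return comb(y + r - 1, y) * theta**y * (1 - theta)**r

def binom_pvalue(n=12, x=9):
    """One-sided P(X >= x; theta = 1/2) under the binomial sampling distribution."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2**n

def negbin_pvalue(r=3, y=9):
    """One-sided P(Y >= y; theta = 1/2) under the negative binomial distribution."""
    below = sum(comb(k + r - 1, k) * 0.5**(k + r) for k in range(y))
    return 1 - below

# Likelihood ratio f1/f2 is the same constant c = 4 for every theta
# (up to floating-point rounding), so the SLP says x and y are evidentially equivalent.
for theta in (0.3, 0.5, 0.75):
    print(theta, binom_lik(theta) / negbin_lik(theta))

print(round(binom_pvalue(), 4))   # 0.073
print(round(negbin_pvalue(), 4))  # 0.0327
```

The same data yield a p-value above 0.05 in one design and below it in the other; this dependence on the sampling distribution is exactly what the SLP rules out.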

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper was accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

UPhils and responses

## BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)


1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*

Or,

x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)
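A minimal sketch of the law of likelihood in its qualitative form, assuming a simple coin model in which H0 and H1 each fix the probability of heads (0.5 and 0.7 here, chosen only for illustration). As the bracketed note says, the data are held fixed and only the hypotheses vary; the function names are hypothetical.

```python
def likelihood(p_heads, heads, tosses):
    """Pr(x; H) for one particular sequence with `heads` heads in `tosses` tosses,
    where H says each toss lands heads with probability p_heads."""
    return p_heads**heads * (1 - p_heads)**(tosses - heads)

def likelihood_ratio(p1, p0, heads, tosses):
    """LR of H1 over H0 for the same fixed data x."""
    return likelihood(p1, heads, tosses) / likelihood(p0, heads, tosses)

def favors_h1(p1, p0, heads, tosses):
    """Law of likelihood (qualitative form): x is evidence for H1 over H0 iff LR > 1."""
    return likelihood_ratio(p1, p0, heads, tosses) > 1

lr = likelihood_ratio(0.7, 0.5, heads=7, tosses=10)
print(round(lr, 2))                # 2.28
print(favors_h1(0.7, 0.5, 7, 10))  # True
print(favors_h1(0.7, 0.5, 3, 10))  # False
```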

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H0, the coin is fair, gets a small likelihood, $(.5)^k$, given k tosses of a coin, while H1, the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did.” Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.
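Barnard’s point can be checked with a small sketch. The hypothetical helper `rigged_hypothesis` builds H1 after the data are in, assigning probability 1 to heads on exactly the tosses that came up heads (and therefore probability 1 to tails on the others). Whatever sequence a fair coin happens to produce, the likelihood ratio of this H1 over H0 is 2^k.

```python
import random

def fair_coin_likelihood(seq):
    """Pr(x; H0) with H0: every toss is heads with probability 0.5."""
    return 0.5 ** len(seq)

def rigged_hypothesis(seq):
    """Hypothetical H1 built *after* seeing seq: Pr(heads) = 1 on the tosses
    that came up heads and 0 on the tosses that came up tails."""
    return [1.0 if outcome == "H" else 0.0 for outcome in seq]

def likelihood(probs_heads, seq):
    """Pr(seq) when toss i lands heads with probability probs_heads[i]."""
    result = 1.0
    for p, outcome in zip(probs_heads, seq):
        result *= p if outcome == "H" else 1 - p
    return result

random.seed(1)
k = 20
seq = [random.choice("HT") for _ in range(k)]  # generated with H0 true
h1 = rigged_hypothesis(seq)
lr = likelihood(h1, seq) / fair_coin_likelihood(seq)
print(lr)  # 1048576.0, i.e. 2**20, whatever sequence occurred
```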

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1 even if H0 is true?
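The second-level question can be sketched numerically. Assume, for illustration only, i.i.d. N(μ, 1) data, H0: μ = 0, and an alternative H1 chosen after the fact to fit best, μ = x̄. Then LR = exp(n x̄²/2), which is never below 1, so “LR > 1 against H0” occurs with probability 1 even when H0 is true. The probability that it reaches a benchmark such as 8 can be computed exactly and by simulation. The benchmark 8 and the function names are assumptions, not part of the original post.

```python
import math
import random

def max_lr_vs_null(sample):
    """LR of the best-fitting alternative (mu = sample mean) over H0: mu = 0,
    for i.i.d. N(mu, 1) data. Equals exp(n * xbar**2 / 2) >= 1."""
    n = len(sample)
    xbar = sum(sample) / n
    return math.exp(n * xbar**2 / 2)

def error_rate(threshold, n=10, trials=20_000, seed=0):
    """Monte Carlo estimate of Pr(LR >= threshold; H0 true)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if max_lr_vs_null(sample) >= threshold:
            hits += 1
    return hits / trials

def exact_error_rate(threshold):
    """Under H0, z = sqrt(n) * xbar is N(0, 1), and LR >= k iff |z| >= sqrt(2 ln k),
    so Pr(LR >= k) = erfc(sqrt(ln k))."""
    return math.erfc(math.sqrt(math.log(threshold)))

print(error_rate(1.0))         # 1.0: the data-dependent H1 is always "favored"
print(exact_error_rate(8.0))   # about 0.041
print(error_rate(8.0))         # close to the exact value
```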

## Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976

27 May 1923-1 July 1976

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in Breakthroughs in Statistics (volume I, 1993). I’ve a hunch that Birnbaum would have liked my rejoinder to discussants of my forthcoming paper (Statistical Science): Bjørnstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu. I hadn’t realized until recently that all of this is up under “future papers” here [1]. You can find the rejoinder: STS1404-004RA0-2. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. Happy Birthday Birnbaum!

## Barnard’s Birthday: background, likelihood principle, intentions

G.A. Barnard: 23 Sept. 1915 – 9 Aug. 2002

Reblog (year ago): G.A. Barnard’s birthday is today, so here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to some earlier issues: stopping rules, the likelihood principle, and background information here and here (at least of one type). (A few other Barnard links on this blog are below*.) Happy Birthday George!

Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data. One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant.

In particular, suppose somebody sets out to demonstrate the existence of extrasensory perception and says ‘I am going to go on until I get a one in ten thousand significance level’. Knowing that this is what he is setting out to do would lead you to adopt a different test criterion. What you would look at would not be the ratio of successes obtained, but how long it took him to obtain it. And you would have a very simple test of significance which said if it took you so long to achieve this increase in the score above the chance fraction, this is not at all strong evidence for E.S.P., it is very weak evidence. And the reversing of the choice of test criteria would I think overcome the difficulty.

This is the answer to the point Professor Savage makes; he says why use one method when you have vague knowledge, when you would use a quite different method when you have precise knowledge. It seems to me the answer is that you would use one method when you have precisely determined alternatives, with which you want to compare a given hypothesis, and you use another method when you do not have these alternatives.

Savage: May I digress to say publicly that I learned the stopping-rule principle from Professor Barnard, in conversation in the summer of 1952. Frankly I then thought it a scandal that anyone in the profession could advance an idea so patently wrong, even as today I can scarcely believe that some people resist an idea so patently right. I am particularly surprised to hear Professor Barnard say today that the stopping rule is irrelevant in certain circumstances only, for the argument he first gave in favour of the principle seems quite unaffected by the distinctions just discussed. The argument then was this: The design of a sequential experiment is, in the last analysis, what the experimenter actually intended to do. His intention is locked up inside his head and cannot be known to those who have to judge the experiment. Never having been comfortable with that argument, I am not advancing it myself. But if Professor Barnard still accepts it, how can he conclude that the stopping-rule principle is only sometimes valid? (emphasis added)
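Barnard’s ESP example turns on the stopping rule. A minimal simulation sketch, assuming N(0, 1) data with H0 true and a nominal two-sided 0.05 cutoff (z = 1.96) in place of Barnard’s one in ten thousand, purely to keep the run short: an experimenter who checks after every observation and stops at the first nominally significant result reaches “significance” far more often than 5% of the time. The likelihood function at the moment of stopping is the same as for a fixed-n design with the same data, which is why the stopping-rule principle declares the rule irrelevant; what changes is the error probability. The function names are hypothetical.

```python
import math
import random

def reaches_significance(max_n, z_crit, rng):
    """Draw N(0, 1) observations one at a time (H0: mu = 0 is true) and stop
    as soon as |z| = |sum| / sqrt(n) >= z_crit. Return True if that happens by max_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / math.sqrt(n) >= z_crit:
            return True
    return False

def actual_error_rate(max_n, z_crit=1.96, trials=2_000, seed=0):
    """Monte Carlo estimate of Pr(ever reaching nominal significance; H0 true)."""
    rng = random.Random(seed)
    hits = sum(reaches_significance(max_n, z_crit, rng) for _ in range(trials))
    return hits / trials

print(actual_error_rate(max_n=1))    # a single look: close to the nominal 0.05
print(actual_error_rate(max_n=200))  # trying and trying again: several times larger
```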

## Gandenberger: How to Do Philosophy That Matters (guest post)

Greg Gandenberger
Philosopher of Science
University of Pittsburgh
gandenberger.org

Genuine philosophical problems are always rooted in urgent problems outside philosophy,
and they die if these roots decay
Karl Popper (1963, 72)

My concern in this post is how we philosophers can use our skills to do work that matters to people both inside and outside of philosophy.

Philosophers are highly skilled at conceptual analysis, in which one takes an interesting but unclear concept and attempts to state precisely when it applies and when it doesn’t.

What is the point of this activity? In many cases, this question has no satisfactory answer. Conceptual analysis becomes an end in itself, and philosophical debates become fruitless arguments about words. The pleasure we philosophers take in such arguments hardly warrants scarce government and university resources. It does provide good training in critical thinking, but so do many other activities that are also immediately useful, such as doing science and programming computers.

Conceptual analysis does not have to be pointless. It is often prompted by a real-world problem. In Plato’s Euthyphro, for instance, the character Euthyphro thought that piety required him to prosecute his father for murder. His family thought on the contrary that for a son to prosecute his own father was the height of impiety. In this situation, the question “what is piety?” took on great urgency. It also had great urgency for Socrates, who was awaiting trial for corrupting the youth of Athens.

In general, conceptual analysis often begins as a response to some question about how we ought to regulate our beliefs or actions. It can be a fruitful activity as long as the questions that prompted it are kept in view. It tends to degenerate into merely verbal disputes when it becomes an end in itself.

The kind of goal-oriented view of conceptual analysis I aim to articulate and promote is not teleosemantics: it is a view about how philosophy should be done rather than a theory of meaning. It is consistent with Carnap’s notion of explication, one of the desiderata of which is fruitfulness (Carnap 1963, 5), but in practice Carnapian explication seems to devolve into idle word games just as easily as conceptual analysis. Our overriding goal should not be fidelity to intuitions, precision, or systematicity, but usefulness.

How I Became Suspicious of Conceptual Analysis

When I began working on proofs of the Likelihood Principle, I assumed that following my intuitions about the concept of “evidential equivalence” would lead to insights about how science should be done. Birnbaum’s proof showed me that my intuitions entail the Likelihood Principle, which frequentist methods violate. Voila! Scientists shouldn’t use frequentist methods. All that remained to be done was to fortify Birnbaum’s proof, as I do in “A New Proof of the Likelihood Principle,” by defending it against objections and buttressing it with an alternative proof. [Editor: For a number of related materials on this blog see Mayo’s JSM presentation, and note [i].]

After working on this topic for some time, I realized that I was making simplistic assumptions about the relationship between conceptual intuitions and methodological norms. At most, a proof of the Likelihood Principle can show you that frequentist methods run contrary to your intuitions about evidential equivalence. Even if those intuitions are true, it does not follow immediately that scientists should not use frequentist methods. The ultimate aim of science, presumably, is not to respect evidential equivalence but (roughly) to learn about the world and make it better. The demand that scientists use methods that respect evidential equivalence is warranted only insofar as it is conducive to achieving those ends. Birnbaum’s proof says nothing about that issue.

• In general, a conceptual analysis–even of a normatively freighted term like “evidence”–is never enough by itself to justify a normative claim. The questions that ultimately matter are not about “what we mean” when we use particular words and phrases, but rather about what our aims are and how we can best achieve them.

How to Do Conceptual Analysis Teleologically

This is not to say that my work on the Likelihood Principle or conceptual analysis in general is without value. But it is nothing more than a kind of careful lexicography. This kind of work is potentially useful for clarifying normative claims with the aim of assessing and possibly implementing them. To do work that matters, philosophers engaged in conceptual analysis need to take enough interest in the assessment and implementation stages to do their conceptual analysis with the relevant normative claims in mind.

So what does this kind of teleological (goal-oriented) conceptual analysis look like?

It can involve personally following through on the process of assessing and implementing the relevant norms. For example, philosophers at Carnegie Mellon University working on causation have not only provided a kind of analysis of the concept of causation but also developed algorithms for causal discovery, proved theorems about those algorithms, and applied those algorithms to contemporary scientific problems (see e.g. Spirtes et al. 2000).

I have great respect for this work. But doing conceptual analysis does not have to mean going so far outside the traditional bounds of philosophy. A perfect example is James Woodward’s related work on causal explanation, which he describes as follows (2003, 7-8, original emphasis):

My project…makes recommendations about what one ought to mean by various causal and explanatory claims, rather than just attempting to describe how we use those claims. It recognizes that causal and explanatory claims sometimes are confused, unclear, and ambiguous and suggests how those limitations might be addressed…. we introduce concepts…and characterize them in certain ways…because we want to do things with them…. Concepts can be well or badly designed for such purposes, and we can evaluate them accordingly.

Woodward keeps his eye on what the notion of causation is for, namely distinguishing between relationships that do and relationships that do not remain invariant under interventions. This distinction is enormously important because only relationships that remain invariant under interventions provide “handles” we can use to change the world.

Here are some lessons about teleological conceptual analysis that we can take from Woodward’s work. (I’m sure this list could be expanded.)

1. Teleological conceptual analysis puts us in charge. In his wonderful presidential address at the 2012 meeting of the Philosophy of Science Association, Woodward ended a litany of metaphysical arguments against regarding mental events as causes by asking “Who’s in charge here?” There is no ideal form of Causation to which we must answer. We are free to decide to use “causation” and related words in the ways that best serve our interests.
2. Teleological conceptual analysis can be revisionary. If ordinary usage is not optimal, we can change it.
3. The product of a teleological conceptual analysis need not be unique. Some philosophers reject Woodward’s account because they regard causation as a process rather than as a relationship among variables. But why do we need to choose? There could just be two different notions of causation. Woodward’s account captures one notion that is very important in science and everyday life. If it captures all of the causal notions that are important, then so much the better. But this kind of comprehensiveness is not essential.
4. Teleological conceptual analysis can be non-reductive. Woodward characterizes causal relations as (roughly) correlation relations that are invariant under certain kinds of interventions. But the notion of an intervention is itself causal. Woodward’s account is not circular because it characterizes what it means for a causal relationship to hold between two variables in terms of different causal processes involving different sets of variables. But it is non-reductive in the sense that it does not allow us to replace causal claims with equivalent non-causal claims (as, e.g., counterfactual, regularity, probabilistic, and process theories purport to do). This fact is a problem if one’s primary concern is to reduce one’s ultimate metaphysical commitments, but it is not necessarily a problem if one’s primary concern is to improve our ability to assess and use causal claims.

Conclusion

Philosophers rarely succeed in capturing all of our intuitions about an important informal concept. Even if they did succeed, they would have more work to do in justifying any norms that invoke that concept. Conceptual analysis can be a first step toward doing philosophy that matters, but it needs to be undertaken with the relevant normative claims in mind.

Question: What are your best examples of philosophy that matters? What can we learn from them?

Citations

• Birnbaum, Allan. “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57.298 (1962): 269-306.
• Carnap, Rudolf. Logical Foundations of Probability. U of Chicago Press, 1963.
• Gandenberger, Greg. “A New Proof of the Likelihood Principle.” The British Journal for the Philosophy of Science (forthcoming).
• Plato. Euthyphro. http://classics.mit.edu/Plato/euthyfro.html.
• Popper, Karl. Conjectures and Refutations. London: Routledge & Kegan Paul, 1963.
• Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. Vol. 81. The MIT Press, 2000.
• Woodward, James. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, 2003.

[i] Earlier posts are here and here. Some U-Phils are here, here, and here. For some amusing notes, see, e.g., “Don’t Birnbaumize that experiment my friend” and “Midnight with Birnbaum.”

Some related papers:

• Cox, D. R. and Mayo, D. G. (2010). “Objectivity and Conditionality in Frequentist Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 276-304.

## A. Birnbaum: Statistical Methods in Scientific Inference

Birnbaum: born May 27, 1923

Today is (statistician) Allan Birnbaum’s birthday. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to obtain what he called “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I would heartily endorse! Although he is known for his attempt to argue that the (strong) Likelihood Principle follows from sufficiency and conditionality principles, a few years after publishing this result he seems to have turned away from it, perhaps having discovered gaps in his argument.

NATURE VOL. 225 MARCH 14, 1970 (1033)

LETTERS TO THE EDITOR

Statistical methods in Scientific Inference

It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.

If there has been ‘one rock in a shifting scene’ of general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists. The confidence concept takes from the Neyman-Pearson approach techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data. (The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy.) The confidence concept also incorporates important but limited aspects of the likelihood concept: the sufficiency concept, expressed in the general refusal to use randomized tests and confidence limits when they are recommended by the Neyman-Pearson approach; and some applications of the conditionality concept. It is remarkable that this concept, an incompletely formalized synthesis of ingredients borrowed from mutually incompatible theoretical approaches, is evidently useful continuously in much critically informed statistical thinking and practice [emphasis mine].

While inferences of many sorts are evident everywhere in scientific work, the existence of precise, general and accurate schemas of scientific inference remains a problem. Mendelian examples like those of Edwards and my 1969 paper seem particularly appropriate as case-study material for clarifying issues and facilitating effective communication among interested statisticians, scientific workers and philosophers and historians of science.

Allan Birnbaum
New York University
Courant Institute of Mathematical Sciences,
251 Mercer Street,
New York, NY 10012

Birnbaum’s confidence concept, sometimes written (Conf), was his attempt to find in error statistical ideas a concept of statistical evidence–a term that he invented and popularized. In Birnbaum 1977 (24), he states it as follows:

(Conf): A concept of statistical evidence is not plausible unless it finds ‘strong evidence for J as against H’ with small probability (α) when H is true, and with much larger probability (1 – β) when J is true.
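A minimal sketch of the two probabilities that (Conf) refers to, assuming a one-sided z-test of H: μ = 0 against J: μ = δ > 0 with known σ, where “strong evidence for J as against H” is read, for illustration only, as z ≥ 1.96. The values δ = 0.5 and n = 25 are arbitrary choices, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def conf_probabilities(delta, n, sigma=1.0, z_cut=1.96):
    """One-sided z-test of H: mu = 0 against J: mu = delta > 0 that reports
    'strong evidence for J as against H' when z >= z_cut.
    Returns (alpha, power): that report's probability under H and under J."""
    alpha = 1 - normal_cdf(z_cut)
    shift = delta * math.sqrt(n) / sigma  # mean of z when J is true
    power = 1 - normal_cdf(z_cut - shift)
    return alpha, power

alpha, power = conf_probabilities(delta=0.5, n=25)
print(round(alpha, 3), round(power, 3))  # 0.025 0.705
```

(Conf) asks that α be small and 1 – β much larger; here the report occurs about 2.5% of the time when H is true and about 70% of the time when J is true.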

Birnbaum questioned whether Neyman-Pearson methods had “concepts of evidence” simply because Neyman talked of “inductive behavior” and Wald and others couched statistical methods in decision-theoretic terms. I have been urging that we consider instead how the tools may actually be used, and not be restricted by the statistical philosophies of founders (not to mention that so many of their statements are tied up with personality disputes, and problems of “anger management”). Recall, as well, E. Pearson’s insistence on an evidential construal of N-P methods, and the fact that Neyman, in practice, spoke of drawing inferences and reaching conclusions (e.g., Neyman’s nursery posts, links in [iii] below).


## U-Phil: Mayo’s response to Hennig and Gandenberger

brakes on the ‘breakthrough’

“This will be my last post on the (irksome) Birnbaum argument!” she says with her fingers (or perhaps toes) crossed. But really, really it is (at least until midnight 2013). In fact the following brief remarks are all said, more clearly, in my (old) PAPER, new paper, Mayo 2010, Cox & Mayo 2011 (appendix), and in posts connected to this U-Phil: Blogging the likelihood principle, new summary 10/31/12*.

What’s the catch?

In my recent “Ton o’ Bricks” post, many readers were struck by the implausibility of letting the evidential interpretation of x’* be influenced by the properties of experiments known not to have produced x’*. Yet it is altogether common to be told that, should a sampling theorist try to block this, “unfortunately there is a catch” (Ghosh, Delampady, and Samanta 2006, 38): we would be forced to embrace the strong likelihood principle (SLP, or LP, for short), at least according to an infamous argument by Allan Birnbaum (who himself rejected the LP [i]).

It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin, or else to embrace the strong likelihood principle, which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma. . . . The “dilemma” argument is therefore an illusion. (Cox and Mayo 2010, 298)

In my many detailed expositions, I have explained the source of the illusion and sleight of hand from a number of perspectives (I will not repeat references here). While I appreciate the care that Hennig and Gandenberger have taken in their U-Phils (and wish them all the luck in published outgrowths), it is clear to me that they are not hearing (or are unwittingly blocking) the scre-e-e-e-ching of the brakes!

No revolution, no breakthrough!

Berger and Wolpert, in their famous monograph The Likelihood Principle, identify the core issue:

The philosophical incompatibility of the LP and the frequentist viewpoint is clear, since the LP deals only with the observed x, while frequentist analyses involve averages over possible observations. . . . Enough direct conflicts have been . . . seen to justify viewing the LP as revolutionary from a frequentist perspective. (Berger and Wolpert 1988, 65-66)[ii]

If Birnbaum’s proof does not apply to a frequentist sampling theorist, then there is neither a revolution nor a breakthrough (as Savage called it). The SLP holds just for methodologies in which it holds . . . We are going in circles.

Since Birnbaum’s argument has stood for over fifty years, I’ve given it the maximal run for its money, and haven’t tried to block its premises, however questionable its key moves may appear. Despite such latitude, I’ve shown that the “proof” to the SLP conclusion will not wash, and I’m just a wee bit disappointed that Hennig and Gandenberger haven’t wrestled with my specific argument, or shown just where they think my debunking fails. What would this require?

Since the SLP is a universal generalization, it requires only a single counterexample to falsify it. In fact, every violation of the SLP within frequentist sampling theory, I show, is a counterexample to it! In other words, using the language from the definition of the SLP, the onus is on Birnbaum to show that, for any x’* that is a member of an SLP pair (E’, E”) with given, different probability models f’, f”, x’* and x”* should have identical evidential import for an inference concerning parameter θ, on pain of facing “the catch” above, i.e., being forced to allow the import of data known to have come from E’ to be altered by unperformed experiments known not to have produced x’*.
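In symbols, the SLP pair condition can be sketched as follows. The proportionality constant c and the shorthand Infr (for the inference or evidential import licensed by an outcome of a given experiment) are notation introduced here for illustration, not taken from the post:

```latex
f'(x'^{*};\theta) = c\, f''(x''^{*};\theta)\ \text{for all } \theta,\quad c > 0 \text{ not depending on } \theta
\quad\Longrightarrow\quad
\mathrm{Infr}_{E'}(x'^{*}) = \mathrm{Infr}_{E''}(x''^{*})
```

Any SLP pair for which the error probabilities computed from f’ and f” differ, and so license different inferences, is what the argument above counts as a counterexample.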

If one is to release the brakes from my screeching halt, defenders of Birnbaum might try to show that the SLP counterexamples lead me to “the catch” as alleged. I have considered two well-known violations of the SLP. Can it be shown that a contradiction with the WCP or SP follows? I say no. Neither Hennig[ii] nor Gandenberger shows otherwise.

In my tracing out of Birnbaum’s arguments, I strived to assume that he would not be giving us circular arguments. To say that “I can prove that your methodology must obey the SLP,” and then to set out to do so by declaring “Hey Presto! Assume sampling distributions are irrelevant (once the data are in hand),” is a neat trick, but it assumes what it purports to prove. All other interpretations are shown to be unsound.

______

[i] Birnbaum himself, soon after presenting his result, rejected the SLP. As Birnbaum puts it, “the likelihood concept cannot be construed so as to allow useful appraisal, and thereby possible control, of probabilities of erroneous interpretations.” (Birnbaum 1969, p. 128.)

(We use LP and SLP synonymously here.)

[ii] Hennig initially concurred with me, but says a person convinced him to get back on the Birnbaum bus (even though Birnbaum got off it [i]).

Some other related posted discussions: Brakes on Breakthrough Part 1 (12/06/11) & Part 2 (12/07/11); Don’t Birnbaumize that experiment (12/08/12); Midnight with Birnbaum re-blog (12/31/12). See also the initial call to this U-Phil, the extension, the details of the U-Phil, the post from my 28 Nov. seminar (LSE), and the original post by Gandenberger.

OTHER:

Birnbaum, A. (1962). “On the Foundations of Statistical Inference,” Journal of the American Statistical Association 57 (298), 269-306.

Savage, L. J., Barnard, G., Cornfield, J., Bross, I., Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O., and Birnbaum, A. (1962). On the Foundations of Statistical Inference: “Discussion (of Birnbaum 1962),” Journal of the American Statistical Association 57 (298), 307-326.

Birnbaum, A. (1970). “Statistical Methods in Scientific Inference” (letter to the editor). Nature 225, 1033.

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference,” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos, eds.), CUP, 276-304.

…and if that’s not enough, search this blog.


## Coming up: December U-Phil Contributions….

Dear Reader: You were probably* wondering about the December U-Phils (blogging the strong likelihood principle (SLP)). They will be posted, singly or in pairs, over the next few blog entries. Here is the initial call, and the extension. The details of the specific U-Phil may be found here, but also look at the post from my 28 Nov. seminar at the London School of Economics (LSE), which was on the SLP. Posts were to be in relation to either the guest graduate student post by Gandenberger, and/or my discussion/argument and reactions to it. Earlier U-Phils may be found here; and more by searching this blog. “U-Phil” is short for “you ‘philosophize’”.

If you have ideas for future “U-Phils,” post them as comments to this blog or send them to error@vt.edu.

*This is how I see “probability” mainly used in ordinary English, namely as expressing something like “here’s a pure guess made without evidence or with little evidence,” be it sarcastic or quite genuine.

## Don’t Birnbaumize that experiment my friend*–updated reblog

Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog an earlier post, along with a new UPDATE (following a video we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

(A) “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization.


## Announcement: U-Phil Extension: Blogging the Likelihood Principle

U-Phil: I am extending to Dec. 19, 2012 the date for sending me responses to the “U-Phil” call (see initial call), given some requests for more time. The details of the specific U-Phil may be found here, but you might also look at the post relating to my 28 Nov. seminar at the LSE, which is directly on the topic: the infamous (strong) likelihood principle (SLP). “U-Phil,” which is short for “you ‘philosophize’,” is really just an opportunity to write something .5-1 notch above an ordinary comment (focussed on one or more specific posts/papers, as described in each call): it can be longer (~500-1000 words), and it appears in the regular blog area rather than as a comment. Your remarks can relate to the guest graduate student post by Gregory Gandenberger, and/or my discussion/argument. Graduate student posts (e.g., attendees of my 28 Nov. LSE seminar?) are especially welcome*. Earlier exemplars of U-Phils may be found here; and more by searching this blog.

Thanks to everyone who sent me names of vintage typewriter repair shops in London, after the airline damage: the “x” is fixed, but the “z” key is still misbehaving.

*Another post of possible relevance to graduate students comes up when searching this blog for  “sex”.

## Blogging Birnbaum: on Statistical Methods in Scientific Inference

I said I’d make some comments on Birnbaum’s letter (to Nature), linked in my last post, which is relevant to today’s Seminar session (at the LSE*), as well as to Normal Deviate’s recent discussion of frequentist inference in terms of constructing procedures with good long-run “coverage”. (Also to the current U-Phil.)

NATURE VOL. 225 MARCH 14, 1970 (1033)

LETTERS TO THE EDITOR

Statistical methods in Scientific Inference

It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref. 2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.


## Likelihood Links [for 28 Nov. Seminar and Current U-Phil]

Dear Reader: We just arrived in London[i][ii]. Jean Miller has put together some materials for Birnbaum LP aficionados in connection with my 28 November seminar. Great to have ready links to some of the early comments and replies by Birnbaum, Durbin, Kalbfleisch and others, possibly of interest to those planning contributions to the current “U-Phil”. I will try to make some remarks on Birnbaum’s 1970 letter to the editor tomorrow.

## Announcement: 28 November: My Seminar at the LSE (Contemporary PhilStat)

28 November: (10 – 12 noon):
Mayo: “On Birnbaum’s argument for the Likelihood Principle: A 50-year old error and its influence on statistical foundations”
PH500 Seminar, Room: Lak 2.06 (Lakatos building).
London School of Economics and Political Science (LSE)

See general announcement here.

Background to the Discussion: Question: How did I get involved in disproving Birnbaum’s result in 2006?

Answer: Appealing to something called the “weak conditionality principle (WCP)” arose in avoiding a classic problem (arising from mixture tests) described by David Cox (1958), as discussed in our joint paper:

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference,” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos, eds.), CUP, 276-304.
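A stylized illustration of the kind of mixture Cox (1958) described may help. In this hypothetical sketch, a fair coin decides whether a precise instrument (σ = 1) or an imprecise one (σ = 10) is used to take one measurement x, testing H0: µ = 0 against µ > 0; the two σ values, the fair coin, and the observed value are assumptions for illustration, and the helper names are made up. The WCP says to report the error probability of the instrument actually used, rather than one averaged over the coin toss.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical mixture experiment (values assumed for illustration):
# a fair coin picks the precise or the imprecise instrument.
SIGMAS = {"precise": 1.0, "imprecise": 10.0}

def conditional_p(x: float, instrument: str) -> float:
    """p-value from the sampling distribution of the instrument actually used (WCP)."""
    return normal_sf(x / SIGMAS[instrument])

def unconditional_p(x: float) -> float:
    """p-value averaged over the coin toss, ignoring which instrument produced x."""
    return 0.5 * sum(normal_sf(x / s) for s in SIGMAS.values())

x_obs = 2.0  # suppose the precise instrument was used and read 2.0
print(f"conditional p   = {conditional_p(x_obs, 'precise'):.4f}")
print(f"unconditional p = {unconditional_p(x_obs):.4f}")
```

With these assumed numbers the conditional p-value is about 0.023, while averaging over the coin toss yields about 0.22, even though the precise instrument is known to have produced x.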


## Irony and Bad Faith: Deconstructing Bayesians-reblog

The recent post by Normal Deviate, and my comments on it, remind me of why/how I got back into the Bayesian-frequentist debates in 2006, as described in my first “deconstruction” (and “U-Phil”) on this blog (Dec 11, 2012):

Some time in 2006 (shortly after my ERROR06 conference), the trickle of irony and sometime flood of family feuds issuing from Bayesian forums drew me back into the Bayesian-frequentist debates.[1][2] Suddenly sparks were flying, mostly kept shrouded within Bayesian walls, but nothing can long be kept secret even there. Spontaneous combustion is looming. The true-blue subjectivists were accusing the increasingly popular “objective” and “reference” Bayesians of practicing in bad faith; the new O-Bayesians (and frequentist-Bayesian unificationists) were taking pains to show they were not subjective; and some were calling the new Bayesian kids on the block “pseudo Bayesian.” Then there were the Bayesians somewhere in the middle (or perhaps out in left field) who, though they still use the Bayesian umbrella, were flatly denying the very idea that Bayesian updating fits anything they actually do in statistics.[3] Obeisance to Bayesian reasoning remained, but on some kind of a priori philosophical grounds. Doesn’t the methodology used in practice really need a philosophy of its own? I say it does, and I want to provide this.

## U-Phil: Blogging the Likelihood Principle: New Summary

U-Phil: I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader U-Phils, from December 6-19 (< 1000 words) for posting on this blog (please see # at bottom of post). Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid. If construed as formally valid, I argue, the premises contradict each other. Who is right? Gandenberger doesn’t wrestle with my critique of Birnbaum, but I invite you (and Greg!) to do so. I’m pasting a new summary of my argument below.

The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec. 7, 2011.

Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant?  And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?

______________________________________________________

New Summary of Mayo Critique of Birnbaum’s Argument for the SLP
Deborah Mayo
See also a (draft) of the full PAPER corresponding to this summary; a later and more satisfactory draft is here. Yet other links to the Strong Likelihood Principle (SLP): Mayo 2010; Cox & Mayo 2011 (appendix).


## Guest Post: Greg Gandenberger, “Evidential Meaning and Methods of Inference”

Evidential Meaning and Methods of Inference

Greg Gandenberger
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh

Bayesian methods conform to the Likelihood Principle, while frequentist methods do not.  Thus, proofs of the Likelihood Principle* such as Birnbaum’s (1962) appear to be threats to frequentist positions.  Deborah Mayo has recently argued that Birnbaum’s proof is no threat to frequentist positions because it is invalid (Ch. 7(III) in Mayo and Spanos 2010).  In my view, Birnbaum’s proof is valid and his premises are intuitively compelling.  Nevertheless, I agree with Professor Mayo that the proof, properly understood, does not imply that frequentist methods should not be used.

There are actually at least two different Likelihood Principles: one, which I call the Evidential Likelihood Principle, says that the evidential meaning of an experimental outcome with respect to a set of hypotheses depends only on its likelihood function for those hypotheses (i.e., the function that maps each of those hypotheses to the probability it assigns to that outcome, defined up to a constant of proportionality); the other, which I call the Methodological Likelihood Principle, says that a statistical method should not be used if it can generate different conclusions from outcomes that have the same likelihood function, without a relevant difference in utilities or prior probabilities.
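A standard textbook pair (not taken from this post) shows what "the same likelihood function, up to a constant of proportionality" can look like: 9 heads and 3 tails, obtained either from 12 tosses fixed in advance (binomial) or by tossing until the 3rd tail (negative binomial). The data, θ = P(heads), and the one-sided test of θ = 0.5 against θ > 0.5 are assumptions for this sketch; the function names are hypothetical.

```python
import math

# Hypothetical data: 9 heads and 3 tails; theta = P(heads).
HEADS, TAILS = 9, 3
N = HEADS + TAILS

def lik_binomial(theta: float) -> float:
    """Fixed design: 12 tosses planned in advance; P(exactly 9 heads)."""
    return math.comb(N, HEADS) * theta**HEADS * (1 - theta)**TAILS

def lik_negbinomial(theta: float) -> float:
    """Sequential design: toss until the 3rd tail; P(3rd tail arrives on toss 12)."""
    return math.comb(N - 1, TAILS - 1) * theta**HEADS * (1 - theta)**TAILS

def p_binomial() -> float:
    """P(at least 9 heads in 12 tosses | theta = 0.5)."""
    return sum(math.comb(N, k) for k in range(HEADS, N + 1)) / 2**N

def p_negbinomial() -> float:
    """P(at least 9 heads before the 3rd tail | theta = 0.5).
    Equivalent to: at most 2 tails among the first 11 tosses."""
    first = N - 1
    return sum(math.comb(first, k) for k in range(TAILS)) / 2**first

for theta in (0.25, 0.5, 0.75):
    print(theta, lik_binomial(theta) / lik_negbinomial(theta))
print(f"binomial p = {p_binomial():.4f}, negative binomial p = {p_negbinomial():.4f}")
```

The likelihood ratio is 4 at every θ, so the Evidential Likelihood Principle treats the two outcomes as evidentially equivalent; yet the p-values are about 0.073 and 0.033, so a 0.05-level test rejects under one design and not the other. That is the kind of method the Methodological Likelihood Principle, as stated above, would rule out absent relevant differences in utilities or priors.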


## Failing to Apply vs Violating the Likelihood Principle

In writing a new chapter on the Strong Likelihood Principle [i] the past few weeks, I noticed a passage in G. Casella and R. Berger (2002) that in turn recalled a puzzling remark noted in my Jan. 3, 2012 post. The post began:

A question arose from a Bayesian acquaintance:

“Although the Birnbaum result is of primary importance for sampling theorists, I’m still interested in it because many Bayesian statisticians think that model checking violates the (strong) likelihood principle (SLP), as if this principle is a fundamental axiom of Bayesian statistics”.

But this is puzzling for two reasons. First, if the LP does not preclude testing for assumptions (and he is right that it does not[ii]), then why not simply explain that rather than appeal to a disproof of something that actually never precluded model testing?   To take the disproof of the LP as grounds to announce: “So there! Now even Bayesians are free to test their models” would seem only to ingrain the original fallacy.

You can read the rest of the original post here.

The remark in G. Casella and R. Berger seems to me equivocal on this point.

## Two New Properties of Mathematical Likelihood

17 February 1890-29 July 1962

Note: I find this to be an intriguing, if perhaps little-known, discussion, long before the conflicts reflected in the three articles (the “triad”) below. Here Fisher links his tests to the Neyman-Pearson lemma in terms of power. I invite your deconstructions/comments.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

To Thomas Bayes must be given the credit of broaching the problem of using the concepts of mathematical probability in discussing problems of inductive inference, in which we argue from the particular to the general; or, in statistical phraseology, argue from the sample to the population, from which, ex hypothesi, the sample was drawn. Bayes put forward, with considerable caution, a method by which such problems could be reduced to the form of problems of probability. His method of doing this depended essentially on postulating a priori knowledge, not of the particular population of which our observations form a sample, but of an imaginary population of populations from which this population was regarded as having been drawn at random. Clearly, if we have possession of such a priori knowledge, our problem is not properly an inductive one at all, for the population under discussion is then regarded merely as a particular case of a general type, of which we already possess exact knowledge, and are therefore in a position to draw exact deductive inferences.


This is a first draft of part II of the presentation begun in the December 6 blog post. This completes the proposed presentation. I expect errors, and I will be grateful for feedback! (NOTE: I did not need to actually rip a cover of EGEK to obtain this effect!)

SEVEN: NOW FOR THE BREAKTHROUGH

You have observed y”, the .05 significant result from E”, the optional stopping rule, ending at n = 100.

Birnbaum claims he can show that you, as a frequentist error statistician, must grant that it is equivalent to having fixed n = 100 at the start (i.e., experiment E’).

Reminder:

The (strong) Likelihood Principle (LP) is a universal conditional claim:

If two data sets y’ and y” from experiments E’ and E” respectively have likelihood functions which are functions of the same parameter(s) µ

and are proportional to each other, then y’ and y” should lead to identical inferential conclusions about µ.
