*“This will be my last post on the (irksome) Birnbaum argument!”* she says with her fingers (or perhaps toes) crossed. But *really, really* it is (at least until midnight 2013). In fact the following brief remarks are all said, more clearly, in my (old) **PAPER**, new paper, Mayo 2010, Cox & Mayo 2011 (appendix), and in posts connected to this U-Phil: Blogging the likelihood principle, new summary 10/31/12.

*What’s the catch?*

In my recent “Ton o’ Bricks” post, many readers were struck by the implausibility of letting the evidential interpretation of **x’\*** be influenced by the properties of experiments known not to have produced **x’\***. Yet it is altogether common to be told that, should a sampling theorist try to block this, “unfortunately there is a catch” (Ghosh, Delampady, and Samanta 2006, 38): We would be forced to embrace the strong likelihood principle (SLP, or LP, for short), at least according to an infamous argument by Allan Birnbaum (who himself rejected the LP [i]).

It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin, or else to embrace the strong likelihood principle, which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma. . . . The “dilemma” argument is therefore an illusion. (Cox and Mayo 2010, 298)

In my many detailed expositions, I have explained the source of the illusion and sleight of hand from a number of perspectives (I will not repeat references here). While I appreciate the care that Hennig and Gandenberger have taken in their U-Phils (and wish them all the luck in published outgrowths), it is clear to me that they are not hearing (or are unwittingly blocking) the scre-e-e-e-ching of the brakes!

*No revolution, no breakthrough!*

Berger and Wolpert, in their famous monograph *The Likelihood Principle*, identify the core issue:

The philosophical incompatibility of the LP and the frequentist viewpoint is clear, since the LP deals only with the observed x, while frequentist analyses involve averages over possible observations. . . . Enough direct conflicts have been . . . seen to justify viewing the LP as revolutionary from a frequentist perspective. (Berger and Wolpert 1988, 65-66)[ii]

If Birnbaum’s proof does not apply to a frequentist sampling theorist, then there is neither a revolution nor a breakthrough (as Savage called it). The SLP holds just for methodologies in which it holds . . . We are going in circles.

*Block my counterexamples, please!*

Since Birnbaum’s argument has stood for over fifty years, I’ve given it the maximal run for its money, and haven’t tried to block its premises, however questionable its key moves may appear. Despite such latitude, I’ve shown that the “proof” to the SLP conclusion will not wash, and I’m just a wee bit disappointed that Hennig and Gandenberger haven’t wrestled with my specific argument, or shown just where they think my debunking fails. What would this require?

Since the SLP is a universal generalization, it requires only a single counterexample to falsify it. In fact, every violation of the SLP within frequentist sampling theory, I show, is a counterexample to it! In other words, using the language from the definition of the SLP, the onus is on Birnbaum to show that for any **x’\*** that is a member of an SLP pair (E’, E”) with given, different probability models f’, f”, **x’\*** and **x”\*** should have the identical evidential import for an inference concerning parameter θ, on pain of facing “the catch” above, i.e., being forced to allow the import of data known to have come from E’ to be altered by unperformed experiments known not to have produced **x’\***.
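For reference, the SLP is commonly stated along the following lines. This is a sketch in one standard notation, not a quotation from any of the papers cited here: Ev denotes the evidential import of an outcome from an experiment, and c is assumed to be a positive constant that does not depend on θ.

```latex
\text{If } f'(\mathbf{x}'^{*};\theta) = c\, f''(\mathbf{x}''^{*};\theta) \text{ for all } \theta,
\quad \text{then} \quad
\mathrm{Ev}(E', \mathbf{x}'^{*}) = \mathrm{Ev}(E'', \mathbf{x}''^{*}).
```

Read this way, the SLP quantifies over every such pair, which is why one pair with differing appraisals suffices as a counterexample.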

If one is to release the brakes from my screeching halt, defenders of Birnbaum might try to show that the SLP counterexamples lead me to “the catch” as alleged. I have considered two well-known violations of the SLP. Can it be shown that a contradiction with the WCP or SP follows? I say no. Neither Hennig[ii] nor Gandenberger shows otherwise.

In my tracing out of Birnbaum’s arguments, I strived to assume that he would not be giving us circular arguments. To say that “I can prove that your methodology must obey the SLP,” and then to set out to do so by declaring “Hey Presto! Assume sampling distributions are irrelevant (once the data are in hand),” is a neat trick, but it assumes what it purports to prove. All other interpretations are shown to be unsound.

______

[i] Birnbaum himself, soon after presenting his result, rejected the SLP. As Birnbaum puts it, “the likelihood concept cannot be construed so as to allow useful appraisal, and thereby possible control, of probabilities of erroneous interpretations.” (Birnbaum 1969, p. 128.)

(We use LP and SLP synonymously here.)

[ii] Hennig initially concurred with me, but says a person convinced him to get back on the Birnbaum bus (even though Birnbaum got off it [i]).

Some other, related, posted discussions: Brakes on Breakthrough Part 1 (12/06/11) & Part 2 (12/07/11); Don’t Birnbaumize that experiment (12/08/12); Midnight with Birnbaum re-blog (12/31/12). The initial call to this U-Phil, the extension, details here, the post from my 28 Nov. seminar (LSE), and the original post by Gandenberger.

OTHER :

Birnbaum, A. (1962), “On the Foundations of Statistical Inference”, *Journal of the American Statistical Association* 57 (298), 269-306.

Savage, L. J., Barnard, G., Cornfield, J., Bross, I., Box, G., Good, I., Lindley, D., Clunies-Ross, C., Pratt, J., Levene, H., Goldman, T., Dempster, A., Kempthorne, O., and Birnbaum, A. (1962). On the foundations of statistical inference: “Discussion (of Birnbaum 1962)”, *Journal of the American Statistical Association* 57 (298), 307-326.

Birnbaum, A. (1970). Statistical Methods in Scientific Inference (letter to the editor). *Nature* 225, 1033.

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo & A. Spanos eds.), CUP 276-304.

…and if that’s not enough, search this blog.


Mayo: Well, your counterexamples are not counterexamples to the mathematical content of the proof. Both premises of Birnbaum’s proof can be fulfilled by functions Ev which do not depend on the sampling distribution (such as, trivially but not meaningfully in terms of interpretation, a constant function). And for such functions the proof is valid.

Your “counterexamples” are based on premises that add something to the purely mathematical content of the CP and SP as formulated by Birnbaum, by enforcing the inference to *differ* between what is yielded by the sufficient statistic in the mixture experiment regarded as a whole and what is yielded in the one of the two mixed experiments that actually brought forth the observed data. You are right in saying that a *reasonable* error statistical inference should differ between the two, but Birnbaum’s original formulation doesn’t enforce this for the function Ev.

So your counterexamples do *not* fulfill the original CP and SP as formulated by Birnbaum (although one can argue that they fulfill a reasonably worded “SP for sampling distributions”), and therefore cannot invalidate the proof. Neither can you say that Birnbaum’s original premises cannot both be fulfilled at the same time without invalidating the proof, because functions Ev exist which do fulfill CP and SP and which do not invalidate the argument in the proof stripped of any interpretative implications (you may say that these functions have to suppress information that is needed for good inference, but that’s a problem with interpretation, not with mathematics).

This became clear to me only when I discussed this with P. Dawid who showed me a purely mathematical proof whereas Birnbaum’s original proof (against which I had initially read your arguments) still uses some “interpretative” jargon such as “having evidential meaning” that to me seems to obscure mathematical matters.

Circular.

How is what I was writing here circular?

The point, if followed through, is that to block the counterexamples one is led to either “Hey Presto” (circular) or to the sources of unsoundness I’ve argued in my papers.

You may feel that you have worked hard enough in your papers and don’t owe me a proper response but certainly you can see yourself that just writing “Hey presto (circular)” without any specific reference to what I wrote doesn’t explain much.

Christian: You want me to try and rewrite my papers on a blog; sorry, but I am under the gun and cannot. Moreover, whatever I’d write in a quick comment would not be the carefully worded remarks in my papers, and suddenly I’d be accused of saying something I did not mean. Please try again to work through my most recent and fullest paper. Think too about the very revelation that seemed to strike you in considering Gandenberger’s variations, at least if I understood your response to him in the previous post.

Well said, Christian.

The Chang posts, to paraphrase Mayo and Miller, observe that Birnbaum presents his argument as not purely mathematical but relying on intuitive ‘principles of evidence’, as in the ‘ton o’ bricks’ post. Trouble is that accepting exactly the principles of evidence so intuitive to Birnbaum (in contrast to Lehmann) does not entail LP. The equivocation in “what cannot matter and need not be reported” enables the deception. It reduces logically to an illicit substitution, disrespecting the bounds of quantifiers, as symbolic logic students know.

Christian, I agree with you. Here I may help by expressing the idea in a different way using Mayo’s example 1.

Example 1. Binomial vs. Negative Binomial. Consider independent Bernoulli trials, with the probability of success at each trial an unknown constant θ, but produced by different procedures, E’, E”. E’ is Binomial with a pre-assigned number n of Bernoulli trials, say 20, and R, the number of successes observed. In E” trials continue until a pre-assigned number r, say 6, of successes has occurred, with the number N of trials recorded. The sampling distribution of R is Binomial:

$$f(R;\theta) = \binom{n}{r}\,\theta^{r}(1-\theta)^{n-r}$$

while the sampling distribution of N is Negative Binomial.

$$f(N;\theta) = \binom{n-1}{r-1}\,\theta^{r}(1-\theta)^{n-r}$$

If two outcomes from E’ and E” respectively, have the same number of successes and failures, r and n, then they have the “same” likelihood, in the sense that they are proportional to $\theta^{r}(1-\theta)^{n-r}$.
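The proportionality can be checked numerically. The sketch below is a minimal illustration and not part of the original comment: the function names and the grid of θ values are assumptions chosen for the example, with n = 20 and r = 6 taken from the text. It evaluates both sampling distributions at the shared outcome and prints their ratio, which should not vary with θ.

```python
from math import comb


def binomial_pmf(r, n, theta):
    """P(R = r) when the number of trials n is fixed in advance (E')."""
    return comb(n, r) * theta**r * (1 - theta) ** (n - r)


def negative_binomial_pmf(n, r, theta):
    """P(N = n) when sampling stops at the r-th success (E'')."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta) ** (n - r)


n, r = 20, 6  # values used in Example 1
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]  # illustrative grid, an assumption

# Ratio of the two likelihoods at the same (r, n) for each theta.
ratios = [binomial_pmf(r, n, t) / negative_binomial_pmf(n, r, t) for t in thetas]

for t, ratio in zip(thetas, ratios):
    print(f"theta={t:.1f}  binomial / negative binomial = {ratio:.6f}")
```

The ratio equals n/r in every row, since C(n, r) / C(n−1, r−1) = n/r. Only this constant separates the two likelihoods, which is what makes the two outcomes an SLP pair.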

Original version of the following (Mayo):

The two outcomes, x’* and x”* are SLP pairs. But the difference in the sampling distributions of the respective statistics, R and N, of E’ and E” respectively, entails a difference in p-values or confidence level assessments. Accordingly, their evidential appraisals differ for sampling distribution inference. Thus x’* and x”* are SLP pairs leading to an SLP violation.

Revised version (Lew):

The two outcomes, x’* and x”* are SLP pairs. But the difference in the sampling distributions of the respective statistics, R and N, of E’ and E” respectively, entails a difference in p-values or confidence level assessments and if those p-values and confidence intervals were invariantly related to evidence then either (i) that SLP would constitute an SLP violations, or (ii) the nature of the evidence quantitated by the p-values and confidence intervals is different from that quantitated by likelihood, or (iii) p-values and confidence intervals do not quantitate evidence.

Mayo appears to prefer (i), but I think that (iii) is true.

Mayo’s disproof seems to rest on the notion that p-values ‘corrected’ for sequential sampling and confidence intervals relate directly to evidence. That is false. They can certainly be used for inference, but that inference is based on something other than evidence.
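To make the disputed difference in p-values concrete, here is a small sketch for the outcome of 6 successes in 20 trials. The null value θ = 0.5 and the one-sided alternative θ < 0.5 are assumptions introduced purely for illustration; neither appears in the example above, and the function names are likewise hypothetical.

```python
from math import comb


def binomial_lower_tail(r, n, theta):
    """P(R <= r) for R ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(r + 1)
    )


def negative_binomial_upper_tail(n, r, theta):
    """P(N >= n), where N is the trial on which the r-th success occurs.

    Needing at least n trials is the same event as seeing at most
    r - 1 successes in the first n - 1 trials.
    """
    return binomial_lower_tail(r - 1, n - 1, theta)


theta0 = 0.5  # assumed null value, tested against theta < 0.5
n, r = 20, 6  # shared outcome: 6 successes in 20 trials

# Under E' (n fixed), few successes count against the null.
p_binomial = binomial_lower_tail(r, n, theta0)
# Under E'' (r fixed), many trials count against the null.
p_negative_binomial = negative_binomial_upper_tail(n, r, theta0)

print(f"E'  (binomial) p-value:          {p_binomial:.4f}")
print(f"E'' (negative binomial) p-value: {p_negative_binomial:.4f}")
```

Under these assumptions the two designs report roughly 0.058 and 0.032 for the same counts. Whether that difference reflects a difference in evidence is the point on which options (i) to (iii) disagree.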

Typo in (i). It should read “that SLP pair would constitute an SLP violation”.

Lew: The very fact that one can keep to this “preference” proves my point. The onus is on Birnbaum to show that SLP violations, which are perfectly plausible from the perspective of the error statistical account of evidence, lead to “the catch” (or in some way violate frequentist principles WCP or SP). Birnbaum himself formulated the Confidence concept of evidence to capture the error statistical notion.

The disproof of the SLP depends on the error approaches being valid as evidence: if they are not evidence then the fact that there are several different ways to calculate their values in that example has no effect on the validity of the SLP. If option (iii) is true then the error-based notions of evidence are wrong and the SLP can be true.

The strong likelihood principle is false and so error-based notions of evidence are OK, and the reason that the strong likelihood principle is false is because it disagrees with error-based notions of evidence. Doesn’t that sound a little circular?

I think it would be really great if the Mayo argument and some of the commentary about the validity, interpretation and implications were published in a stats journal. Even if the ideas presented already exist in old papers.

David: Thank you. Any ideas (of journals)?

I’d welcome such a paper, with commentary. JASA (home of Birnbaum 1962) would be worth a try, also JRSSB, and Statistical Science. Canadian Journal of Statistics is good but less high profile; worth considering as it has had some previous interest in such material. Given that it’s unusual material, pre-submission enquiries with the editors might be useful.

However, if the paper’s not extremely clearly written, written for an audience of statisticians, and doesn’t include some examples where the principle actually matters somewhat in practice, then it’ll have a tough time with statistical reviewers and/or will be ignored by a statistical readership. Good luck.

OG: Thanks so much*. So do statisticians think “the principle actually matters somewhat in practice”? I seriously would like to know. Gelman’s blog yesterday says philosophical foundations matter, but he obviously sees this as exceptional. This issue arises also for Bayesians. Yet if it doesn’t matter, then why is it so often included in statistics textbooks?

And if you are a statistician, then why would you “welcome such a paper, with commentary” unless you think it matters or else regard yourself as atypical from most statistical audiences.

Anyway, it was mandatory for me to disprove “the catch” in the context of my 2010 paper with Cox on conditioning.

*I appreciate the advice. I hadn’t known of “pre-submission enquiries” with editors.

Briefly; I think principles matter and are interesting just by themselves. Many other statisticians I’ve met instead emphasize judging methods empirically, i.e. using other information to see how they did, when this is possible. We almost all do a bit of both; in the profession as a whole I don’t know what the mix is, sorry.

Why are the textbooks the way they are? Maybe those who have a clear view of principles (whatever that view is) think that others will benefit from having this clarity laid out in textbook form. Maybe those who instead choose methods based largely on their experience and intuition probably find it harder to write about that, and so are less likely to produce textbooks. NB both types can be excellent statisticians.

Journal editors and reviewers for statistical journals are a similarly mixed group; a paper on principles and philosophy may be more likely to get published and read if it makes clear some relevance to what statisticians do in practice.

OG: Thanks again. I think in this case the impact has been real—Savage was right about that–but subliminal. If you harbor even a bit of unease as to whether a methodology based on sampling distributions and error probabilities can have a sound “evidential” justification, then you are less likely to develop one with any confidence.

I don’t think it far-fetched to suppose that hearing (as everyone has) that the LP might be warranted at some foundational level (due to an oft-repeated, but puzzling argument) has led to the bit of unease I mention. Being freed from this might open the door to developing better rationales for all methods that utilize sampling distributions, frequentist and Bayesian. Remember that nonsubjective or default Bayesians also see themselves as needing to violate the LP.

Professor Mayo’s statement that she was “a wee bit disappointed” that I haven’t wrestled with her specific argument is fair as a comment on my U-Phil, which was intended to be brief. I have addressed her argument in more detail in a forthcoming paper a draft of which is available at https://www.dropbox.com/s/nvzcrjdhydsls2w/new_proof_of_lp7.4.pdf

Greg: Thanks. I’m very glad a few philosophers, at least, are interested in this.

I think we’re struggling with logic vs. mathematics as frames of reference here.

I can see why taking into account everything in Birnbaum’s paper including all the interpretations of principles and experiments one can say that his argument is invalid (as far as it concerns these interpretations). If the interpretations are considered as integral part of Birnbaum’s argument and a logician analyses this, probably Mayo’s claim follows.

However, mathematics attempts to look at what is formally well defined only, which I think leads to the point of view of Greg and myself, at least when constructing a proof that sticks to pure mathematics (as Dawid did).

Would it be fair to say that the mathematician can be content with the circular argument so long as the proof is valid, but the scientist cannot accept it regardless of the tidiness of the math?

john byrd: Mayo may think otherwise but nothing in the purely mathematical formulation is circular. Popular *interpretations* of the theorem are circular (i.e., that it follows that frequentist methods that violate the SLP should not be used for inference/quantifying evidence), and I am as much against them as Mayo is.

No, it wouldn’t be fair to say that, because the argument purports to permit detaching a claim. A circular proof just assumes what one wants to prove. If someone simply wished to show the logical equivalence of two claims, that would be one thing, but “the catch” for the sampling theorist wishing to use the correct sampling distribution does not follow. Likewise, even if (A equiv B), it does not mean you have shown A. Remember, here we have two statistical accounts or principles of evidence or what have you that reach opposite claims about evidential import. What follows about evidential import in one system needn’t follow in another. Finally, you might check the definition of sufficiency in Cox and Mayo (2010).

Christian: Well, Birnbaum makes it clear at the start that his argument is based on intuitive notions of evidence and is NOT a mathematical demonstration. I don’t see how one can appreciate what’s going on without seeing this. Or rather, when the principles are cashed out in the manner that Birnbaum’s argument depends on, the argument is unsound. You should have come to my LSE seminar.

Mayo: It may be that we agree on all implications of this discussion regarding evidence in general and the foundations of statistics. I tried to make this clearer with my previous posting. The issue seems to be what we mean to be part of “the argument” and what we don’t when we discuss this. Obviously you include more than what I regard to be the “mathematical core”, and what you challenge is this additional content. I can see that you can justify this content by what Birnbaum writes in his paper as a whole, and I feel that I do understand your argument against it (and I agree with it).

Regarding the purely mathematical version of the proof, I have said from the start of this discussion that I do not talk about the way this is presented in Birnbaum’s paper, which I perceive as slightly confused/confusing from a mathematician’s perspective, exactly because purely mathematical reasoning is mixed up with the use of non-mathematical terms.

Stripped of interpretation, showing that SP & WCP => SLP (using the Birnbaum definitions in fully formalised language not worrying about the interpretation of function Ev; as I’ve seen from Dawid) is still valid, non-circular and non-trivial (in the sense that pretty much nobody saw it before Birnbaum wrote his paper), although one can say that it’s a rather modest mathematical result with no implications about how inference should be done (although one interesting implication is that *if* one wants inference guided by such principles, a frequentist/error statistician would need an amended version of the SP, as you give in your recent draft, for the reasons that you give).

So I largely agree with what you’re writing and your last comment is really very close to my point, because it allows one to separate what you call “Birnbaum’s argument”, which you correctly challenge, from a purely mathematical demonstration of limited interest that can be obtained from Birnbaum’s paper by taking out the intuition you refer to.

This would open a possibility for you to make your argument without bewildering mathematicians who wonder why you seem to attack a proof that looks fine as far as the math is concerned.

Christian: I disagree, and one way to see the problem is to look at the defn. of the SP in Cox and Mayo 2010. One experimental model, one sampling distribution. Please work through my paper. I’ve said too much on this already.

Mayo: I am aware of this and was before but your definition adds something that was not in Birnbaum’s definition. (You may argue that it should have been and one can see this from the remainder of Birnbaum’s paper, but still it wasn’t.)

No, that is the definition, and BB used it in order to turn two experiments into one (using the convex combination for the sampling distribution). The idea comes from Cox’s weighing example….

Well, may the independent reader have a look (at both papers, at least) and decide for him/herself.