*Greg Gandenberger*

*PhD student, History and Philosophy of Science*

*Master’s student, Statistics*

*University of Pittsburgh*

In her 1996 *Error and the Growth of Experimental Knowledge*, Professor Mayo argued against the Likelihood Principle on the grounds that it does not allow one to control long-run error rates in the way that frequentist methods do. This argument seems to me the kind of response a frequentist should give to Birnbaum’s proof. It does not require arguing that Birnbaum’s proof is unsound: a frequentist can accommodate Birnbaum’s conclusion (two experimental outcomes are evidentially equivalent if they have the same likelihood function) by claiming that respecting evidential equivalence is less important than achieving certain goals for which frequentist methods are well suited.
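Birnbaum’s conclusion can be made concrete with the standard binomial/negative-binomial example (a sketch of the textbook Lindley–Phillips-style case; the specific numbers are my own choice, not drawn from the discussion above). Two experiments yield outcomes with proportional likelihood functions, so the Likelihood Principle calls them evidentially equivalent, yet the frequentist p-values differ:

```python
from math import comb

# Experiment 1: flip a coin 12 times; observe 9 heads (binomial).
# Experiment 2: flip until the 3rd tail; the 3rd tail arrives on trial 12
# (negative binomial). Both likelihoods are proportional to theta^9 (1-theta)^3.

def lik_binomial(theta, n=12, k=9):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def lik_neg_binomial(theta, tails=3, heads=9):
    # probability that the 3rd tail occurs on trial heads + tails
    return comb(heads + tails - 1, tails - 1) * theta**heads * (1 - theta)**tails

# The likelihood ratio is the same at every theta, i.e. the likelihood
# functions are proportional, so the Likelihood Principle deems the two
# outcomes evidentially equivalent.
ratios = {round(lik_binomial(t) / lik_neg_binomial(t), 6) for t in (0.1, 0.3, 0.5, 0.7)}
assert len(ratios) == 1

# Yet the one-sided p-values for H0: theta = 0.5 differ, because they
# depend on the sampling distributions of the two experiments.
p_binom = sum(comb(12, k) for k in range(9, 13)) / 2**12               # P(X >= 9)
p_negbin = 1 - sum(comb(y + 2, 2) * 0.5**(y + 3) for y in range(9))    # P(Y >= 9)
print(p_binom, p_negbin)  # roughly 0.073 vs. 0.033
```

This is exactly the tension the frequentist response above addresses: one can grant the equivalence claim while denying that respecting it outranks control of error rates.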

More recently, Mayo has shown that Birnbaum’s premises cannot be reformulated as claims about what sampling distribution should be used for inference while retaining the soundness of his proof. It does not follow that Birnbaum’s proof is unsound, because Birnbaum’s original premises are not claims about what sampling distribution should be used for inference but rather sufficient conditions for experimental outcomes to be evidentially equivalent.

Mayo acknowledges that the premises she uses in her argument against Birnbaum’s proof differ from Birnbaum’s original premises in a recent blog post in which she distinguishes between “the Sufficiency Principle (general)” and “the Sufficiency Principle applied in sampling theory.” One could make a similar distinction for the Weak Conditionality Principle. There is indeed no way to formulate Sufficiency and Weak Conditionality Principles “applied in sampling theory” that are consistent and imply the Likelihood Principle. This fact is not surprising: sampling theory is incompatible with the Likelihood Principle!

Birnbaum himself insisted that his premises were to be understood as “equivalence relations” rather than as “substitution rules” (i.e., rules about what sampling distribution should be used for inference) and recognized the fact that understanding them in this way was necessary for his proof. As he put it in his 1975 rejoinder to Kalbfleisch’s response to his proof, “It was the adoption of an unqualified equivalence formulation of conditionality, and related concepts, which led, in my 1972 paper, to the monster of the likelihood axiom” (263).

Because Mayo’s argument against Birnbaum’s proof requires reformulating Birnbaum’s premises, it is best understood as an argument not for the claim that Birnbaum’s original proof is invalid, but rather for the claim that Birnbaum’s proof is valid only when formulated in a way that is irrelevant to a sampling theorist. Reformulating Birnbaum’s premises as claims about what sampling distribution should be used for inference is the only way for a fully committed sampling theorist to understand them. Any other formulation of those premises is either false or question-begging.

Mayo’s argument makes good sense when understood in this way, but it requires a strong prior commitment to sampling theory. Whether various arguments for sampling theory such as those Mayo gives in *Error and the Growth of Experimental Knowledge* are sufficient to warrant such a commitment is a topic for another day. To those who lack such a commitment, Birnbaum’s original premises may seem quite compelling. Mayo has not refuted the widespread view that those premises do in fact entail the Likelihood Principle.

Mayo has objected to this line of argument by claiming that her reformulations of Birnbaum’s principles are just instantiations of Birnbaum’s principles in the context of frequentist methods. But they cannot be instantiations in a literal sense because they are imperatives, whereas Birnbaum’s original premises are declaratives. They are instead instructions that a frequentist would have to follow in order to avoid violating Birnbaum’s principles. The fact that one cannot follow them both is only an objection to Birnbaum’s principles on the question-begging assumption that evidential meaning depends on sampling distributions.

********

**Birnbaum’s proof is not wrong but error statisticians don’t need to bother**

**Christian Hennig**

*Department of Statistical Science
University College London*

I was impressed by Mayo’s arguments in “Error and Inference” when I came across them for the first time. To some extent, I still am. However, I have also seen versions of Birnbaum’s theorem and proof presented in a mathematically sound fashion with which I as a mathematician had no issue.

After having discussed this a bit with Phil Dawid, and having thought and read more on the issue, my conclusion is that

1) Birnbaum’s theorem and proof are correct (apart from small mathematical issues resolved later in the literature), and they are not vacuous (i.e., there are evidence functions that fulfill them without any contradiction in the premises),

2) however, Mayo’s arguments actually do raise an important problem with Birnbaum’s reasoning.

Here is why. Note that Mayo’s arguments are based on the implicit (error-statistical) assumption that the sampling distribution of an inference method is relevant. In that case, applying the sufficiency principle to Birnbaum’s mixture experiment enforces the use of the sampling distribution under the mixture as it stands, whereas applying the conditionality principle enforces the use of the sampling distribution under the experiment that actually produced the data, which differs in the usual examples. So the problem is not that Birnbaum’s proof is wrong, but that enforcing both principles at the same time in the mixture experiment contradicts the relevance of the sampling distribution (and therefore error-statistical inference). It is a case in which the sufficiency principle suppresses information that is clearly relevant under the conditionality principle. This means that the justification of the sufficiency principle (namely, that all relevant information is in the sufficient statistic) breaks down in this case.
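The conflict between the two prescriptions can be seen numerically in Cox’s classic two-instruments mixture (a standard illustration of the conditionality principle; the particular numbers below are my own, not taken from the text above):

```python
from math import erfc, sqrt

# A fair coin decides whether x is measured with a precise instrument
# (X ~ N(theta, 1^2)) or an imprecise one (X ~ N(theta, 10^2)).
# Suppose the imprecise instrument was used and x = 15 was observed.

def two_sided_p(x, sd):
    """Two-sided p-value for H0: theta = 0 under N(0, sd^2)."""
    return erfc(abs(x) / (sd * sqrt(2)))

x = 15.0

# Conditionality: use the sampling distribution of the instrument
# that actually produced the data.
p_conditional = two_sided_p(x, sd=10.0)

# Mixture: use the unconditional sampling distribution, averaging the
# tail probability over the coin flip, as the sufficiency reduction of
# the mixture experiment would have it.
p_mixture = 0.5 * two_sided_p(x, sd=1.0) + 0.5 * two_sided_p(x, sd=10.0)

print(p_conditional, p_mixture)  # the two prescriptions disagree
```

The two p-values differ (roughly 0.13 conditionally versus 0.07 under the mixture), so an error statistician who cares about the sampling distribution cannot honor both prescriptions at once.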

Frequentists/error statisticians therefore don’t need to worry about the likelihood principle because they shouldn’t accept the sufficiency principle in the generality that is required for Birnbaum’s proof.

Having understood this, I toyed around with the idea of writing this down as a publishable paper, but I have now come across a paper in which this argument can already be found (although in a less straightforward and more mathematical manner), namely:

M. J. Evans, D. A. S. Fraser and G. Monette (1986) On Principles and Arguments to Likelihood. *Canadian Journal of Statistics* 14, 181-194, http://www.jstor.org/stable/3314794, particularly Section 7 (the rest is interesting, too).

**NOTE: This is the last of this group of U-Phils. Mayo will issue a brief response tomorrow. Background to these U-Phils may be found here.**

Christian: Could you expand on the following statement? “It is a case in which the sufficiency principle suppresses information that is clearly relevant under the conditionality principle.” I understand that the sufficient statistic for the mixture experiment Birnbaum constructs discards the information about the component experiment from which the outcome came for what Mayo calls “star pairs” (outcomes that have likelihood functions proportional to those of outcomes from the other component experiment). I don’t see that this information “is clearly relevant under the conditionality principle.” After all, the conditionality principle says that the discarded information is irrelevant to evidential meaning. It would be relevant if the conditionality principle were a directive to condition on experimental ancillaries, but again, reading it in that way involves making the implicit (question-begging) assumption that sampling distributions are relevant. Am I missing something?

Dear Greg, first I want to thank you for your contribution, with which I fully agree. I’m even happy to admit that your contribution makes the point that I wanted to make, too, in a better-elaborated way. I had written my contribution a little too quickly and tried to improve it, but the later version contained something that looked like a reading error on my part of Mayo’s recent draft, so she suggested sticking with the first version and I agreed.

Regarding the point you raise in your comment, note that I use the word “relevant” here to refer not to the purely mathematical version of the conditionality principle from which the proof can be derived, but rather to the way the principles and the result have been interpreted in an anti-frequentist manner, implying that the sufficiency principle means that “all relevant information for the evaluation of evidence is in the sufficient statistic”. However, the sufficiency principle applied to Birnbaum’s mixture distribution suppresses the information about which of the two mixed experiments generated the observation, which is clearly relevant to a frequentist/error statistician who wants to properly apply the conditionality principle in this setup. Sampling distributions are not relevant to the purely mathematical content of the proof (this is the point where the two of us apparently disagree with Mayo), but they are certainly relevant to an error statistician presented with an “argument” such as “you should adhere to the sufficiency principle generally because you should accept that everything that is relevant is in the sufficient statistic”. It’s not.

Thanks for the response, Christian! I find your critique of Mayo’s argument quite clear and am only trying to get a clearer understanding of your claim, “Frequentists/error statisticians therefore don’t need to worry about the likelihood principle because they shouldn’t accept the sufficiency principle in the generality that is required for Birnbaum’s proof.”

Do you have in mind something like Kalbfleisch’s approach (1975) (roughly: first condition on the component experiment actually performed, then reduce by sufficiency, and then condition on mathematical ancillary statistics), at least as appropriate from a frequentist perspective?

Greg: I think that Mayo (and some others, including Evans et al. before her) have a valid case that *if inference includes the sampling distribution* the SP and the CP make conflicting prescriptions for the treatment of Birnbaum’s mixture experiment.

So the argument actually does imply (in my view) that Birnbaum’s proof doesn’t prove anything that can have normative implications for the frequentist/error statistician.

I then thought about where the intuitive interpretation of the SP and WCP that is used to suggest to frequentists that they should adhere to these principles breaks down, and the answer is the claim that “the sufficient statistic captures everything relevant for inference”, which in Birnbaum’s mixture experiment it doesn’t.

I think that Mayo’s “sufficiency principle for sampling theory” is fine, at least at (my) first sight, although our blog host doesn’t seem to accept that this is essentially different from what Birnbaum required (although she may be right that Birnbaum intended something like this). I have only read Kalbfleisch’s paper very quickly; I didn’t find his arguments for his version of the principle as intuitive as those in Evans et al. and Mayo’s draft, but I may not have put enough time into it. The general tendency is the same as mine: Frequentists should not accept the SP in the form given by Birnbaum.

That said, as somebody who does a lot of practical statistics and data analysis, I’m very skeptical about any overly general prescriptive principle. Such principles lose their authority immediately if someone comes up with a situation in which they can be shown not to work well, and this happens easily if a principle is too general.

Thanks for the response, Christian. I wish I had your experience in practical data analysis! I’m quite interested in questions about how the kinds of principles we’re considering relate to actual statistical practice.

I have concerns about “the sufficiency principle applied in sampling theory,” including those Birnbaum gives in his response to Kalbfleisch’s 1975 paper, those in Berger and Wolpert’s Likelihood Principle monograph (pp. 45-46 in the 1988 edition), a slightly modified version of Savage’s objection to Durbin’s approach (1970), and the existence of an alternative proof that I give here: https://www.dropbox.com/s/nvzcrjdhydsls2w/new_proof_of_lp7.4.pdf. For that reason among others, I think a frequentist/error statistician’s best bet is to argue that tracking evidential meaning is less important than securing other purported virtues such as objectivity and good long-run operating characteristics. Of course, there are challenges in that direction as well.

Dear Greg, I had a look at your paper. Apparently you’re also in the business of establishing general normative principles on which I’m not very keen.

As far as I see it, the trouble with your new “Weak Ancillary Realisability Principle” is the same as with the sufficiency principle: If I accept it, it excludes the sampling distribution from considerations in the Gandenberger (i.e., generalised Birnbaum;-)) mixture because it forces me to ignore potentially existing information that is relevant to determining the sampling distribution conditionally on all that is known, which is necessary for proper application of the conditionality principle.

So again I say that the proof is not wrong but frequentists don’t need to bother. It’s just another illustration that the fact that principles are “intuitively appealing” doesn’t mean that we have to follow them in whatever situation (from which the intuition doesn’t stem).

In your introduction you seem to *assume* that what the term “evidential equivalence” means has to be determined by such principles (perhaps refined versions) and if methods based on the sampling distribution violate it, then what they deliver is no proper formalisation of “evidence” and they can only be useful for other reasons. But this looks like a language-game. You write that “Bayesian methods have the correct foundations” – in what sense? According to such often disputed and highly problematic principles that basically appeal to “general intuition” and are then used in highly artificial setups detached from this intuition? It is clear how your principles exclude the sampling distribution from playing a role, so this is implied from the beginning and not a *consequence* of your (or Birnbaum’s) result. (Actually this may be the “circularity” Mayo refers to; it’s interesting that you criticise her, on the other hand, for assuming implicitly that the sampling distribution has to play a role. I agree with you regarding the mathematics but with her regarding the normative meaning of the theorems.)

For somebody who thinks that it makes sense to consider the sampling distribution of the used statistics and functions of it such as the p-value for assessing evidence, your theorem and proof (albeit correct), as well as Birnbaum’s, have no implications.

It seems that Christian is regaining his sanity!

“It is clear how your principles exclude the sampling distribution from playing a role, so this is implied from the beginning and not a *consequence* of your (or Birnbaum’s) result. (Actually this may be the ‘circularity’ Mayo refers to.)”

Indeed!

Mayo: It’s consistent with what I wrote before, though.

Dear Christian,

I appreciate you taking a look at my paper. I think we agree on more than one might think at first sight.

You write: “Apparently you’re also in the business of establishing general normative principles on which I’m not very keen.”

Although I give a proof of the Likelihood Principle, I also express reasons to doubt that we should regard it as a general normative principle. The upshot of my paper in my mind is simply that frequentists should look elsewhere than intuitions about evidence to justify their methods.

You write: “As far as I see it, the trouble with your new ‘Weak Ancillary Realisability Principle’ is the same as with the sufficiency principle: If I accept it, it excludes the sampling distribution from considerations in the Gandenberger (i.e., generalised Birnbaum;-)) mixture because it forces me to ignore potentially existing information that is relevant to determining the sampling distribution conditionally on all that is known, which is necessary for proper application of the conditionality principle.”

I’m afraid I’m not following. In a sense, the Weak Ancillary Realizability Principle forces you not to ignore something that is known that does seem relevant to determining the sampling distribution conditionally on all that is known, namely the value of a mathematical ancillary statistic in a minimal experiment. It’s hard to see why we shouldn’t treat mathematical ancillaries the same way we treat experimental ancillaries, at least in the kinds of simple cases that the Weak Ancillary Realizability Principle covers.

You write: “In your introduction you seem to *assume* that what the term ‘evidential equivalence’ means has to be determined by such principles (perhaps refined versions) and if methods based on the sampling distribution violate it, then what they deliver is no proper formalisation of ‘evidence’ and they can only be useful for other reasons. But this looks like a language-game.”

I agree. That’s why I don’t endorse the Likelihood Principle as a general normative principle even though I think it follows from intuitively compelling constraints on the notion of evidential meaning. One might regard conforming to such constraints as an end in itself, but I don’t.

On a separate point, I don’t claim that Bayesian methods have the correct foundations! I say only that frequentist methods would be useful as computational shortcuts even if Bayesian methods did have the correct foundations, in whatever sense one likes.

I don’t understand your statement that “It is clear how [my] principles exclude the sampling distribution from playing a role.” Each of them individually is quite compatible with the use of sampling distributions.

Lastly, you write: “For somebody who thinks that it makes sense to consider the sampling distribution of the used statistics and functions of it such as the p-value for assessing evidence, your theorem and proof (albeit correct), as well as Birnbaum’s, have no implications.”

If one finds the premises of either proof intuitively compelling, then in order to remain a frequentist one must deny the methodological significance of intuitions about evidential meaning. I think that’s a real discovery that should change the way many of us approach the philosophy of statistics and epistemology more generally.

Greg: I’m struggling a bit with your notation, so I’m not yet 100% sure, but the point seems to me to be that a frequentist is interested in the sampling distribution under a well-defined reproducible procedure that leads to the outcome that then provides the evidence. Now your WARP seems to allow one to turn arbitrary partitions of the sample space into a hypothetical mixture experiment as long as the partition probabilities don’t depend on the true parameter, regardless of what reproducible experimental procedure the frequentist would want to analyse in the given situation.

The construction in the proof then decomposes the whole experiment into a series of mixtures, at the end of which you’re left with a Bernoulli experiment reproducing only P_\theta(x_0). But this reduction is tailored to the specific outcome x_0, i.e., one can only define it by forgetting that initially one had started off with an experimental procedure that allowed x_0 as well as all the other outcomes, and of which the frequentist, when analysing error probabilities, cannot assume to know what exactly the outcome will be.

So the problem is not that the WARP restricts the information that can be used, but rather that it allows and actually enforces the use of too much knowledge, namely something that the frequentist cannot use when determining the sampling distribution: the knowledge of the specific result. This gives you leverage, in the proof, to enforce the construction to “forget” the experimental procedure, i.e., what of it is controlled by the experimenter and what isn’t.

I admit that you managed to hide the problem better than Birnbaum did (and Birnbaum was quite successful already). 😉

You wrote: “If one finds the premises of either proof intuitively compelling, then in order to remain a frequentist one must deny the methodological significance of intuitions about evidential meaning.”

An intuition isn’t a scientific fact. It belongs to a person and this person may be sober enough to mistrust their own intuition when applied to situations remote from those on which their intuition was originally based. Once the person understands what’s wrong with the old intuition, a new improved one can take over.

Different pairs of outcomes from different experiments that have proportional likelihood functions typically require different hypothetical experiments in my proof of their evidential equivalence. I don’t see why that fact is a problem for me. Frequentists won’t be able to use the proof as a recipe to set things up in advance so that they will obey the Likelihood Principle with respect to any pair of outcomes with proportional likelihood functions that might arise, but if one finds the premises of the proof more compelling than the use of sampling distributions, then that fact looks like an objection to frequentist methods rather than an objection to the proof. One could always use Likelihoodist methods or Bayesian conditioning and thereby obey all of the principles we are discussing automatically.

As always, one person’s modus ponens is another’s modus tollens.

I agree with your comments on intuitions. At this time I don’t see anything wrong with the intuitions that lead to the Likelihood Principle as a sufficient condition for experimental outcomes to be evidentially equivalent. I am suspicious of the intuition that a sensible method of inductive inference must track evidential meaning, because tracking evidential meaning in accordance with the Likelihood Principle precludes securing the kind of reliability that makes frequentist methods appealing.

Greg: “but if one finds the premises of the proof more compelling than the use of sampling distributions then that fact looks like an objection to frequentist methods rather than an objection to the proof.”

True; I’m advising the frequentist against finding the premises too compelling, not against the proof. (Same as my take on Mayo’s argument.)

“At this time I don’t see anything wrong with the intuitions that lead to the Likelihood Principle as a sufficient condition for experimental outcomes to be evidentially equivalent. I am suspicious of the intuition that a sensible method of inductive inference must track evidential meaning,”

That’s a legitimate point of view, but one may well say that in order to assess evidential meaning, one needs the effective sampling distribution of the experiment, which leads to a different intuition. It seems to boil down to the question of how one defines “evidential meaning”.