**Evidential Meaning and Methods of Inference**

*Greg Gandenberger*

PhD student, History and Philosophy of Science

Master’s student, Statistics

University of Pittsburgh

Bayesian methods conform to the Likelihood Principle, while frequentist methods do not. Thus, proofs of the Likelihood Principle* such as Birnbaum’s (1962) appear to be threats to frequentist positions. Deborah Mayo has recently argued that Birnbaum’s proof is no threat to frequentist positions because it is invalid (Ch. 7(III) in Mayo and Spanos 2010). In my view, Birnbaum’s proof is valid and his premises are intuitively compelling. Nevertheless, I agree with Professor Mayo that the proof, properly understood, does not imply that frequentist methods should not be used.

There are actually at least two different Likelihood Principles: one, which I call the Evidential Likelihood Principle, says that the evidential meaning of an experimental outcome with respect to a set of hypotheses depends only on its likelihood function for those hypotheses (i.e., the function that maps each of those hypotheses to the probability it assigns to that outcome, defined up to a constant of proportionality); the other, which I call the Methodological Likelihood Principle, says that a statistical method should not be used if it can generate different conclusions from outcomes that have the same likelihood function, without a relevant difference in utilities or prior probabilities.

Birnbaum’s proof is a proof of the Evidential Likelihood Principle. It is often taken to show that frequentist methods should not be used, but the Evidential Likelihood Principle does not imply that claim. The Methodological Likelihood Principle does imply that frequentist methods should not be used, but Birnbaum’s proof—at least as originally presented—is not a proof of the Methodological Likelihood Principle.

There are two ways one might respond to this point on behalf of the claim that Birnbaum’s proof does show that frequentist methods should not be used. One is to argue for an additional premise that would allow one to derive the Methodological Likelihood Principle from the Evidential Likelihood Principle. Another is to argue that the point is pedantic because Birnbaum’s proof might as well be recast as a proof of the Methodological Likelihood Principle. I will consider each of these responses in turn.

One could derive the Methodological Likelihood Principle from the Evidential Likelihood Principle by invoking what I call the Evidential Equivalence Norm, which says that a statistical method should not be used if it can generate different conclusions from outcomes that are evidentially equivalent, without a relevant difference in utilities or prior probabilities.

The Evidential Equivalence Norm has intuitive appeal. It seems to say less than Hume’s dictum, “A wise man proportions his belief to the evidence,” which is often regarded as a truism. But is this truism actually true? It seems to me that Hume’s dictum and the Evidential Equivalence Norm are justified only insofar as they help us achieve our epistemic and practical ends. If Birnbaum’s proof is sound, then proportioning one’s belief to the evidence seems to require using Bayesian methods. There are non-Bayesian methods that abide by the Likelihood Principle, but they claim only to characterize data as evidence, not to tell one how to proportion one’s beliefs (Royall 1997). But there are situations in which it is not at all clear that Bayesian methods are the best approach for achieving our epistemic and practical ends. Take, for instance, the search for the Higgs boson. One could try to use the data generated at CERN to update one’s subjective prior probability that the Higgs exists in the reported energy range, but it’s hard to see why we should think that doing so will serve the goal of arriving at an approximately truthlike theory given that we seem to have no reasonable basis at all for choosing our priors in this case. I am inclined to agree with Larry Wasserman that frequentist hypothesis testing is exactly the right tool for this case, despite its shortcomings.

I am of course running roughshod over a number of subtle and important issues. My point is only that the Evidential Equivalence Norm cannot be taken for granted. If norms are justified only insofar as they help us achieve our ends, then pointing out that the Evidential Equivalence Norm appeals to our intuitions is not enough to show that it is justified.

A second way to respond to my claim that Birnbaum’s proof is no threat to frequentist methods because it only establishes the Evidential Likelihood Principle is to claim that it would be unproblematic to recast Birnbaum’s proof as a proof of the Methodological Likelihood Principle. One would simply have to reformulate Birnbaum’s premises (what he calls the Sufficiency and Conditionality Principles) in a methodological vein. Thus, the Sufficiency Principle would become the following:

A statistical method should not be used if it can generate different conclusions from outcomes that give the same value for a sufficient statistic, without a relevant difference in utilities or prior probabilities.

And the Conditionality Principle would become the following:

A statistical method should not be used if it can generate different conclusions from the outcome of a mixture experiment and the corresponding outcome of the component of that mixture experiment that was actually performed.

It is easy to show that these premises do imply the Methodological Likelihood Principle. But the same argument I just gave for resisting the Evidential Equivalence Norm can be given for resisting these claims: if a norm can be justified only by showing that it helps us achieve desired ends, then the fact that these principles gratify our intuitions is not enough to justify them.

Accepting Birnbaum’s proof of the Likelihood Principle does require frequentists to give up the claim that their methods respect evidential equivalence. It does not require them to give up the claim that their methods are, at least in some cases, the best inferential tools we have.

Editors’ inserts:

***Strong Likelihood Principle (From Dec. 6):**

**If two data sets *y’* and *y”* from experiments E’ and E” respectively, have likelihood functions which are functions of the same parameter(s) µ and are proportional to each other, then *y’* and *y”* should lead to identical inferential conclusions about µ.**

**Related blogposts:**

“Breaking Through the Breakthrough” posts: Dec. 6 and Dec. 7, 2011, and the next (Oct. 31, 2012) post. For further related posts, use the search feature for the strong likelihood principle.

REFERENCES:

Birnbaum, A. (1962). On the foundations of statistical inference. In S. Kotz and N. Johnson (Eds.), *Breakthroughs in statistics* (Vol. 1, pp. 478-518). Springer Series in Statistics, New York: Springer-Verlag. Reprinted from *Journal of the American Statistical Association, 57*, 269–306.

Mayo, D. G. (2010). An error in the argument from conditionality and sufficiency to the likelihood principle. In D. Mayo and A. Spanos (Eds.), *Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science* (pp. 305-314). Cambridge: Cambridge University Press.

Royall, R. (1997). *Statistical evidence: A likelihood paradigm*. London: Chapman and Hall.

Greg: Thanks so much for your interesting post. I’m pleased to see a philosophy Ph.D. student pursuing these general issues, and especially look forward to getting to the bottom of this logical puzzle: While you claim “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid. If construed as formally valid, I maintain, the premises contradict each other. They require, in effect, that the evidential import of a known result x’ from experiment E’ should, and also should not, be influenced by a distinct and unperformed experiment E”. Hopefully I can encourage you (and others) to wrestle with the specifics of my argument. Comments are always invited, but I’ll put out a call for (longer-term) “U-Phils” tomorrow, along with an overview of my latest treatment of the Birnbaum argument.

Greg: Apart from the fact that I have thought through Birnbaum’s original proof and Mayo’s argument against it, and am on Mayo’s side on this one, there is a nice argument inspired by Davies (1995) to back up your scepticism toward the Evidential Equivalence Norm.

The thing is that if we accept that models are idealisations and are not precisely true, using sufficiency no longer preserves the relevant evidence, because sufficiency critically depends on the assumed model and sufficient statistics throw away all model diagnostic information. Unfortunately, many sufficient statistics are quite non-robust, meaning that some distributions not exactly obeying the model assumptions, but very close to a member of the assumed family of distributions may have quite different values of the sufficient statistics (the most notorious example is an outlying point mass contaminating a Gaussian distribution, having an arbitrarily strong impact on mean and variance). Therefore, if we are in principle willing to make a Gaussian assumption but want to safeguard against certain kinds of violations of assumptions, we rather want to use statistics that are insufficient but more robust.
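The non-robustness point can be seen in a small numerical sketch. The sample values below are made up for illustration, and the median and median absolute deviation (MAD) are chosen here only as familiar examples of insufficient but more robust statistics; they are not taken from Davies (1995). A single gross outlier moves the Gaussian sufficient statistics (mean and standard deviation) a long way, while the robust pair does not move at all.

```python
import statistics


def summarize(sample):
    """Return (mean, stdev, median, MAD) for a list of numbers."""
    med = statistics.median(sample)
    # Median absolute deviation: a robust measure of spread.
    mad = statistics.median([abs(x - med) for x in sample])
    return statistics.mean(sample), statistics.stdev(sample), med, mad


# Hypothetical, roughly Gaussian-looking sample (illustrative values only).
clean = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.1]
# Same sample with one observation replaced by a gross outlier.
contaminated = clean[:-1] + [1000.0]

clean_stats = summarize(clean)
contaminated_stats = summarize(contaminated)

for label, stats in (("clean", clean_stats), ("contaminated", contaminated_stats)):
    mean, sd, med, mad = stats
    print(f"{label:>12}: mean={mean:.2f} sd={sd:.2f} median={med:.2f} MAD={mad:.2f}")
```

The outlier could be made arbitrarily large with the same qualitative result, which is the sense in which its impact on mean and variance is unbounded.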

Davies (1995) Data Features. Statistica Neerlandica 49, 185-245. A very nice and illuminating read in any case.

Thank you for the reference! I am aware of the fact that the applicability of the Likelihood Principle is (severely?) limited by the fact that it assumes a fixed model. I look forward to learning more about the nonrobustness of sufficient statistics.

Although the cited paper is among my favourites, you can read more about the nonrobustness of specific sufficient statistics elsewhere. For example you may have a look at the classic books on robust statistics by Huber, Hampel et al.

It is not necessary that Bayesian methods conform to the Likelihood Principle. Bayesian inference, conditional on the model and data, conforms to the likelihood principle. But there is more to Bayesian methods than Bayesian inference. See chapters 6 and 7 of *Bayesian Data Analysis* for much discussion of this point. It saddens me to see that people are still confused on this issue.

Gelman: But of course the LP is defined as given the model, as Birnbaum, Barnard, etc. made clear. Of course the likelihood alone is given the model and the data x.

I can understand why it is frustrating for you to see me use the phrase “Bayesian methods” to refer only to the use of Bayes’ theorem! I will be careful to make this distinction in the future.

Greg: I’m not sure if you’re responding to me or Gelman (presumably the latter). But anyway, I was responding to him. I’m not frustrated with your saying Bayesian methods obey the LP, I agree! The “default” and “reference” Bayesians (generally) concur as well that in violating the LP, they are being Bayesian incoherent, though some will say, nowadays, that they are only “technically” incoherent. Most importantly, these violations do not seem driven by the goal of securing error probabilities (at least it’s not put that way).

I suspect that the best current Bayesian practice would not be to place a prior distribution for the Higgs boson existing at a given energy level (I agree that there doesn’t seem to have been a reasonable basis for this), but instead would be to employ a non- or weakly-informative prior on hyperparameters for the energy level distribution.

Here’s the best paper I could find on statistical analysis involved in the search for the Higgs boson:

http://www.samsi.info/sites/default/files/Cranmer_september2012.pdf

Looking at the chart on page 76, you can see that the evidence is now strong enough (10^9) to swamp any reasonable prior distribution (whether over energy level, or over hyperparameters). So Bayesians and frequentists should now agree.
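To make the "swamping" claim concrete, here is a minimal sketch. It assumes, purely for illustration, that the 10^9 figure can be read as a Bayes factor in favour of the signal hypothesis; the chart itself may report a different quantity, and the list of priors is hypothetical. Posterior odds are the Bayes factor times the prior odds.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of a hypothesis from its prior and a Bayes factor."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


BAYES_FACTOR = 1e9  # assumed reading of the "10^9" figure
PRIORS = (0.001, 0.01, 0.5, 0.99)  # hypothetical range of prior opinions

posteriors = [posterior_probability(p, BAYES_FACTOR) for p in PRIORS]

for prior, post in zip(PRIORS, posteriors):
    print(f"prior={prior:<6} posterior={post:.9f}")
```

Under that reading, even a sceptical prior of one in a thousand ends up with a posterior above 0.99999, so the choice of prior within this range makes no practical difference.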

I think the Bayesian approach (using hyperparameters) would have had an advantage in this research process if it had been expensive to search the range of energies. For example, suppose that, instead of the actual case of being able to detect events at a wide range of energy levels from a single beam energy, the LHC had to be operated at some multiple of the energy level to be detected. You would then cycle the LHC through its beam energy range, spending more time at energies proportional to higher posterior density of excess events.

For clarification on the above Gandenberger-Gelman comments, see Gelman’s blog: http://andrewgelman.com/2012/10/it-not-necessary-that-bayesian-methods-conform-to-the-likelihood-principle/#comments

But I’m a bit surprised that Gelman is prepared to accept the LP even given the model and data, especially as “reference” Bayesians are prepared to violate it. Or does Gelman not think they are really violating it? (As I say, they like to describe it as a merely “technical” violation, but I am not clear on what other kind there is supposed to be.)

I do not think that the use of a reference prior violates the LP in any way. It does not use the observed data and thus preserves the principle that the observed data should enter into the inference solely through the likelihood function.

http://andrewgelman.com/2012/10/it-not-necessary-that-bayesian-methods-conform-to-the-likelihood-principle/#comment-107804

Reference priors do not even use the likelihood function. They use the sampling distribution, which is not the same thing. The sampling distribution is a distribution, that is, a probability distribution that integrates to 1. The likelihood is an equivalence class of functions that are evaluated on the fixed, observed data, and are a function of the parameters of the problem. I don’t understand how people can confuse the two.
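The distinction can be checked numerically. The sketch below uses a binomial model with made-up numbers (12 trials, 3 observed successes, θ = 0.3), which are assumptions for illustration only. Summing the probability function over all possible data at a fixed θ gives 1; integrating the same formula over θ at the fixed observed data does not.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """P(k successes in n trials) when each trial succeeds with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


N, K_OBSERVED, THETA = 12, 3, 0.3  # hypothetical values

# Sampling distribution: theta is fixed, the data vary. It sums to 1.
sampling_total = sum(binomial_pmf(k, N, THETA) for k in range(N + 1))

# Likelihood: the data are fixed, theta varies. Midpoint-rule integral over [0, 1].
STEPS = 10_000
likelihood_area = (
    sum(binomial_pmf(K_OBSERVED, N, (i + 0.5) / STEPS) for i in range(STEPS)) / STEPS
)

print(f"sum over data at theta={THETA}: {sampling_total:.6f}")
print(f"integral over theta at k={K_OBSERVED}: {likelihood_area:.6f}")
```

For this model the integral over θ equals 1/(n + 1), here 1/13, and rescaling the likelihood by any positive constant would change that area without changing the likelihood's evidential content.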

??? Take it up with the reference Bayesians who hold that desirable reference priors force them to consider the statistical model, “leading to violations of basic principles, such as the likelihood principle and the stopping rule principle” (Berger 2006, 394), Bernardo (2005; 2010).

Berger, J. (2006), “The Case for Objective Bayesian Analysis” and “Rejoinder”, Bayesian Analysis 1(3), 385–402; 457–464.

Bernardo, J. M. (2005), “Reference Analysis”, in: Dey, D. K. and C. R. Rao (eds.), Handbook of Statistics 25, Amsterdam: Elsevier, 17–90.

bayesrules: The argument, if I remember it correctly, is that Bernardo’s rule to choose the reference prior depends on characteristics of the experiment that are not reflected in the likelihood function. I.e., you will use different reference priors and therefore get different inferences depending on the stopping rule in the standard stopping rule examples where likelihoods are proportional. (Note that I reconstruct this from my unreliable memory without going through Bernardo’s papers again right now.)
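A hedged sketch of how that can happen in the standard coin-tossing example, with 9 heads and 3 tails observed and θ the probability of heads. It uses the Jeffreys prior as a stand-in for the reference prior, which is an assumption made here for illustration rather than a reconstruction of Bernardo's own derivation. Working out the Fisher information for each design gives a Jeffreys prior proportional to θ^(-1/2)(1-θ)^(-1/2) when the number of tosses is fixed at 12, and proportional to θ^(-1/2)(1-θ)^(-1) when tossing continues until the third tail.

```python
HEADS, TAILS = 9, 3  # observed data, identical under both designs


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Design 1: fixed number of tosses (binomial sampling).
# Jeffreys prior Beta(1/2, 1/2) gives posterior Beta(heads + 1/2, tails + 1/2).
binomial_posterior = (HEADS + 0.5, TAILS + 0.5)

# Design 2: toss until the 3rd tail (negative binomial sampling).
# Jeffreys prior ∝ theta^(-1/2) (1 - theta)^(-1) gives posterior Beta(heads + 1/2, tails).
negative_binomial_posterior = (HEADS + 0.5, float(TAILS))

binomial_mean = beta_mean(*binomial_posterior)
negative_binomial_mean = beta_mean(*negative_binomial_posterior)

print(f"posterior mean, fixed n:        {binomial_mean:.4f}")
print(f"posterior mean, stop at 3 tails: {negative_binomial_mean:.4f}")
```

The two likelihood functions are proportional, yet the posteriors differ because the prior was derived from the sampling model, which is the sense in which such priors are said to conflict with the stopping rule principle.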

??? Why should the way the experiment is conducted not affect the likelihood, the stopping rule, the priors, etc.?

All the LP says is that the observed data should only enter into the inference via the likelihood. That’s all. If the likelihood is affected by the stopping rule, or the priors are affected by it, so be it. I do not view this as a refutation of the LP.

bayesrules: Note: the stopping rule affects the likelihoods (the way the experiment is conducted is not affecting the stopping rule), but in the ratio, the proportionality constant cancels out. If the prior reflects degrees of evidential weight prior to the experiment, then at least subjective Bayesians question why it’s being altered by the experimental sampling rule. Remember the “argument from intentions” criticism taken up several times, but most recently: https://errorstatistics.com/2012/09/19/barnard-background-infointentions/
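The cancellation can be illustrated with the textbook coin-tossing example, sketched here with assumed numbers: 9 heads and 3 tails, either from a fixed 12 tosses or from tossing until the third tail, with θ the probability of heads. The two likelihoods differ only by a constant, so likelihood ratios between hypotheses agree, while one-sided p-values for θ = 0.5 do not.

```python
from math import comb

HEADS, TAILS = 9, 3
N = HEADS + TAILS


def binomial_likelihood(theta):
    """Likelihood when the number of tosses (12) was fixed in advance."""
    return comb(N, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def negative_binomial_likelihood(theta):
    """Likelihood when tossing stops at the 3rd tail (last toss is a tail)."""
    return comb(N - 1, HEADS) * theta**HEADS * (1 - theta) ** TAILS


# Same kernel, different constants: the ratio of the two likelihoods is constant in theta.
constant_ratios = [
    binomial_likelihood(t) / negative_binomial_likelihood(t) for t in (0.2, 0.5, 0.8)
]

# The likelihood ratio for theta = 0.75 against theta = 0.5 is the same under both designs.
lr_binomial = binomial_likelihood(0.75) / binomial_likelihood(0.5)
lr_negative_binomial = negative_binomial_likelihood(0.75) / negative_binomial_likelihood(0.5)

# One-sided p-values for H0: theta = 0.5 against theta > 0.5.
# Binomial: P(at least 9 heads in 12 tosses).
p_binomial = sum(comb(N, k) for k in range(HEADS, N + 1)) / 2**N
# Negative binomial: P(at least 9 heads before the 3rd tail)
# = P(at most 2 tails in the first 11 tosses).
p_negative_binomial = sum(comb(N - 1, j) for j in range(TAILS)) / 2 ** (N - 1)

print("likelihood ratio (binomial / negative binomial):", constant_ratios)
print("LR 0.75 vs 0.5:", lr_binomial, lr_negative_binomial)
print("p-values:", p_binomial, p_negative_binomial)
```

At the conventional 0.05 level the two p-values fall on opposite sides of the threshold, which is why this example is the usual illustration of frequentist methods depending on the stopping rule where likelihood-based methods do not.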

To consider outcomes other than the one observed, in reasoning from the observed data, violates the LP and is deemed Bayesian incoherent.

That is not how I understand the LP nor is it how I understand stopping rules. I have stated my understanding of the LP above. It says, in my understanding of it, that the observed data should enter into the inference solely via the likelihood.

It does not say anything about how the experimental design should affect the likelihood or the priors. Indeed, the experimental design SHOULD affect the likelihood, because it tells us how the experiment was conducted and therefore what the likelihood should be.

And the experimental design SHOULD affect the prior. The prior is not some ethereal notion about absolutes. It is conditional on the model and on the experimental protocol.

Jaynes taught us that if you don’t include all of the background information when you devise a probability model, you will make mistakes.

In my opinion, these are mistakes of this sort.

First 2 paras fine, if you add given the statistical model, but some of the rest is idiosyncratic, at least with respect to the way these terms have been used in the debates we are considering just now. Why does the specification as to when to stop alter your prior again? If the data, e.g., recording where it stops, influences the prior, then x is not merely influencing the likelihood. The central issue has always been what should count as relevant information, and since you mention Jaynes, he clearly denied that outcomes other than the observed are relevant in reasoning from the observed (e.g., 1976, 246, in the Harper and Hooker volume). Please check definitions of the LP, SLP. Consideration of stopping rules involves considering outcomes that might have occurred but did not. How and why do they influence your prior credence in parameters?