“Model Verification and the Likelihood Principle” by Samuel C. Fletcher
Department of Logic & Philosophy of Science (PhD Student)
University of California, Irvine
I’d like to sketch an idea concerning the applicability of the Likelihood Principle (LP) to non-trivial statistical problems. What I mean by “non-trivial statistical problems” are those involving substantive modeling assumptions, where there could be any doubt that the probability model faithfully represents the mechanism generating the data. (Understanding exactly how scientific models represent phenomena is subtle and important, but it will not be my focus here. For more, see http://plato.stanford.edu/entries/models-science/.) In such cases, it is crucial for the modeler to verify, inasmuch as it is possible, the sufficient faithfulness of those assumptions.
But the techniques used to verify these statistical assumptions are themselves statistical. One can then ask: do techniques of model verification fall under the purview of the LP? That is: are such techniques a part of the inferential procedure constrained by the LP? I will argue the following:
(1) If they are—what I’ll call the inferential view of model verification—then there will be in general no inferential procedures that satisfy the LP.
(2) If they are not—what I’ll call the non-inferential view—then there are aspects of any evidential evaluation that inferential techniques bound by the LP do not capture.
If (1) and (2) hold, then it follows that the LP cannot be a constraint on any measure of evidence, for either no such measure can satisfy it by (1), or measures that do satisfy it cannot capture essential aspects of evidential bearing by (2). I want to emphasize that I am not arguing for either the inferential or non-inferential view of model verification. (Indeed, I suspect that whether one seems more plausible will be contextual.) Instead I want to point out that, whatever one’s views about the role of model verification in inference, the LP cannot, as many commentators have assumed, constrain the inferential procedures used in non-trivial statistical problems. In the remainder, I will flesh out some arguments for (1) and (2).
First, what does it mean for techniques of model verification to be a part of the inferential procedure constrained by the LP, as the inferential view holds? One way to understand such a view is to represent the probability model as an enormous mixture model, where the index β of the indicator function I labels all conceivable models one might use for a given statistical problem. Procedures of model verification, then, are inferential in the sense that they select some value of that index in the same way as other procedures of statistical inference select elements from the parameter space of a model.
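One way such a mixture might be written is sketched below. The notation is an assumption made for illustration and is not taken from the text: B stands for the set of all candidate models, f_β for the density of model β, and θ_β for its parameters.

```latex
f(x;\theta) \;=\; \sum_{\beta \in B} I_\beta \, f_\beta(x;\theta_\beta),
\qquad I_\beta \in \{0,1\}, \qquad \sum_{\beta \in B} I_\beta = 1 .
```

On this reading, verifying a model amounts to inferring which single indicator equals 1, just as estimation within a chosen model amounts to inferring θ_β.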
In general, this huge mixture will be vaguely defined, so it is hard to see how one could apply the LP to it. But even if one could, there would be no general techniques for model verification that conform to it. Essentially, the reason is that all such techniques seem to require the Fisherian logic of testing: one makes an assumption, from which it follows that certain statistics follow certain sampling distributions, which one constructs theoretically or estimates through simulation. To the degree that the data are improbable under those distributions, one then has reason to reject said assumption. Because it is well known that inferential procedures depending on sampling distributions do not in general satisfy the LP, the same follows for these techniques.
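The well-known point that procedures based on sampling distributions can violate the LP may be illustrated with the standard textbook contrast between a fixed-sample-size design and a stopping-rule design. The following is a minimal sketch, not drawn from the original text: the data (9 successes and 3 failures), the null value 0.5, and all function names are assumptions chosen for illustration.

```python
from math import comb


def binomial_p_value(successes, n, theta0=0.5):
    """P(X >= successes) when the number of trials n is fixed in advance."""
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
        for k in range(successes, n + 1)
    )


def negative_binomial_p_value(successes, failures, theta0=0.5):
    """P(at least `successes` successes occur before the `failures`-th failure).

    Equivalent event: at most failures - 1 failures among the first
    successes + failures - 1 trials.
    """
    m = successes + failures - 1
    return sum(
        comb(m, j) * (1 - theta0) ** j * theta0 ** (m - j)
        for j in range(failures)
    )


def binomial_likelihood(theta, successes, failures):
    """Likelihood when the total number of trials was fixed in advance."""
    n = successes + failures
    return comb(n, successes) * theta**successes * (1 - theta) ** failures


def negative_binomial_likelihood(theta, successes, failures):
    """Likelihood when sampling stopped at the `failures`-th failure."""
    m = successes + failures - 1
    return comb(m, failures - 1) * theta**successes * (1 - theta) ** failures


# Hypothetical data: 9 successes and 3 failures under two different designs.
p_fixed_n = binomial_p_value(9, 12)
p_stop_at_3_failures = negative_binomial_p_value(9, 3)

# The likelihood ratio between the designs does not depend on theta.
ratios = [
    binomial_likelihood(t, 9, 3) / negative_binomial_likelihood(t, 9, 3)
    for t in (0.2, 0.5, 0.8)
]

print(f"p-value, n fixed at 12:         {p_fixed_n:.4f}")
print(f"p-value, stop at third failure: {p_stop_at_3_failures:.4f}")
print("likelihood ratios across theta:", ratios)
```

The two likelihoods are proportional in θ, so the LP assigns the two outcomes the same evidential import, yet the tail-area probabilities differ because they depend on outcomes that were not observed.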
Now, such tools are available in a Bayesian context (e.g., Ch. 6 of Bayesian Data Analysis (2004), by Gelman et al.), but they too use sampling distributions. Other methods commonly used in Bayesian model checking are essentially comparative, so while they may be useful in their own respects, they cannot suffice to check assumptions generally. For example, the fact that the Bayes factor for two models (the ratio of their marginal likelihoods) favors one over the other by 1,000 doesn’t say anything about whether the favored model could plausibly have generated the data. In other words, because comparative methods must work within a circumscribed class of statistical models, they cannot evaluate the statistical adequacy of that class itself.
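The gap between comparative and absolute assessments can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than part of the original argument: the count data, the two Poisson rates, and the choice of the sample maximum as a check statistic. Both models are point hypotheses, so each marginal likelihood reduces to an ordinary likelihood and the Bayes factor is a likelihood ratio.

```python
from math import exp, expm1, lgamma, log, log1p


def poisson_log_pmf(x, rate):
    """Log of the Poisson probability mass function at x."""
    return x * log(rate) - rate - lgamma(x + 1)


def log_likelihood(data, rate):
    """Log-likelihood of independent Poisson counts with a common rate."""
    return sum(poisson_log_pmf(x, rate) for x in data)


def prob_max_at_least(threshold, rate, n, terms=200):
    """P(max of n independent Poisson(rate) draws >= threshold).

    The single-draw tail is summed directly so that very small
    probabilities are not lost to rounding.
    """
    tail = sum(
        exp(poisson_log_pmf(x, rate))
        for x in range(threshold, threshold + terms)
    )
    return -expm1(n * log1p(-tail))


# Hypothetical counts: mostly zeros with two large bursts.
counts = [0, 0, 0, 20, 0, 0, 25, 0, 0, 0]

# Two candidate models, both simple Poisson models with fixed rates.
rate_a, rate_b = 4.5, 1.0

log_bayes_factor = log_likelihood(counts, rate_a) - log_likelihood(counts, rate_b)

# Absolute check of the favored model: how probable is a maximum this large?
check_p_value = prob_max_at_least(max(counts), rate_a, len(counts))

print(f"log Bayes factor (A over B): {log_bayes_factor:.2f}")
print(f"P(max >= {max(counts)}) under model A: {check_p_value:.2e}")
```

In this sketch the Bayes factor favors model A by far more than 1,000, while the check based on model A's own sampling distribution indicates that A is a very poor description of the data. Note that the check itself relies on a sampling distribution.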
What, then, is the non-inferential view of model verification? One way to understand this view is that it divides inferences for the primary parameters of interest from those used to test model assumptions. To a first approximation, this view structures statistical analysis as a two-step process, the first of which involves techniques of model verification to select a sufficiently statistically adequate model. The second step, in which the model is then subjected to inferential procedures for the parameters of scientific interest, is the only one to which the LP applies. (In practice, these steps may be repeated to sequentially test different assumptions, but I will leave that complication aside.)
But if techniques of model verification are bracketed from the inferential procedures constrained by the LP, then those procedures cannot take into account the outcomes of the former in assessing the evidence. Call the outcomes of techniques of model verification assessments of reliability of the model to which they are applied. Then measures of evidence that adhere to the LP cannot distinguish between two statistical models that have proportional likelihoods for a given set of data but are not equally reliable.
I take it to be uncontroversial that statisticians concerned with non-trivial statistical problems should care about the outcomes of their model checks—that is, I take it that they should be concerned with the reliability of their modeling assumptions. But any measure of evidence that satisfies the LP cannot take into account this sense of reliability, because the non-inferential conception of model verification brackets information about reliability from the assessment of evidence.
So under either the inferential or non-inferential conceptions of model verification, there are difficulties applying the LP. Under the non-inferential conception, however, proponents of the LP may hold out for a restricted version thereof, one that does not bind any notion of evidence whatsoever but instead does so for a more specialized and circumscribed notion. Perhaps this would go some way towards illuminating the controversial nature of the LP.
“Remarks on the Likelihood Principle” by Nicole Jinn
Department of Philosophy (MA student)
The general issue as to whether the Likelihood Principle[*] directs against using sampling distributions for model validation is not yet settled. For this reason, I would like to make a few remarks that I hope would aid in clarifying the Likelihood Principle.
Statistical adequacy and model checking
First, there has been a lot of confusion about what it means to adhere to the Likelihood Principle. To shed light on this confusion, consider one of Samuel Fletcher’s statements about restrictiveness: “… even if there were an instance where the Likelihood Principle could apply, the only available techniques for model verification are classical – that is, effectively based on sampling distributions – techniques that, in principle, do not satisfy the Likelihood Principle” (Fletcher 2012a, 8). Failure to satisfy the Likelihood Principle, i.e., violating it, can be roughly understood as considering outcomes other than the one observed. The Likelihood Principle is violated, and rightly so, if one cares about error probabilities.
The trouble is, if we have problems with methods that have statistical assumptions, as (Fletcher 2012a) does, we would have problems with all of them: we could not use significance tests or any methods for parameter estimation, because they all depend on the adequacy of the model. It’s very important to emphasize that statistical adequacy of a model requires only that the error probabilities computed from the model be approximately equal to the actual ones when the appropriate statistical methodology is used. However, Fletcher asks what “actual error probabilities” are and whether we have (epistemological) access to them in (Fletcher 2012b). The context I had in mind when defining statistical adequacy is that “approximate” is being juxtaposed with “exact”, meaning that even if one is not in a position to calculate the error probabilities, one can attempt to approximate what they would be. An example of a non-approximate error probability can be found in (Mayo 2012, sec. 6.2).
Admittedly, even some textbooks are confused about the point at issue. Leading statisticians George Casella and Roger Berger state, “Most data analysts perform some sort of ‘model checking’ when analyzing a set of data. For example, it is common practice to examine residuals from a model … such a practice directly violates the Likelihood Principle” (Berger and Casella 2002, 295–296). Professor Deborah Mayo comments on this passage in an earlier blog post.
A better way to say what Casella and Berger mean is that the Likelihood Principle is inapplicable if you don’t know the underlying model. Fletcher even acknowledges that we must be comfortable with the model before considering the Likelihood Principle (Fletcher 2012b).
Reliability and using error probabilities
Second, a notion of reliability seems to violate the Likelihood Principle. “[W]e want our experiments to be reliable so that we can trust the evidence they produce. If this is one such reason, though, it does not make sense for one to care about the reliability of an experimental design but maintain that this reliability has no bearing on the evidence the experiment produces” (Fletcher 2012a, 5). Fletcher thinks, as error statisticians strongly advocate, that appraising evidence cannot occur without considering error probabilities, which requires considering the sampling distribution, which in turn violates the Likelihood Principle. By contrast, the Likelihood Principle tells us that the sampling distribution is irrelevant to inference once the data are known. On the other hand, Fletcher in (Fletcher 2012b) warns that there may be more than one related notion of evidence, which raises the question of whether appraising evidence means assessing the reliability of a statistical method. Put another way, there does not seem to be a lucid notion of reliable evidence in terms of adherence (or not) to the Likelihood Principle.
Nonetheless, those who embrace the Likelihood Principle, such as Bayesians, still allow that other elements, e.g., priors and costs, are needed for a full inference or decision. Their position is that the evidential import of the data is through the likelihood. A respectable number of researchers allow that the Likelihood Principle only goes as far as the information from the data within the model for the experiment, and therefore differences could occur in priors and utilities. But why would there be differences in priors if the hypothesis is the same? The Likelihood Principle applies in the context of two distinct models, yet the same inference is made in both models in terms of a common set of unknown parameter(s).
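To see what it means for the evidential import of the data to pass only through the likelihood, consider a minimal sketch of a Bayesian analysis under two distinct models with a common parameter. The setup is an assumption for illustration and not part of the original discussion: 9 successes and 3 failures, observed either with the number of trials fixed at 12 or by sampling until the third failure, a uniform prior on θ, and a simple grid approximation.

```python
from math import comb


def binomial_likelihood(theta):
    """Likelihood of 9 successes in 12 trials, with 12 fixed in advance."""
    return comb(12, 9) * theta**9 * (1 - theta) ** 3


def negative_binomial_likelihood(theta):
    """Likelihood of 9 successes when sampling stops at the 3rd failure."""
    return comb(11, 2) * theta**9 * (1 - theta) ** 3


def grid_posterior(likelihood, grid):
    """Posterior weights on a grid under a uniform prior (assumed here)."""
    weights = [likelihood(theta) for theta in grid]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(posterior, grid):
    """Mean of theta under the discretized posterior."""
    return sum(theta * p for theta, p in zip(grid, posterior))


# Midpoint grid over (0, 1) for the common parameter theta.
grid = [(i + 0.5) / 10_000 for i in range(10_000)]

mean_fixed_n = posterior_mean(grid_posterior(binomial_likelihood, grid), grid)
mean_stopping_rule = posterior_mean(
    grid_posterior(negative_binomial_likelihood, grid), grid
)

print(f"posterior mean, n fixed:       {mean_fixed_n:.6f}")
print(f"posterior mean, stopping rule: {mean_stopping_rule:.6f}")
```

The constant factors that distinguish the two likelihoods cancel when the posterior is normalized, so both designs yield the same posterior for θ. Differences between the analyses could then enter only through the prior or the utilities, which is the situation the paragraph above questions.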
Furthermore, the notion of a “relevant difference in utilities or prior probabilities” (Gandenberger) is dubious because a unified theory of relevance still does not exist, in the sense of (Seidenfeld 1979, 219). After all, a sufficient statistic supposedly captures as much of the relevant information from the original data as possible. But exactly how do we make sense of what counts as relevant information? To shed light on establishing methodological variants of the sufficiency and conditionality principles, it might help to (re)consider the purpose of statistical methods (Fisher 1922). I couldn’t agree with Fletcher more in advocating the use of error probabilities, and that is really the ground for rejecting the (Evidential) Likelihood Principle. That is the ground that needs to be emphasized.
Strictly speaking, those who accept any version of the Likelihood Principle could allow model checking that uses methods that employ sampling distributions and error probabilities. Maybe they’re schizophrenic, but they do it. Fortunately, the (Evidential) Likelihood Principle does not follow from the principles thought to entail it, as has been demonstrated recently in (Mayo 2010 and the appendix to Mayo and Cox 2011).
Berger, R. L., and G. Casella. 2002. Statistical Inference. 2nd ed. Duxbury Press.
Fisher, R. A. 1922. “On the Mathematical Foundations of Theoretical Statistics.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 222 (594-604) (January 1): 309–368. doi:10.1098/rsta.1922.0009. http://rsta.royalsocietypublishing.org/content/222/594-604/309.
Fletcher, Samuel. 2012a. “Design and Verify”. Unpublished discussion. Virginia Tech Graduate Philosophy of Science Conference.
———. 2012b. “Design and Verify”. Presentation slides. Virginia Tech Graduate Philosophy of Science Conference.
Mayo, Deborah G. 2010. “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle.” In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, edited by D. Mayo and A. Spanos, 305–14. Cambridge: Cambridge University Press.
Mayo, Deborah G. 2012. “Statistical Science Meets Philosophy of Science Part 2: Shallow Versus Deep Explorations.” Rationality, Markets and Morals: Studies at the Intersection of Philosophy and Economics 3 (Special Topic: Statistical Science and Philosophy of Science) (September 26): 71–107. http://www.rmm-journal.com/downloads/Article_Mayo2.pdf.
Mayo, D. G., and D. R. Cox. 2011. “Statistical Scientist Meets a Philosopher of Science: A Conversation with Sir David Cox.” Rationality, Markets and Morals (RMM) 2 (Special Topic: Statistical Science and Philosophy of Science): 103–114.
Seidenfeld, T. 1979. Philosophical Problems of Statistical Inference: Learning from R.A. Fisher. Springer.
[*] Likelihood Principle (LP): For any two experiments E’ and E’’ with different probability models f’, f’’ but with the same unknown parameter θ, if the likelihoods of outcomes x’* and x’’* (from E’ and E’’ respectively) are proportional to each other, then x’* and x’’* should have identical evidential import for any inference concerning parameter θ.
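The statement above can be rendered symbolically as follows. The symbol Ev, standing for the evidential import of an outcome in an experiment, and the constant c are notational assumptions introduced here for illustration; they do not appear in the original statement.

```latex
\text{If } f'(x'^{*};\theta) \;=\; c \, f''(x''^{*};\theta)
\ \text{ for all } \theta, \text{ with } c > 0 \text{ not depending on } \theta,
\quad \text{then} \quad
\mathrm{Ev}(E', x'^{*}) \;=\; \mathrm{Ev}(E'', x''^{*}).
```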
Background to these U-Phils may be found here.
An earlier exchange between Fletcher and Jinn took place at the Virginia Tech Philosophy graduate student conference, fall 2012.