With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.
Richard Royall is a statistician1 who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:
- What should I believe?
- How should I act?
- Is this data evidence of some claim? (or How should I interpret this body of observations as evidence?)
It all sounds quite sensible, at first, and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the three questions.
How is each question to be answered? According to Royall’s ~~commandments~~ writings, what to believe is captured by Bayesian posteriors; how to act, by behavioristic, Neyman-Pearson (N-P) long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),2 but just focus on the last.
Remember, with likelihoods the data x are fixed and the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with “the law”. But I fail to see why we should obey it.
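To make the set-up concrete, here is a minimal sketch (my own illustrative code and numbers, not Royall’s): a binomial likelihood evaluated at a fixed observation while the hypothesized success rate varies, ending with the kind of comparative ratio “the law” trades in.

```python
from math import comb

def binom_lik(p, x=9, n=20):
    """Likelihood of a fixed observation (x successes in n trials),
    viewed as a function of the hypothesized success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The data are fixed; the hypotheses vary:
for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: likelihood = {binom_lik(p):.4f}")

# The law of likelihood reports only comparisons of two such values:
print("LR for p = 0.5 over p = 0.2:", binom_lik(0.5) / binom_lik(0.2))
```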
To begin with, a report of comparative likelihoods isn’t very useful: H might be less likely than H’, given x, but so what? What do I do with that information? It doesn’t tell me I have evidence for or against either.3 Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems.
Royall: “the likelihood view is that observations [like x and y]…have no valid interpretation as evidence in relation to the single hypothesis H.” (2004, p. 149). In his view, all attempts to say whether x is good evidence for H or even if x is better evidence for H than is y are utterly futile. Only comparing a fixed x to H versus some alternative H’ can work, according to Royall’s likelihoodist.
Which alternative is to be used in the comparison? Should it be a specific alternative? A vague catchall hypothesis? (See the Barnard post.) A maximally likely alternative? An alternative against which a test has high power? The answer differs greatly depending on the choice. Moreover, an account restricted to comparisons cannot answer our fundamental question: is x good evidence for H, or is it a case of BENT evidence (bad evidence, no test)?

Royall’s likelihood account obeys the Likelihood Principle (LP) or, as he puts it, the “irrelevance of the sample space”. That means ignoring the impact of stopping rules on error probabilities. A 2 s.d. difference reached by “trying and trying again” (using the two-sided Normal tests in the links) registers exactly the same as one reached with a fixed sample size, because the likelihoods are proportional. (On stopping rules, see this post, Mayo and Kruse (2001), and EGEK (1996, chapter 10); on the LP, see Mayo 2014, and search this blog under SLP for quite a lot more.)
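To see the force of the stopping-rule point, here is a minimal simulation sketch (my own illustrative code, assuming standard Normal data with a true null of mean 0 and two-sided z-tests). Sampling continues until a nominal 2 s.d. difference appears or a maximum sample size is reached. The law of likelihood registers such a result exactly as it would a fixed-sample-size result, yet the probability of finding one under the null climbs far above 5%.

```python
import random

def tries_until_2sd(max_n=100, z=1.96):
    """Add one N(0,1) observation at a time (the null is true) and stop
    as soon as the running mean is more than z standard errors from 0."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(0, 1)
        if abs(total) / n**0.5 > z:   # |sample mean| > z / sqrt(n)
            return True               # "significant" difference found
    return False

random.seed(1)
trials = 10_000
hits = sum(tries_until_2sd() for _ in range(trials))
print(f"Nominal 2 s.d. difference reached in {hits/trials:.1%} of trials")
# Far above the 5% a fixed-sample-size test would yield.
```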
When I challenged Royall with the optional stopping case at the ecology conference (that gave rise to the Taper and Lele volume), he looked surprised at first, and responded (in a booming voice): “But it’s a law!” (My contribution to the Taper and Lele volume is here.) Philosopher Roger Rosenkrantz remarks:
“The likelihood principle implies…the irrelevance of predesignation, of whether an hypothesis was thought of beforehand or was introduced to explain known effects.” (Rosenkrantz 1977, p. 122)
[What a blissful life these likelihoodists live, in the face of today’s data plundering.]
Nor is Royall troubled by the point Barnard made in criticizing Hacking back when Hacking was a likelihoodist:
Turning over the top card of a shuffled deck of playing cards, I find an ace of diamonds:
“According to the law of likelihood, the hypothesis that the deck consists of 52 aces of diamonds (H1) is better supported than the hypothesis that the deck is normal (HN) [by the factor 52]…Some find this disturbing.”
But not Royall.
“Furthermore, it seems unfair; no matter what card is drawn, the law implies that the corresponding trick-deck hypothesis (52 cards just like the one drawn) is better supported than the normal-deck hypothesis. Thus even if the deck is normal we will always claim to have found strong evidence that it is not.”
To Royall, this only shows a confusion between evidence and belief. If you’re not convinced the deck has 52 aces of diamonds, “it does not mean that the observation is not strong evidence in favor of H1 versus HN.” It just wasn’t strong enough to overcome your prior beliefs. Now, Royall is no Bayesian; at least, he doesn’t think a Bayesian computation gives us answers about evidence. (Actually, he alludes to this as a frequentist attempt, at least in Taper and Lele.) In his view, evidence comes solely from these (deductively given) comparative likelihoods (1997, 14). (I don’t know if he ever discusses model checking.) An appeal to beliefs enters only to explain any disagreements with his “law”.
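A quick sketch of Barnard’s “unfair” point (illustrative code of my own; the deck encoding is hypothetical): whichever card is drawn, the corresponding trick-deck hypothesis beats the normal-deck hypothesis by the same factor of 52.

```python
import random

# A normal 52-card deck (the encoding is hypothetical, for illustration).
CARDS = [rank + suit
         for rank in ["2", "3", "4", "5", "6", "7", "8", "9", "10",
                      "J", "Q", "K", "A"]
         for suit in ["S", "H", "D", "C"]]

drawn = random.choice(CARDS)   # the deck really is normal

# Likelihood of the drawn card under each hypothesis:
p_trick = 1.0        # H1: all 52 cards are identical to the one drawn
p_normal = 1 / 52    # HN: a normal, well-shuffled deck

print(drawn, "favors its trick-deck hypothesis by a factor of",
      p_trick / p_normal)   # always 52.0, no matter what card is drawn
```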
Consider Royall’s treatment of the familiar example where a positive diagnostic result is more probable under “disease” than “no disease”. Then, even if the prior probability of disease is small enough to yield a low posterior probability of disease, “to interpret the positive test result as evidence that the subject does not have the disease is never appropriate––it is simply and unequivocally wrong. Why is it wrong?” (2004, 122).
Well you already know the answer: it violates “the law”.
“[I]t violates the fundamental principle of statistical reasoning. That principle, the basic rule for interpreting statistical evidence, is what Hacking (1965, 70) named the law of likelihood. It states:
If hypothesis A implies that the probability that a random variable X takes the value x is pA(x), while hypothesis B implies that the probability is pB(x), then the observation X = x is evidence supporting A over B if and only if pA(x) > pB(x), and the likelihood ratio, pA(x)/pB(x), measures the strength of that evidence.” (Royall, 2004, p. 122)
“This says simply that if an event is more probable under hypothesis A than hypothesis B, then the occurrence of that event is evidence supporting A over B––the hypothesis that did the better job of predicting the event is better supported by its occurrence.” Moreover, “the likelihood ratio is the exact factor by which the probability ratio [ratio of priors in A and B] is changed”. (ibid. 123)
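In odds form this is just Bayes’s theorem, and it shows how a result can be evidence for disease (LR > 1) while leaving the posterior probability of disease low. A minimal sketch, with illustrative numbers of my own (not Royall’s):

```python
# Assumed, illustrative numbers: a test with sensitivity 0.95 and
# false-positive rate 0.05, and a disease prevalence of 1 in 1000.
sens, fpr, prior = 0.95, 0.05, 0.001

lr = sens / fpr                  # likelihood ratio: 19 in favor of disease
prior_odds = prior / (1 - prior)
post_odds = prior_odds * lr      # the LR is Royall's "exact factor"
posterior = post_odds / (1 + post_odds)

print(f"LR = {lr:.0f}, posterior P(disease | positive) = {posterior:.3f}")
# Strong evidence FOR disease on "the law" (LR = 19), yet the posterior
# is only about 0.02: Royall's evidence/belief distinction in action.
```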
There are basically two ways to supplement comparative likelihoods: introduce other possible hypotheses (e.g., prior probabilities) or other possible outcomes (e.g., sampling distributions, error probabilities).
*Like everyone else, I’m incredibly pressed at the moment. It’s either unpolished blog posts, or no posts. So please send corrections. If I update this, I’ll mark it as part (C, 2nd).
1 Royall, retired from Johns Hopkins, now serves as Chairman of the Advisory Board of Analytical Edge Inc. “Prof. Royall is internationally recognized as the father of modern Likelihood methodology, having largely formulated its foundation and demonstrating its viability for representing, interpreting and communicating statistical evidence via the likelihood function.” (Link is here.)
[Incidentally, I always attempt to contact people I post on; but the last time I tried to contact Royall, I didn’t succeed.]
2 I consider that the proper way to answer questions of evidence is by means of an error statistical account used to assess and control the severity of tests. Comparative likelihoodist accounts fail to provide this.
3 Do not confuse an account’s having a rival (which N-P tests and CIs certainly do) with the account’s being merely comparative. With the latter, you do not detach an inference; it’s always of the form x “favors” H over H’, or the like. And remember, that’s ALL statistical evidence is in this account.
Mayo, D. G. (2004). “An Error-Statistical Philosophy of Evidence,” 79-118, in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.
Mayo, D. G. (2014). “On the Birnbaum Argument for the Strong Likelihood Principle” (with discussion and rejoinder), Statistical Science 29(2): 227-266.
Mayo, D. G. and Kruse, M. (2001). “Principles of Inference and Their Consequences,” 381-403, in D. Corfield and J. Williamson (eds.) Foundations of Bayesianism. Dordrecht: Kluwer.
Rosenkrantz, R. (1977). Inference, Method and Decision. Dordrecht: D. Reidel.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall/CRC.
Royall, R. (2004). “The Likelihood Paradigm for Statistical Evidence,” 119-138; Rejoinder, 145-151, in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press.