Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

# law of likelihood

## “Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”

## What’s wrong with taking (1 – β)/α as a likelihood ratio comparing H0 and H1?

**Here’s a quick note on something that I often find in discussions of tests, even though it treats “power”, which is a capacity-of-test notion, as if it were a fit-with-data notion…**

**1. Take a one-sided Normal test T+, with n iid samples:**

**H_{0}: µ ≤ 0 against H_{1}: µ > 0**

σ = 10, n = 100, σ/√n = σ_{x} = 1, α = .025.

So the test would reject H_{0} iff Z > c_{.025} = 1.96. (1.96 is the “cut-off”.)

~~~~~~~~~~~~~~

**Simple rules for alternatives against which T+ has high power:**

- If we add σ_{x} (here 1) to the cut-off (here, 1.96), we are at an alternative value for µ that test T+ has .84 power to detect.
- If we add 3σ_{x} to the cut-off, we are at an alternative value for µ that test T+ has ~.999 power to detect. This value, which we can write as µ^{.999}, is 4.96.
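These rules are easy to check numerically. Here’s a minimal sketch (stdlib Python only; the helper names are mine, not from any package):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(mu, cutoff=1.96, se=1.0):
    """P(test rejects | true mean mu): P(sample mean > cutoff) with SE = 1."""
    return 1 - normal_cdf((cutoff - mu) / se)

print(round(power(2.96), 2))   # cut-off + 1 SE -> 0.84
print(round(power(4.96), 3))   # cut-off + 3 SE -> 0.999
```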

Let the observed outcome just reach the cut-off to reject the null, z_{0} = 1.96.

If we were to form a “likelihood ratio” of μ = 4.96 compared to μ_{0} = 0 using

[Power(T+, 4.96)]/α,

it would be 40 (.999/.025).

It is absurd to say the alternative 4.96 is supported 40 times as much as the null, understanding support as likelihood or comparative likelihood. (The data, 1.96, are even closer to 0 than to 4.96. The same point can be made with less extreme cases.) What is commonly done next is to assign priors of .5 to the two hypotheses, yielding

Pr(H_{0}|z_{0}) = 1/(1 + 40) = .024, so Pr(H_{1}|z_{0}) = .976.
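A quick sketch (stdlib Python; helper names are mine) contrasts the power/α “ratio” with the genuine likelihood ratio at z_{0} = 1.96, bearing out that the data favor the null:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, used here as the likelihood of mu given x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

z0 = 1.96
bogus_ratio = 0.999 / 0.025                          # power/alpha "LR": ~40
true_lr = normal_pdf(z0, mu=4.96) / normal_pdf(z0, mu=0.0)

print(round(bogus_ratio, 1))   # ~40.0
print(round(true_lr, 3))       # ~0.076: z0 = 1.96 favors mu = 0, not mu = 4.96
```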

Such an inference is highly unwarranted and would almost always be wrong.

## How likelihoodists exaggerate evidence from statistical tests

Have you ever noticed that some leading advocates of a statistical account, say a testing account A, upon discovering account A is unable to handle a certain kind of important testing problem that a rival testing account, account B, has no trouble at all with, will mount an argument that being able to handle that kind of problem is actually a bad thing? In fact, they might argue that testing account B is not a “real” testing account because it *can* handle such a problem? You have? Sure you have, if you read this blog. But that’s only a subliminal point of this post.

I’ve had three posts recently on the Law of Likelihood (LL): Breaking the [LL](a)(b), [c], and [LL] is bankrupt. Please read at least one of them for background. All deal with Royall’s comparative likelihoodist account, which some will say only a few people even use, but I promise you that these same points come up again and again in foundational criticisms from entirely other quarters.[i]

> [M]edical researchers are interested in the success probability, θ, associated with a new treatment. They are particularly interested in how θ relates to the old treatment’s success probability, believed to be about 0.2. They have reason to hope θ is considerably greater, perhaps 0.8 or even greater. To obtain evidence about θ, they carry out a study in which the new treatment is given to 17 subjects, and find that it is successful in nine.

Let me interject at this point that of all of Stephen Senn’s posts on this blog, my favorite is the one where he zeroes in on the proper way to think about the discrepancy we hope to find (the .8 in this example). (See note [ii])
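The likelihoods in Royall’s example are simple binomial computations; a quick sketch (stdlib Python; names are mine) gives the LR of θ = .8 over θ = .2 for 9 successes in 17 trials:

```python
from math import comb

n, y = 17, 9   # Royall's example: 9 successes in 17 subjects

def likelihood(theta):
    """Binomial likelihood of theta given y successes in n trials."""
    return comb(n, y) * theta ** y * (1 - theta) ** (n - y)

lr = likelihood(0.8) / likelihood(0.2)
print(round(lr, 1))   # LR of theta = .8 over theta = .2 is exactly 4.0
```

Note that the maximum likelihood value, 9/17 ≈ .53, fits the data better than either hypothesized value.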

## Why the Law of Likelihood is bankrupt–as an account of evidence

There was a session at the Philosophy of Science Association meeting last week where two of the speakers, Greg Gandenberger and Jiji Zhang, had insightful things to say about the “Law of Likelihood” (LL)[i]. Recall from recent posts here and here that the (LL) regards data **x** as evidence supporting *H*_{1} over *H*_{0} iff

Pr(**x**; *H*_{1}) > Pr(**x**; *H*_{0}).

On many accounts, the likelihood ratio also measures the strength of that comparative evidence (Royall 1997, p. 3).[ii]

*H*_{0} and *H*_{1} are statistical hypotheses that assign probabilities to the random variable **X** taking value **x**. As I recall, the speakers limited *H*_{1} and *H*_{0} to simple statistical hypotheses (as Richard Royall generally does)–already restricting the account to rather artificial cases, but I put that to one side. Remember, with likelihoods, the data **x** are fixed, the hypotheses vary.

**1. Maximally likely alternatives.** I didn’t really disagree with anything the speakers said. I welcomed their recognition that a central problem facing the (LL) is the ease of constructing maximally likely alternatives: so long as Pr(**x**; *H*_{0}) < 1, a maximally likely alternative *H*_{1} would be evidentially “favored”. There is no onus on the likelihoodist to predesignate the rival; you are free to search, hunt, post-designate and construct a best (or better) fitting rival. If you’re bothered by this, says Royall, then this just means the evidence disagrees with your prior beliefs.

After all, Royall famously distinguishes between evidence and belief (recall the evidence-belief-action distinction), and these problematic cases, he thinks, do not vitiate his account as an account of *evidence*. But I think they do! In fact, I think they render the (LL) utterly bankrupt *as an account of evidence*. Here are a few reasons. (Let me be clear that I am not pinning Royall’s defense on the speakers[iii], so much as saying it came up in the general discussion[iv].)

## BREAKING THE (Royall) LAW! (of likelihood) (C)

With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.

Richard Royall is a statistician^{1} who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:

- **What should I believe?**
- **How should I act?**
- **Is this data evidence of some claim?** (or How should I interpret this body of observations as evidence?)

It all sounds quite sensible–*at first*–and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the 3 questions.

How is each question to be answered? According to Royall’s ~~commandments~~ writings, what to believe is captured by Bayesian posteriors; how to act, by a behavioristic, N-P long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),^{2} but just focus on the last.

Remember with likelihoods, the data * x* are fixed, the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p- values, power, etc.) start with “the law”. But I fail to see why we should obey it.

To begin with, a report of comparative likelihoods isn’t very useful: *H* might be less likely than *H*′, given **x**, but so what? What do I do with that information? It doesn’t tell me I have evidence against or for either.^{3}

Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems.

## BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

**1. An Assumed Law of Statistical Evidence (law of likelihood)**

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p- values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data **x** are better evidence for hypothesis *H*_{1} than for *H*_{0} if **x** are more probable under *H*_{1} than under *H*_{0}.

Ian Hacking (1965) called this the *logic of support*: **x** supports hypothesis *H*_{1} more than *H*_{0} if *H*_{1} is more **likely**, given **x**, than is *H*_{0}:

Pr(**x**; *H*_{1}) > Pr(**x**; *H*_{0}).

[With likelihoods, the data **x** are fixed, the hypotheses vary.]*

Or,

**x** is evidence for *H*_{1} over *H*_{0} if the **likelihood ratio** (**LR**) of *H*_{1} over *H*_{0} is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support; others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

**2. Barnard (British Journal for the Philosophy of Science)**

But this “law” will immediately be seen to fail on our minimal *severity requirement*. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis *H*_{1} much better “supported” than *H*_{0} even when *H*_{0} is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129).

*H*_{0}: the coin is fair, gets a small likelihood, (.5)^{k}, given k tosses of a coin, while *H*_{1}: the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null *H*_{0}, will afford high likelihood to any number of explanations that fit the data well.
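The coin case is easy to make concrete. Here’s a small sketch (stdlib Python; the simulated tosses and names are purely illustrative):

```python
import random

random.seed(0)
k = 20
tosses = [random.choice("HT") for _ in range(k)]   # any sequence will do

lik_h0 = 0.5 ** k   # H0 (fair coin): every particular sequence gets (.5)^k
# H1, constructed after seeing the data: P(heads) = 1 on tosses that landed H,
# P(tails) = 1 on tosses that landed T, so the observed sequence gets probability 1.
lik_h1 = 1.0

print(lik_h1 / lik_h0)   # 2**k = 1048576.0: massive "support" for the rigged rival
```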

_{0}**3.Breaking the law (of likelihood) by going to the “second,” error statistical level:**

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that *purports* to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic *d*(** X**). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring *H*_{0} compared to some alternative *H*_{1}, even if *H*_{0} is true?
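In the coin example the answer is stark: under a true H_{0}, the post-designated, maximally likely rival is favored in every single trial, so the relevant error probability is 1. A small simulation (stdlib Python, purely illustrative) bears this out:

```python
import random

random.seed(1)

def lr_favoring_rival(k=10):
    """Toss a fair coin k times (so H0 is true), then post-designate the
    rival that assigns the observed sequence probability 1; return its LR."""
    tosses = [random.choice("HT") for _ in range(k)]
    lik_h0 = 0.5 ** k
    lik_h1 = 1.0            # constructed rival fits the data perfectly
    return lik_h1 / lik_h0

# Frequency with which the method favors the rival over a true H0:
favored = sum(lr_favoring_rival() > 1 for _ in range(1000))
print(favored / 1000)      # 1.0 -- the rival "wins" every time
```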