law of likelihood

A Perfect Time to Binge Read the (Strong) Likelihood Principle

Posted on December 30, 2020 by Mayo

An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data. Continue reading →

Categories: Birnbaum, Birnbaum Brakes, law of likelihood | 3 Comments

A Perfect Time to Binge Read the (Strong) Likelihood Principle

Posted on December 30, 2019 by Mayo

Categories: Birnbaum, Birnbaum Brakes, law of likelihood | 8 Comments

The First Eye-Opener: Error Probing Tools vs Logics of Evidence (Excursion 1 Tour II)

Posted on October 21, 2019 by Mayo

1.4, 1.5

In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP), I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed). Continue reading →

Categories: Error Statistics, law of likelihood, SIST | 14 Comments

Excursion 1 Tour II: Error Probing Tools versus Logics of Evidence-Excerpt

Posted on April 4, 2019 by Mayo

For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).

1.4 The Law of Likelihood and Error Statistics

If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail–to use probability to arrive at a type of logic of evidential support–and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965)–the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading →

Categories: Error Statistics, law of likelihood, SIST | 2 Comments

“Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”

Posted on May 2, 2017 by Mayo

Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

Categories: Bayesian/frequentist, C.S. Peirce, confirmation theory, fiducial probability, Fisher, law of likelihood, Popper | Tags: Hacking | 1 Comment

What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?

Posted on February 10, 2015 by Mayo

Here’s a quick note on something that I often find in discussions on tests, even though it treats “power”, which is a capacity-of-test notion, as if it were a fit-with-data notion…..

1. Take a one-sided Normal test T+: with n iid samples:

H₀: µ ≤ 0 against H₁: µ > 0

σ = 10, n = 100, σ/√n = σ_x = 1, α = .025.

So the test would reject H₀ iff Z > c_.025 = 1.96. (1.96 is the “cut-off”.)

~~~~~~~~~~~~~~

Simple rules for alternatives against which T+ has high power:

If we add σ_x (here 1) to the cut-off (here, 1.96) we are at an alternative value for µ that test T+ has .84 power to detect.
If we add 3σ_x to the cut-off we are at an alternative value for µ that test T+ has ~ .999 power to detect. We can write this value as µ^.⁹⁹⁹ = 4.96.

Let the observed outcome just reach the cut-off to reject the null, z₀ = 1.96.
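
The two rules of thumb above can be checked numerically. The following is a minimal sketch, assuming the test statistic is Normal with standard error σ_x = 1 and cut-off 1.96 as stated; the function name `power` and its parameters are illustrative choices, not anything from the original post.

```python
from statistics import NormalDist

def power(mu, cutoff=1.96, sigma_x=1.0):
    """Probability that test T+ rejects (sample mean > cutoff) when the true mean is mu.

    Assumes the sample mean is Normal(mu, sigma_x), matching the setup above.
    """
    return 1 - NormalDist(mu, sigma_x).cdf(cutoff)

# Cut-off plus one standard error: power is about .84
print(round(power(1.96 + 1), 3))
# Cut-off plus three standard errors: power is about .999
print(round(power(1.96 + 3), 3))
# At the null boundary mu = 0, the rejection probability is alpha, about .025
print(round(power(0.0), 3))
```

Note that power here is a property of the test at a given alternative; it is computed without reference to any particular observed outcome.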

If we were to form a “likelihood ratio” of μ = 4.96 compared to μ₀ = 0 using

[Power(T+, 4.96)]/α,

it would be 40 (.999/.025).

It is absurd to say the alternative 4.96 is supported 40 times as much as the null, understanding support as likelihood or comparative likelihood. (The data 1.96 are even closer to 0 than to 4.96. The same point can be made with less extreme cases.) What is commonly done next is to assign priors of .5 to the two hypotheses, yielding

Pr(H₀|z₀) = 1/(1 + 40) = .024, so Pr(H₁|z₀) = .976.

Such an inference is highly unwarranted and would almost always be wrong. Continue reading →
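
The arithmetic above can be reproduced, and set beside the actual likelihood ratio at the observed z₀ = 1.96, in a few lines. This is a hedged sketch under the same Normal model with standard error 1; the variable names are illustrative assumptions, and the comparison with the genuine likelihood ratio is added here only to make concrete the remark that 1.96 is closer to 0 than to 4.96.

```python
from statistics import NormalDist

z0 = 1.96        # observed outcome, just at the cut-off
alpha = 0.025    # Type I error probability of test T+
mu_alt = 4.96    # alternative against which T+ has ~.999 power

# "Likelihood ratio" built from error probabilities: power / alpha
power_at_alt = 1 - NormalDist(mu_alt, 1).cdf(1.96)
pseudo_lr = power_at_alt / alpha              # roughly 40

# Posterior for H0 if priors of .5 are assigned and pseudo_lr is treated as a Bayes factor
posterior_h0 = 1 / (1 + pseudo_lr)            # roughly .024

# Actual likelihood ratio at z0: density under mu = 4.96 over density under mu = 0
actual_lr = NormalDist(mu_alt, 1).pdf(z0) / NormalDist(0, 1).pdf(z0)

print(round(pseudo_lr, 1), round(posterior_h0, 3), round(actual_lr, 3))
```

On these assumptions the genuine likelihood ratio is below 1, so the observed 1.96 is comparatively more likely under µ = 0 than under µ = 4.96, the opposite direction from the power/α ratio.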

Categories: Bayesian/frequentist, law of likelihood, Statistical power, statistical tests, Statistics, Stephen Senn | 87 Comments

How likelihoodists exaggerate evidence from statistical tests

Posted on November 25, 2014 by Mayo

I insist on point against point, no matter how much it hurts

Have you ever noticed that some leading advocates of a statistical account, say a testing account A, upon discovering account A is unable to handle a certain kind of important testing problem that a rival testing account, account B, has no trouble at all with, will mount an argument that being able to handle that kind of problem is actually a bad thing? In fact, they might argue that testing account B is not a “real” testing account because it can handle such a problem? You have? Sure you have, if you read this blog. But that’s only a subliminal point of this post.

I’ve had three posts recently on the Law of Likelihood (LL): Breaking the [LL](a)(b), [c], and [LL] is bankrupt. Please read at least one of them for background. All deal with Royall’s comparative likelihoodist account, which some will say only a few people even use, but I promise you that these same points come up again and again in foundational criticisms from entirely other quarters.[i]

An example from Royall is typical: He makes it clear that an account based on the (LL) is unable to handle composite tests, even simple one-sided tests for which account B supplies uniformly most powerful (UMP) tests. He concludes, not that his test comes up short, but that any genuine test or ‘rule of rejection’ must have a point alternative! Here’s the case (Royall, 1997, pp. 19-20):

[M]edical researchers are interested in the success probability, θ, associated with a new treatment. They are particularly interested in how θ relates to the old treatment’s success probability, believed to be about 0.2. They have reason to hope θ is considerably greater, perhaps 0.8 or even greater. To obtain evidence about θ, they carry out a study in which the new treatment is given to 17 subjects, and find that it is successful in nine.

Let me interject at this point that of all of Stephen Senn’s posts on this blog, my favorite is the one where he zeroes in on the proper way to think about the discrepancy we hope to find (the .8 in this example). (See note [ii]) Continue reading →
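
For the numbers in Royall's example quoted above (nine successes in 17 subjects), a comparative likelihood can be computed directly. This is a minimal sketch assuming a binomial model with independent subjects; the function name `binom_lik` is hypothetical, and comparing θ = 0.8 with θ = 0.2 is chosen here only because those are the two values the quotation mentions.

```python
from math import comb

def binom_lik(theta, successes=9, n=17):
    """Binomial likelihood of theta given `successes` out of `n` trials."""
    return comb(n, successes) * theta**successes * (1 - theta)**(n - successes)

# Likelihood ratio of theta = 0.8 over theta = 0.2.
# Algebraically this is (0.8/0.2)^9 * (0.2/0.8)^8 = 4.
lr_08_vs_02 = binom_lik(0.8) / binom_lik(0.2)
print(round(lr_08_vs_02, 6))

# The best-fitting point value is the observed proportion 9/17,
# which is at least as likely as any other single value of theta.
theta_hat = 9 / 17
print(binom_lik(theta_hat) >= binom_lik(0.8))
```

The ratio compares two point hypotheses only; it does not by itself speak to a composite claim such as θ > 0.2.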

Categories: law of likelihood, Richard Royall, Statistics | Tags: Sober | 18 Comments

Why the Law of Likelihood is bankrupt–as an account of evidence

Posted on November 15, 2014 by Mayo

There was a session at the Philosophy of Science Association meeting last week where two of the speakers, Greg Gandenberger and Jiji Zhang, had insightful things to say about the “Law of Likelihood” (LL)[i]. Recall from recent posts here and here that the (LL) regards data x as evidence supporting H₁ over H₀ iff

Pr(x; H₁) > Pr(x; H₀).

On many accounts, the likelihood ratio also measures the strength of that comparative evidence. (Royall 1997, p.3). [ii]

H₀ and H₁ are statistical hypotheses that assign probabilities to the random variable X taking value x. As I recall, the speakers limited H₁ and H₀ to simple statistical hypotheses (as Richard Royall generally does)–already restricting the account to rather artificial cases, but I put that to one side. Remember, with likelihoods, the data x are fixed, the hypotheses vary.

1. Maximally likely alternatives. I didn’t really disagree with anything the speakers said. I welcomed their recognition that a central problem facing the (LL) is the ease of constructing maximally likely alternatives: so long as Pr(x; H₀) < 1, a maximally likely alternative H₁ would be evidentially “favored”. There is no onus on the likelihoodist to predesignate the rival; you are free to search, hunt, post-designate and construct a best (or better) fitting rival. If you’re bothered by this, says Royall, then this just means the evidence disagrees with your prior beliefs.

After all, Royall famously distinguishes between evidence and belief (recall the evidence-belief-action distinction), and these problematic cases, he thinks, do not vitiate his account as an account of evidence. But I think they do! In fact, I think they render the (LL) utterly bankrupt as an account of evidence. Here are a few reasons. (Let me be clear that I am not pinning Royall’s defense on the speakers[iii], so much as saying it came up in the general discussion[iv].) Continue reading →

Categories: highly probable vs highly probed, law of likelihood, Richard Royall, Statistics | 63 Comments

BREAKING THE (Royall) LAW! (of likelihood) (C)

Posted on October 10, 2014 by Mayo

With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.

Richard Royall is a statistician¹ who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:

What should I believe?
How should I act?
Is this data evidence of some claim? (or How should I interpret this body of observations as evidence?)

It all sounds quite sensible– at first–and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the 3 questions.

How is each question to be answered? According to Royall’s ~~commandments~~ writings, what to believe is captured by Bayesian posteriors; how to act, by a behavioristic, N-P long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),² but just focus on the last.

Remember, with likelihoods, the data x are fixed, the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with “the law”. But I fail to see why we should obey it.

To begin with, a report of comparative likelihoods isn’t very useful: H might be less likely than H’, given x, but so what? What do I do with that information? It doesn’t tell me I have evidence against or for either.³ Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems. Continue reading →

Categories: law of likelihood, Richard Royall, Statistics | 41 Comments

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

Posted on August 29, 2014 by Mayo

1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H₁ than for H₀ if x are more probable under H₁ than under H₀.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H₁ more than H₀ if H₁ is more likely, given x, than is H₀:

Pr(x; H₁) > Pr(x; H₀).

[With likelihoods, the data x are fixed, the hypotheses vary.]*

Or,

x is evidence for H₁ over H₀ if the likelihood ratio LR (H₁ over H₀) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H₁ much better “supported” than H₀ even when H₀ is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H₀: the coin is fair, gets a small likelihood (.5)^k given k tosses of a coin, while H₁: the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H₀, will afford high likelihood to any number of explanations that fit the data well.
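
Barnard's coin example can be made concrete with a short calculation. The following is a minimal sketch; the particular toss sequence and the function names are made up for illustration, and H₁ is coded as the hypothesis that assigns probability 1 to whatever outcome actually occurred on each toss.

```python
def likelihood_fair(tosses):
    """Likelihood of the observed sequence under H0: the coin is fair."""
    return 0.5 ** len(tosses)

def likelihood_had_to_happen(tosses):
    """Likelihood under the post hoc H1: P(heads) = 1 exactly on the tosses
    that came up heads (and 0 on the others), so each observed outcome
    has probability 1."""
    lik = 1.0
    for outcome in tosses:
        p_heads = 1.0 if outcome == "H" else 0.0
        lik *= p_heads if outcome == "H" else (1.0 - p_heads)
    return lik

tosses = "HTTHHTHTTH"  # hypothetical sequence of k = 10 tosses
lr = likelihood_had_to_happen(tosses) / likelihood_fair(tosses)
print(lr)  # 2**k, here 1024.0
```

Whatever sequence is observed, the ratio comes out as 2^k in favor of the constructed rival, which is the sense in which such a rival is always available.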

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H₀ compared to some alternative H₁ even if H₀ is true?
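
One way to see what this question is asking is to compute such a probability exactly in a toy case. The sketch below is an illustration under stated assumptions, not the calculation given in the full post: H₀ is a fair coin, k tosses are observed, and the rival H₁ is chosen after the fact as the best-fitting value of the heads probability (the observed proportion). The function name `prob_lr_exceeds` and the thresholds are hypothetical.

```python
from math import comb

def prob_lr_exceeds(k, threshold, theta0=0.5):
    """Probability, computed under H0 (heads probability theta0), that the
    likelihood ratio of the post hoc best-fitting rival over H0 exceeds
    `threshold` in k tosses."""
    total = 0.0
    for heads in range(k + 1):
        theta_hat = heads / k  # rival picked to fit the data best
        lik_alt = theta_hat**heads * (1 - theta_hat)**(k - heads)
        lik_null = theta0**heads * (1 - theta0)**(k - heads)
        if lik_alt / lik_null > threshold:
            # Probability of this many heads when H0 is true
            total += comb(k, heads) * lik_null
    return total

# Chance of finding some rival with LR above 1 (i.e. "favored" over H0), H0 true
print(prob_lr_exceeds(10, 1 + 1e-9))
# Chance the best-fitting rival is favored by a factor of more than 8, H0 true
print(prob_lr_exceeds(10, 8))
```

The first probability is large because the best-fitting rival ties with H₀ only when exactly half the tosses are heads. This is an error probability of the method, the kind of second-level quantity the question above is directed at.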

Continue reading →

Categories: highly probable vs highly probed, law of likelihood, Likelihood Principle, Statistics | 72 Comments

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.