Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)

Stat Museum

Excursion 1 Tour II: Error Probing Tools vs. Logics of Evidence

Blurb. Core battles revolve around the relevance of a method’s error probabilities. What’s distinctive about the severe testing account is that it uses error probabilities evidentially: to assess how severely a claim has passed a test. Error control is necessary but not sufficient for severity. Logics of induction focus on the relationships between given data and hypotheses, so outcomes other than the one observed drop out. This is captured in the Likelihood Principle (LP). Tour II takes us to the crux of central wars in relation to the Law of Likelihood (LL) and Bayesian probabilism. (1.4) Hypotheses deliberately designed to accord with the data can result in minimal severity. The likelihoodist wishes to oust them via degrees of belief captured in prior probabilities. To the severe tester, such gambits directly alter the evidence by leading to inseverity. (1.5) Stopping rules: if a tester tries and tries again until significance is reached (optional stopping), significance will be attained erroneously with high probability. According to the LP, the stopping rule doesn’t alter the evidence. The irrelevance of optional stopping is an asset for holders of the LP; it’s the opposite for a severe tester. The warring sides talk past each other.

1.4 The Law of Likelihood and Error Statistics: Key Items

Ian Hacking (1965) – the Law of Likelihood.

Law of Likelihood (LL): Data x are better evidence for hypothesis H1 than for H0 if x is more probable under H1 than under H0.
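A minimal sketch of the LL at work, assuming a binomial model and hypothetical data (7 successes in 10 trials) with two point hypotheses, H0: p = 0.5 and H1: p = 0.7. The function name is ours, for illustration only. The likelihood ratio (LR) compares how probable the observed x is under each hypothesis.

```python
from math import comb

def binom_likelihood(p: float, k: int, n: int) -> float:
    """P(k successes in n independent trials | success probability p), read as a function of p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical data: k = 7 successes in n = 10 trials.
k, n = 7, 10
lik_h0 = binom_likelihood(0.5, k, n)  # H0: p = 0.5
lik_h1 = binom_likelihood(0.7, k, n)  # H1: p = 0.7

# Per the LL, x favors H1 over H0 because lik_h1 > lik_h0; the LR says by how much.
lr = lik_h1 / lik_h0
print(f"LR(H1:H0) = {lr:.3f}")  # prints LR(H1:H0) = 2.277
```

The binomial coefficient cancels in the ratio, so only the terms involving p matter.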

Likelihoods are defined and several examples are given.

Likelihoods of hypotheses should not be confused with their probabilities.
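One way to see the difference, sketched under the same hypothetical binomial setup (k = 7, n = 10): holding the hypothesis fixed and summing over possible outcomes gives 1, as a probability distribution must, but holding the observed outcome fixed and integrating the likelihood over all hypotheses p in [0, 1] gives 1/(n + 1), not 1. The integral is approximated here with a simple midpoint rule, which is our choice of method.

```python
from math import comb

def binom_likelihood(p, k, n):
    """P(k successes in n trials | p); as a function of p it is the likelihood."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

k, n = 7, 10  # hypothetical data

# Fix the hypothesis (p = 0.7) and sum over all possible outcomes: a probability distribution.
total_over_outcomes = sum(binom_likelihood(0.7, j, n) for j in range(n + 1))

# Fix the outcome (k = 7) and integrate over hypotheses p in [0, 1] (midpoint rule):
# the likelihood function is not a probability distribution over hypotheses.
m = 100_000
area_over_hypotheses = sum(binom_likelihood((i + 0.5) / m, k, n) for i in range(m)) / m

print(round(total_over_outcomes, 6), round(area_over_hypotheses, 6))
```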

The Law of Likelihood (LL) is seen to fail the minimal severity requirement – at least if it is taken as an account of inference.

Gellerized hypotheses: maximally fitting, but minimally severely tested, hypotheses.

We observe one outcome, but consider: for any outcome, unless it makes H0 maximally likely, we can find an H1 that is more likely, that is, one under which the outcome is more probable.
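A minimal sketch of a "Gellerized" alternative, assuming a binomial model with n = 10 trials and H0: p = 0.5 (our illustrative choices; the helper name is hypothetical). Whatever number of successes k is observed, the hypothesis p = k/n, fitted after the fact, has likelihood at least as great as H0's, and strictly greater unless k/n = 0.5, the one outcome that makes H0 maximally likely.

```python
from math import comb

def binom_likelihood(p, k, n):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def gellerized_lr(k, n, p0=0.5):
    """LR of the best-fitting point hypothesis, p_hat = k/n, chosen after seeing k,
    against H0: p = p0. Hypothetical helper for illustration."""
    p_hat = k / n
    return binom_likelihood(p_hat, k, n) / binom_likelihood(p0, k, n)

n = 10
lrs = {k: gellerized_lr(k, n) for k in range(n + 1)}
for k, lr in lrs.items():
    print(k, round(lr, 3))
```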

A severity assessment is one level removed: you give me the rule, and I consider its latitude for erroneous outputs.

Sampling distribution.
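The sampling distribution is what lets a severity assessment stay one level removed from the particular outcome: it asks how often a rule would favor an alternative over H0 even when H0 is true. A minimal sketch, assuming the same hypothetical binomial setup (n = 10, H0: p = 0.5), comparing the after-the-fact rule "fit p = k/n" with a rule whose alternative p = 0.7 is fixed in advance. "Favors" here simply means LR > 1; the function names and the threshold are our own choices.

```python
from math import comb

P0 = 0.5  # null hypothesis H0: p = 0.5 (hypothetical)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def lr_gellerized(k, n):
    """LR of the hypothesis fitted after seeing k (p_hat = k/n) against H0."""
    return binom_pmf(k, n, k / n) / binom_pmf(k, n, P0)

def lr_predesignated(k, n, p1=0.7):
    """LR of an alternative p1 fixed before seeing the data, against H0."""
    return binom_pmf(k, n, p1) / binom_pmf(k, n, P0)

def prob_favors_alternative_under_h0(lr_rule, n):
    """Exact probability, computed under H0, that the rule reports LR > 1 against H0."""
    return sum(binom_pmf(k, n, P0) for k in range(n + 1) if lr_rule(k, n) > 1)

n = 10
err_geller = prob_favors_alternative_under_h0(lr_gellerized, n)
err_fixed = prob_favors_alternative_under_h0(lr_predesignated, n)
print(f"Gellerized rule: {err_geller:.3f}; predesignated rule: {err_fixed:.3f}")
```

With n = 10 this gives 772/1024 ≈ 0.754 for the Gellerized rule versus 176/1024 ≈ 0.172 for the predesignated one, even though any single observed LR is computed the same way in both cases.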

Richard Royall distinguishes three questions, concerning belief, action, and evidence:

What do I believe, now that I have this observation?
What should I do, now that I have this observation?
How should I interpret this observation as evidence regarding [H0] versus [H1]?

Exhibit (i): Law of Likelihood Compared to a Significance Test.

Why the LL Rejects Composite Hypotheses

Royall holds that all attempts to say whether x is good evidence for H, or even whether x is better evidence for H than is y, are futile. Similarly,

“What does the [LL] say when one hypothesis attaches the same probability to two different observations? It says absolutely nothing . . . [it] applies when two different hypotheses attach probabilities to the same observation” (Royall 2004, p. 148).

The severe tester distinguishes the evidential warrant for one and the same hypothesis H in two cases: one where it was constructed post hoc, cherry-picked, and so on, and a second where it was predesignated.

Souvenir B: Likelihood versus Error Statistical

To the Likelihoodist, points in favor of the LL are:

The LR offers “a precise and objective numerical measure of the strength of statistical evidence” for one hypothesis over another; it is a frequentist account and does not use prior probabilities (Royall 2004, p. 123).

The LR is fundamentally related to Bayesian inference: the LR is the factor by which the ratio of posterior probabilities is changed by the data.

A Likelihoodist account does not consider outcomes other than the one observed, unlike P-values, and Type I and II errors. (Irrelevance of the sample space.)

Fishing for maximally fitting hypotheses and other gambits that alter error probabilities do not affect the assessment of evidence; they may be blocked by moving to the “belief” category.
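The Bayesian connection in the second point is the odds form of Bayes’ theorem: posterior odds = LR × prior odds. A minimal check, assuming only two hypotheses are in play (H0: p = 0.5, H1: p = 0.7), hypothetical prior probabilities 0.8 and 0.2, and hypothetical data of 7 successes in 10 trials.

```python
from math import comb

def binom_likelihood(p, k, n):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical setup: two hypotheses only, assumed priors, assumed data.
k, n = 7, 10
prior_h0, prior_h1 = 0.8, 0.2
lik_h0, lik_h1 = binom_likelihood(0.5, k, n), binom_likelihood(0.7, k, n)

# Posterior probabilities via Bayes' theorem directly.
evidence = prior_h1 * lik_h1 + prior_h0 * lik_h0
post_h1 = prior_h1 * lik_h1 / evidence
post_h0 = prior_h0 * lik_h0 / evidence

# The same posterior odds obtained as LR times prior odds.
lr = lik_h1 / lik_h0
prior_odds = prior_h1 / prior_h0
posterior_odds = post_h1 / post_h0
print(f"posterior odds {posterior_odds:.4f} = LR {lr:.4f} x prior odds {prior_odds}")
```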

To the error statistician, problems with the LL include:

LRs do not convey the same evidential appraisal in different contexts.
The LL denies that it makes sense to speak of how well or poorly tested a single hypothesis is on given evidence, something essential for model checking; it is inapplicable to tests of composite hypotheses.

A Likelihoodist account does not consider outcomes other than the one observed, unlike P-values, and Type I and II errors. (Irrelevance of the sample space.)
Fishing for maximally fitting hypotheses and other gambits that alter error probabilities do not affect the assessment of evidence; they may be blocked by moving to the “belief” category.

Notice that the last two points are identical for both sides. What’s a selling point for a Likelihoodist is a problem for an error statistician.

1.5 Trying and Trying Again: Key Items

“Trying and trying again” to achieve statistical significance; stopping rules and their relevance or irrelevance.
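A minimal simulation sketch of trying and trying again, under assumptions of our own: normally distributed data with known σ = 1 and true mean 0 (so the null hypothesis is true), a two-sided z-test at nominal α = 0.05 applied after every new observation, and stopping at the first "significant" result. The function names are hypothetical. The actual probability of reaching nominal significance grows well beyond 0.05 as the maximum sample size grows, which is the error-statistical complaint; under the LP, the stopping rule makes no difference to the evidence.

```python
import random
from math import sqrt

def tries_until_significant(max_n, z_crit=1.96, rng=random):
    """Draw N(0, 1) observations (H0: mu = 0 is true), test after each one,
    and stop as soon as |z| > z_crit. Returns True if 'significance' was reached."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / sqrt(n)  # z-statistic for the sample mean with known sigma = 1
        if abs(z) > z_crit:
            return True
    return False

def actual_type1_rate(max_n, reps=2000, seed=1):
    """Monte Carlo estimate of P(reject H0 at some look | H0 true)."""
    rng = random.Random(seed)
    hits = sum(tries_until_significant(max_n, rng=rng) for _ in range(reps))
    return hits / reps

for max_n in (1, 10, 100):
    print(max_n, actual_type1_rate(max_n))
```

With a single look the rate stays near the nominal 0.05; allowing more looks pushes it steadily upward.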

Edwards, Lindman, and Savage (E, L, & S, 1963).

Simmons, Nelson, and Simonsohn

The Likelihood Principle (LP).
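For reference, one common formal statement of the LP, written in the style of Birnbaum’s formulation. The notation is an assumption for this sketch: E1 and E2 are two experiments with probability models f, x and y their outcomes, and Ev(E, x) stands for the evidential import of outcome x of experiment E. If the two outcomes give proportional likelihood functions, they are evidentially equivalent:

```latex
\text{If } f_{E_1}(x \mid \theta) = c \, f_{E_2}(y \mid \theta) \text{ for all } \theta,
\text{ with } c > 0 \text{ not depending on } \theta,
\quad \text{then} \quad \mathrm{Ev}(E_1, x) = \mathrm{Ev}(E_2, y).
```

Error probabilities are computed from the sampling distribution of each experiment, so they can differ between (E1, x) and (E2, y) even when this condition holds.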

Weak Repeated Sampling Principle.

(Cox and Hinkley 1974, p. 51). “[W]e should not follow procedures which for some possible parameter values would give, in hypothetical repetitions, misleading conclusions most of the time” (ibid., pp. 45–6).

The 1959 Savage Forum

Arguments from Intentions:

Error Probabilities Violate the LP
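A standard textbook illustration of the violation, sketched with numbers of our own choosing rather than taken from this post: 9 heads and 3 tails are observed, testing H0: p = 0.5 against p > 0.5, where p is the probability of heads. If the design fixed n = 12 tosses (binomial), or tossed until the 3rd tail (negative binomial), the likelihood functions are proportional (ratio 4 at every p), so by the LP the evidence is the same; yet the one-sided P-values differ.

```python
from math import comb

heads, tails = 9, 3
n = heads + tails  # 12 tosses in total
r = tails          # negative binomial design: stop at the r-th tail

# Design 1 (binomial, n fixed at 12): one-sided P(X >= 9 | p = 0.5).
p_binomial = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

# Design 2 (negative binomial): P(at least 9 heads before the 3rd tail | p = 0.5),
# using P(exactly h heads before the r-th tail) = C(h + r - 1, h) * 0.5**(h + r).
p_negbin = 1 - sum(comb(h + r - 1, h) * 0.5 ** (h + r) for h in range(heads))

# The two likelihood functions differ only by a constant factor, so the LP treats them alike.
def lik_binomial(p):
    return comb(n, heads) * p**heads * (1 - p) ** tails

def lik_negbin(p):
    return comb(heads + r - 1, heads) * p**heads * (1 - p) ** r

ratios = [lik_binomial(p) / lik_negbin(p) for p in (0.3, 0.5, 0.7, 0.9)]
print(round(p_binomial, 4), round(p_negbin, 4), ratios)
```

The binomial P-value comes out near 0.073 and the negative binomial one near 0.033, so at the 0.05 level the two designs disagree although their likelihoods are proportional.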

Problem of “known (or old) evidence,” made famous by Clark Glymour (1980).

Souvenir C: A Severe Tester’s Translation Guide [i]

HOW TO FIND MATERIAL FROM EXCURSION 1 TOUR II (if you don’t have a copy of the book). I have not posted Excursion 1 Tour II (I did post Tour I). (Andrew Gelman may post a draft for a possible discussion on his blog.)

However, there are posts on this blog that cover much of the material from 1.4 and 1.5 (in blog form). For the material on Royall and the Law of Likelihood in 1.4 (including a link to an article by Royall), see this post; for stopping rules and the likelihood principle, see this post. That post also offers Museum links to the Savage Forum! You can also search this blog for terms of interest; there’s quite a lot on those in 1.4 and 1.5. Have fun! Please share comments, queries, favorite quotes, etc.

[i] I may post Souvenir C separately.

Tour Guide Mementos (Excursion 1, Tour I of How to Get Beyond the Statistics Wars)

FOR ALL OF TOUR I (proofs): SIST Excursion 1 Tour I

FOR ALL OF TOUR II (proofs): SIST Excursion 1 Tour II

SIST Itinerary

One thought on “Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)”

November 1, 2018

Mayo

From D. Lakens’ Twitter:

The biggest problem remains how likelihoods allow 'Gellerized' conclusions. Birnbaum (1969) already complained about the inability to control erroneous interpretations. Royall agrees. But he says evidence is the evidence. Any problems that seem to occur are on the level of belief.

— Daniël Lakens (@lakens) November 1, 2018

She repeats the '3 questions you can ask' – one of my favorite parts of Royall's 1997 book (I use it in my MOOC). But Mayo says her goal of probativeness (how well-tested is the case at hand) is not part of this set of questions. pic.twitter.com/a4e9zcxkJg

— Daniël Lakens (@lakens) November 1, 2018

The main starting point of the chapter is that the Law of Likelihood permits bad evidence. Since hypotheses can vary, you can always pick one that best fits the observed data. Mayo calls this 'Gellerized' (Uri Geller explained spoon bending failures after the fact) 🙂 pic.twitter.com/suabE8abGg

— Daniël Lakens (@lakens) November 1, 2018

The main difference is nicely explained in this paragraph. The severe tester does not want to treat cherry-picked or p-hacked evidence the same way as a situation where it was predicted. (I really agree with this). pic.twitter.com/rcXiT11RX2

— Daniël Lakens (@lakens) November 1, 2018

Mayo then moves on to how this specific issue leads to differences between Frequentists and Bayesians about stopping rules. The Bayesians Edwards, Lindman, and Savage tell us optional stopping is no problem from a Bayesian perspective (so does @JeffRouder https://t.co/JbotcfNrSm)

— Daniël Lakens (@lakens) November 1, 2018


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
