Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II)

Posted on November 8, 2018 by Mayo

I will continue to post mementos and, at times, short excerpts following the pace of one “Tour” a week, in sync with some book clubs reading Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST or Statinfast 2018, CUP), e.g., Lakens. This puts us at Excursion 2 Tour I, but first, here’s a quick Souvenir (Souvenir C) from Excursion 1 Tour II:

Souvenir C: A Severe Tester’s Translation Guide

Just as in ordinary museum shops, our souvenir literature often probes treasures that you didn’t get to visit at all. Here’s an example of that, and you’ll need it going forward. There’s a confusion about what’s being done when the significance tester considers the set of all of the outcomes leading to a d(x) greater than or equal to 1.96, i.e., {x: d(x) ≥ 1.96}, or just d(x) ≥ 1.96. This is generally viewed as throwing away the particular x, and lumping all these outcomes together. What’s really happening, according to the severe tester, is quite different. What’s actually being signified is that we are interested in the method, not just the particular outcome. Those who embrace the LP make it very plain that data-dependent selections and stopping rules drop out. To get them to drop in, we signal an interest in what the test procedure would have yielded. This is a counterfactual and is altogether essential in expressing the properties of the method, in particular, the probability it would have yielded some nominally significant outcome or other.

When you see Pr(d(X) ≥ d(x₀); H₀), or Pr(d(X) ≥ d(x₀); H₁), for any particular alternative of interest, insert:

“the test procedure would have yielded”

just before the d(X). In other words, this expression, with its inequality, is a signal of interest in, and an abbreviation for, the error probabilities associated with a test.
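As a minimal sketch of this reading, assume a one-sided test of a Normal mean with known σ. That setup is an assumption made only for illustration, and the names `d`, `rate_at_least`, `mu0`, `sigma`, `n`, and `reps` are hypothetical. The simulation reruns the test procedure many times under H₀ and records how often it would have yielded d(X) ≥ 1.96, then compares that relative frequency with the exact Normal tail area.

```python
import random
from statistics import NormalDist, fmean

def d(sample, mu0, sigma):
    """Standardized distance of the sample mean from mu0 (sigma assumed known)."""
    n = len(sample)
    return (fmean(sample) - mu0) / (sigma / n ** 0.5)

def rate_at_least(cutoff, mu, mu0=0.0, sigma=1.0, n=25, reps=20000, seed=1):
    """Relative frequency with which the test procedure yields d(X) >= cutoff
    when the data are generated with true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if d(sample, mu0, sigma) >= cutoff:
            hits += 1
    return hits / reps

# Pr(the test procedure would have yielded d(X) >= 1.96; H0)
exact = 1 - NormalDist().cdf(1.96)
simulated = rate_at_least(1.96, mu=0.0)
print(f"exact tail area: {exact:.4f}, simulated frequency: {simulated:.4f}")
```

The quantity being estimated is a property of the procedure over the sample space, not of any single observed x₀, which is the point of the translation.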

Applying the Severity Translation. In Exhibit (i), Royall described a significance test with a Bernoulli(θ) model, testing H₀: θ ≤ 0.2 vs. H₁: θ > 0.2. We blocked an inference from observed difference d(x) = 3.3 to θ = 0.8 as follows. (Recall that x̄ = 0.53 and d(x₀) ≃ 3.3.)

We computed Pr(d(X) > 3.3; θ = 0.8) ≃ 1.

We translate it as Pr(The test would yield d(X) > 3.3; θ = 0.8) ≃ 1.

We then reason as follows:

Statistical inference: If θ = 0.8, then the method would virtually always give a difference larger than what we observed. Therefore, the data indicate θ < 0.8.

(This follows for rejecting H₀ in general.) When we ask: “How often would your test have found such a significant effect even if H₀ is approximately true?” we are asking about the properties of the experiment that did happen. The counterfactual “would have” refers to how the procedure would behave in general, not just with these data, but with other possible data sets in the sample space.
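The computation for the Royall example can be sketched with an exact binomial sum. The excerpt does not state the sample size, so `N = 16` below is a hypothetical value, chosen only because it makes x̄ = 0.53 correspond to d ≈ 3.3; the function names are likewise illustrative. The test statistic is taken to be d(x) = (x̄ − θ₀)/√(θ₀(1 − θ₀)/n) with θ₀ = 0.2, which is an assumption about the form used.

```python
from math import comb, sqrt

def d(k, n, theta0):
    """Standardized difference for k successes in n Bernoulli trials."""
    xbar = k / n
    return (xbar - theta0) / sqrt(theta0 * (1 - theta0) / n)

def prob_d_exceeds(cutoff, theta, n, theta0=0.2):
    """Pr(the test would yield d(X) > cutoff; theta), by summing the
    binomial probabilities of every outcome whose d exceeds the cutoff."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(n + 1)
        if d(k, n, theta0) > cutoff
    )

N = 16  # hypothetical sample size; not given in the excerpt

p_if_08 = prob_d_exceeds(3.3, theta=0.8, n=N)  # were theta = 0.8
p_if_02 = prob_d_exceeds(3.3, theta=0.2, n=N)  # at the boundary of H0
print(f"Pr(d(X) > 3.3; theta = 0.8) = {p_if_08:.4f}")
print(f"Pr(d(X) > 3.3; theta = 0.2) = {p_if_02:.4f}")
```

Under this assumed sample size the first probability is close to 1, matching the reasoning that θ = 0.8 would virtually always produce a larger difference than the one observed, while the second is small, which is what licenses rejecting H₀.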

Categories: Statistical Inference as Severe Testing | 8 Comments

8 thoughts on “Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II)”

December 4, 2018

coreyyanofsky

I have a question about the Birnbaum example in which θ can take values in 0, 1, …, 100 and for which the probability model is X = θ with probability 1 if θ ≠ 0 and is uniformly distributed on 1, …, 100 if θ = 0.

Q1. If I pre-designate H1: θ = 37 to test against H2: θ = 0 and then observe r = 37, has H1 passed a severe test?

Q2. Does it matter how I made the choice to test θ = 37?

Q3. What if I pre-designate five values chosen at random, (say, 4, 24, 46, 63, 83; I literally just chose them at random) and declare that my test procedure is to claim H2: θ = 0 if I observe one of them — and I do. Has H2 passed a severe test? Why or why not?

December 4, 2018

coreyyanofsky

Q3 is incomplete; the procedure is to claim H2: θ = 0 if I observe one of the 5 values and to claim “θ = r” if I don’t.

December 4, 2018

Mayo

As the example says, there is error control with 2 predesignated hyps. But moving to the severe testing context, the inferences aren’t to points. It would fall under a non-exhaustive case, from the sound of it, so the answer is no.


December 4, 2018

coreyyanofsky

So here’s the severity criteria I’m looking at:

“A hypothesis H passes a severe test T with data x0 if,
(S-1) x0 accords with H, (for a suitable notion of accordance) and
(S-2) with very high probability, test T would have produced a result
that accords less well with H than x0 does, if H were false or incorrect.
Equivalently, (S-2) can be stated:
(S-2)*: with very low probability, test T would have produced a result
that accords as well as or better with H than x0 does, if H were false
or incorrect.”

So in Q1, the inference is to one of the two possible values of θ that survive post-data. I look at the severity criteria and I see (with change of notation) that a severe test has been passed if:

(S-1): r = 37 accords with H1: θ = 37, (for a suitable notion of accordance) and
(S-2): with very low probability, my test procedure would have produced a result
that accords as well as or better with H1 than r = 37 does, if H1 were false
or incorrect.

(S-1) seems clearly satisfied. For (S-2), I consider ways that H1 could have been incorrect. The data rule out all possibilities except θ = 37 or θ = 0; under θ = 0, the probability that my test procedure would have produced a result that accords as well as (better isn’t possible) r = 37 with H1 is 0.01, which is indeed very low.

In Q3, I have r in the set of 5 values that I have pre-designated (uniformly at random) as indicating H2: θ = 0. For ease of notation, call that set of 5 values S. I look at the severity criteria and I see that a severe test has been passed if:

(S-1): the result “r is in S” accords with H2: θ = 0, (for a suitable notion of accordance) and
(S-2): with very low probability, my test procedure would have produced a result
that accords as well as or better with H2 than does the result “r is in S”, if H2 were false
or incorrect.

For (S-1) I’m not sure what a suitable notion of accordance might be; it seems like all possible values of r are equally in accord with θ = 0. For (S-2), I consider ways that H2 could have been incorrect. The data rule out all possibilities except θ = r or θ = 0; under θ = r, the probability that my test procedure (which, recall, is to choose 5 values at random for S) would have produced a result that accords as well with H2 (better isn’t possible) as “r is in S” is 0.05, which is indeed very low.
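The two probabilities quoted for Q1 and Q3 can be checked with a short sketch of the model as described in the question. The function name `pmf` and the variable names are illustrative, and the Q3 figure treats S as a uniformly random 5-element subset of 1, …, 100, as stated above.

```python
from fractions import Fraction
from math import comb

def pmf(x, theta):
    """Pr(X = x; theta): X = theta with probability 1 when theta != 0,
    and X is uniform on 1..100 when theta = 0."""
    if theta == 0:
        return Fraction(1, 100) if 1 <= x <= 100 else Fraction(0)
    return Fraction(1) if x == theta else Fraction(0)

# Q1: probability of observing r = 37 under each candidate value of theta
q1_under_h1 = pmf(37, 37)  # theta = 37
q1_under_h2 = pmf(37, 0)   # theta = 0
q1_under_44 = pmf(37, 44)  # any other nonzero theta

# Q3: probability that a uniformly random 5-element S contains a fixed r,
# counted as (subsets containing r) / (all 5-element subsets)
q3_r_in_S = Fraction(comb(99, 4), comb(100, 5))

print(q1_under_h1, q1_under_h2, q1_under_44, q3_r_in_S)
```

Exact fractions are used so that the 0.01 and 0.05 figures come out without rounding.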


December 5, 2018

Mayo

Just on Q1, other values for the parameter aren’t ruled out by the outcome. With the predesignated pts there’s no trouble in inferring the likelihoodist’s comparative evidence claim. With one observation, it’s a weak indication & no way to check assumptions. As far as adding increasing predesignated hypos, it’s discussed in 6.3, 6.4 of Mayo and Kruse: yes there can be error control though it diminishes with increasing hyps. https://www.phil.vt.edu/dmayo/personal_website/Mayo%20&%20Kruse%20Principles%20of%20inference%20and%20their%20consequences%20B.pdf

Q2- yes. Q3 doesn’t look like a good test statistic.

I don’t really see what you’re getting at. About to shut down tonight.


December 5, 2018

coreyyanofsky

On Q1, other parameter values are indeed ruled out by the outcome. The probability of getting r = 37 when, say, θ = 44, is nil, because the model specifies Pr(X = 44; θ = 44) = 1 which immediately implies Pr(X ≠ 44; θ = 44) = 0.

Likelihood theory is not the topic I’m interested in here — just severity.

On Q2, could you maybe go into a little more detail?

On Q3, it absolutely is not a good statistic. The question is basically on what grounds an error statistician can say so.


December 5, 2018

omaclaren

Re: Q3,

This is why I think (I think!) S1 should be (as I mentioned in a recent post):

S1’: The test indicates H with high probability under H.

Your choice of stat has a fairly low prob of indicating H, under repeated sampling under H, no?
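One way to read S1’ numerically, as a sketch only: fix the set S from Q3 and compute the probability that the procedure claims H2: θ = 0 under each value of θ. The function name `prob_claims_h2` is hypothetical, and holding S fixed (rather than redrawing it at random) is an assumption of this illustration.

```python
from fractions import Fraction

S = {4, 24, 46, 63, 83}  # the five pre-designated values from Q3

def prob_claims_h2(theta, s=S):
    """Pr(procedure claims theta = 0; theta), where the procedure claims
    theta = 0 exactly when the observed value falls in s."""
    if theta == 0:
        # X is uniform on 1..100, and every element of s lies in that range
        return Fraction(len(s), 100)
    # X equals theta with probability 1
    return Fraction(1) if theta in s else Fraction(0)

under_h2 = prob_claims_h2(0)
print(f"Pr(test indicates H2; theta = 0) = {under_h2}")
```

With this S the procedure indicates H2 only 5 times in 100 when H2 is true, which is the low probability the comment is pointing at.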



© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.