Posts of Christmas Past (1): 13 howlers of significance tests (and how to avoid them)

I’m reblogging a post from Christmas past–exactly 7 years ago. Guess what I gave as the number 1 (of 13) ~~howler~~ well-worn criticism of statistical significance tests, haunting us back in 2012–all of which are put to rest in Mayo and Spanos 2011? Yes, it’s the frightening allegation that statistical significance tests forbid using any background knowledge! The researcher is imagined to start with a “blank slate” in each inquiry (no memories of fallacies past), and then unthinkingly apply a purely formal, automatic, accept-reject machine. What’s newly frightening (in 2019) is the credulity with which this apparition is now being met (by some). I make some new remarks below the post from Christmas past:

2013 is right around the corner, and here are 13 well-known criticisms of statistical significance tests, and how they are addressed within the error statistical philosophy, as discussed in Mayo, D. G. and Spanos, A. (2011) “Error Statistics”.

(#1) Error statistical tools forbid using any background knowledge [1].
(#2) All statistically significant results are treated the same.
(#3) The p-value does not tell us how large a discrepancy is found.
(#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
(#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
(#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
(#7) Error probabilities are misinterpreted as posterior probabilities.
(#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
(#9) Specifying statistical tests is too arbitrary.
(#10) We should be doing confidence interval estimation rather than significance tests.
(#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
(#12) All models are false anyway.
(#13) Testing assumptions involves illicit data-mining.

You can read how we avoid them in the full paper here.
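To make criticisms (#3) and (#4) concrete, here is a minimal sketch, assuming a one-sided Normal test of H0: mu <= mu0 with known sigma. The function names and all the numbers are hypothetical, chosen only for illustration. It holds the observed discrepancy fixed and lets the sample size grow, so you can see why a P-value has to be read together with n and the size of the discrepancy.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard Normal Z, computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def one_sided_p_value(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """P-value for H0: mu <= mu0 vs H1: mu > mu0, sigma assumed known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return upper_tail(z)


# Hypothetical setup: the same small observed discrepancy at three sample sizes.
mu0, sigma, xbar = 0.0, 1.0, 0.02
for n in (100, 10_000, 1_000_000):
    p = one_sided_p_value(xbar, mu0, sigma, n)
    print(f"n={n:>9,}  observed discrepancy={xbar - mu0:.2f}  p={p:.3g}")
```

The P-value shrinks as n grows even though the observed discrepancy never changes. That is the fact the criticisms point to; the error statistical reply is to report which discrepancies are and are not indicated, not to drop the test.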

My book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST 2018, CUP), excavates the most recent variations on all of these howlers. To allege that statistical significance tests don’t use background information is a willful distortion of the tests, which Fisher developed hand in hand with a large methodology of experimental design: randomization, predesignation and testing model assumptions. All these depend on incorporating background information into the specification and interpretation of tests. “The purpose of randomisation,” Fisher made clear, “is to guarantee the validity of the test of significance” (1935). Observational (and other) studies that lack proper controls may well need to concede that any reported P-values are illicit, but then why report P-values at all? (Confidence levels are then equally illicit, except as descriptive measures without error control.) I say they should not report P-values that lack an error-statistical interpretation, at least not without saying so. But don’t punish studies that work hard to attain error control.
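To see how randomization can underwrite a significance test, consider a rough sketch of an exact randomization test. This is only an illustration, not a reconstruction of any analysis in Fisher (1935); the function name and the outcome values are hypothetical. Under the null of no treatment effect, every split of the units into groups of the observed sizes was equally likely by design, so the P-value is simply the share of splits with a difference in means at least as large as the one observed.

```python
from itertools import combinations


def randomization_p_value(treated, control):
    """Exact one-sided randomization P-value for the difference in group means.

    The reference distribution is generated by the random assignment itself:
    every way of choosing which units are 'treated' is enumerated.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    m = len(pooled) - k
    total = sum(pooled)
    observed = sum(treated) / k - sum(control) / m

    n_assignments = 0
    n_as_extreme = 0
    for idx in combinations(range(len(pooled)), k):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / k - (total - t_sum) / m
        n_assignments += 1
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-9:
            n_as_extreme += 1
    return n_as_extreme / n_assignments


# Hypothetical outcomes for 8 units, 4 randomly assigned to treatment.
treated = [12.1, 11.4, 13.0, 12.7]
control = [10.2, 11.0, 10.8, 11.3]
print(randomization_p_value(treated, control))
```

Nothing in the calculation requires a distributional model for the outcomes. Its validity rests on the assignment really having been random, which is exactly the piece of background design information the test depends on.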

Before you jump on the popular (but misguided) bandwagons of “abandoning statistical significance” or derogating P-values as so-called “purely (blank slate) statistical measures”, ask for evidence supporting the criticisms.[2] You will find they are based on rather blatant misuses and abuses. Only by blocking the credulity with which such apparitions are met these days (in some circles) can we attain improved statistical inferences in Christmases yet to come.

[1] “Error statistical methods” is an umbrella term for methods that employ probability in inference to assess and control the capabilities of methods to avoid mistakes in interpreting data. It includes statistical significance tests, confidence intervals, confidence distributions, randomization, resampling and bootstrapping. A proper subset of error statistical methods consists of those that use error probabilities to assess and control the severity with which claims may be said to have passed a test (with given data). A claim C passes a test with severity to the extent that it has been subjected to and survives a test that probably would have found specified flaws in C, if present. Please see excerpts from SIST 2018.
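As a rough numerical illustration of severity, take the simplest case: a one-sided Normal test of H0: mu <= mu0 with known sigma, where the observed mean xbar is statistically significant. In that setting, the severity for the claim “mu > mu1” can be taken as the probability of a sample mean no larger than the one observed, were mu equal to mu1. The function names and numbers below are hypothetical, and the sketch covers only this textbook setup.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal CDF, computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def severity_mu_greater_than(mu1: float, xbar: float, sigma: float, n: int) -> float:
    """Severity for the claim 'mu > mu1' given observed mean xbar.

    Computed as P(sample mean <= xbar; mu = mu1) under a Normal model
    with known sigma (an assumption of this sketch).
    """
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)


# Hypothetical data: sigma = 1, n = 100, observed mean 0.2 (two standard errors above mu0 = 0).
sigma, n, xbar = 1.0, 100, 0.2
for mu1 in (0.0, 0.1, 0.2, 0.3):
    sev = severity_mu_greater_than(mu1, xbar, sigma, n)
    print(f"claim: mu > {mu1:.1f}   severity = {sev:.3f}")
```

The same significant result warrants “mu > 0” with high severity but gives poor grounds for “mu > 0.3”. This is how the account distinguishes among statistically significant results and indicates the size of discrepancy that is, and is not, warranted.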

[2] See

November 4, 2019: On some Self-defeating aspects of the ASA’s 2019 recommendations of statistical significance tests
November 14, 2019: The ASA’s P-value Project: Why it’s Doing More Harm than Good (cont from 11/4/19)
November 30, 2019: P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)

The paper referred to in the post from Christmas past (1) is:

Mayo, D. G. and Spanos, A. (2011) “Error Statistics” in Philosophy of Statistics, Handbook of Philosophy of Science, Volume 7.

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.