
Power and Severity with nonsignificant results: more power puzzles?

The concept of a test’s power, originating in Neyman and Pearson’s early work, is by and large a pre-data concept, used for specifying a test (notably, determining a worthwhile sample size) and for choosing between tests. In some papers, however, Neyman lists a third goal for power: to interpret test results post-data, much in the spirit of what is often called “power analysis”. This is to determine the discrepancy from a null hypothesis that may be ruled out, given nonsignificant results. One example is in the paper “The Problem of Inductive Inference” (Neyman 1955). The reason I’m bringing this up is that it has direct bearing on some of today’s most puzzling (and problematic) post-data uses of power. Interestingly, in that 1955 paper, Neyman is talking to none other than the logical positivist philosopher of confirmation, Rudolf Carnap:

I am concerned with the term “degree of confirmation” introduced by Carnap.  …We have seen that the application of the locally best one-sided test to the data … failed to reject the hypothesis [that the n observations come from a source in which the null hypothesis is true].  The question is: does this result “confirm” the hypothesis that H0 is true of the particular data set? (Neyman 1955, pp. 40-41).

Neyman continues:

The answer … depends very much on the exact meaning given to the words “confirmation,” “confidence,” etc.  If one uses these words to describe one’s intuitive feeling of confidence in the hypothesis tested H0, then…. the attitude described is dangerous.… [T]he chance of detecting the presence [of discrepancy from the null], when only [n] observations are available, is extremely slim, even if [the discrepancy is present].  Therefore, the failure of the test to reject H0 cannot be reasonably considered as anything like a confirmation of H0.  The situation would have been radically different if the power function [corresponding to a discrepancy of interest] were, for example, greater than 0.95. (ibid.)

The general conclusion is that it is a little rash to base one’s intuitive confidence in a given hypothesis on the fact that a test failed to reject this hypothesis. A more cautious attitude would be to form one’s intuitive opinion only after studying the power function of the test applied.

Neyman alludes to a one-sided test of the mean of a Normal distribution with n iid samples and known standard deviation; call it test T+.

H0: µ ≤ µ0 against H1: µ > µ0.

The test statistic d(X) is the standardized sample mean.

The test rule: Infer a (positive) discrepancy from µ0 iff d(x0) > cα where cα corresponds to a difference statistically significant at the α level.

In Carnap’s example the test could not reject the null hypothesis, i.e., d(x0) ≤ cα, but (to paraphrase Neyman) the problem is that the chance of detecting the presence of discrepancy δ from the null, with so few observations, is extremely slim, even if [δ is present]. Says Neyman:

“One may be confident in the absence of that discrepancy only if the power to detect it were high.”

The power of the test T+ to detect discrepancy δ:

(1)  P(d(X) > cα; µ =  µ0 + δ)

This is rather different from the more behavioristic construal Neyman usually championed. In fact, Neyman sounds like a Cohen-style power analyst!

Still, in standard power analysis, power is calculated relative to an outcome just missing the cutoff cα. This is, in effect, the worst case of a negative (nonsignificant) result. If the actual outcome corresponds to a larger p-value (an even more negative result), it seems to me that this should be taken into account in interpreting the results. Do you agree? It is more informative, therefore, to look at the probability of getting a worse fit (with the null hypothesis) than you did:

(2)  P(d(X) > d(x0); µ = µ0 + δ)

In this example, this gives a measure of the severity (or degree of corroboration) for the inference µ < µ0 + δ.

Although (1) may be low, (2) may be high (For numbers, see Mayo and Spanos 2006).
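To make the contrast concrete, here is a minimal numerical sketch (in Python, using scipy, with hypothetical numbers of my own rather than those in Mayo and Spanos 2006): the ordinary power (1), computed at the cutoff, can be low while the attained power (2), computed at the actual nonsignificant d(x0), is high.

```python
# Sketch only: hypothetical numbers illustrating (1) vs (2) for test T+.
from scipy.stats import norm

alpha = 0.025
c_alpha = norm.ppf(1 - alpha)    # cutoff for d(X), ~1.96
d_obs = -0.5                     # a hypothetical, clearly nonsignificant d(x0)
delta = 1.2                      # discrepancy of interest, in (sigma/sqrt(n)) units

# (1) POW(delta) = P(d(X) > c_alpha; mu = mu0 + delta)
pow_delta = norm.sf(c_alpha - delta)      # ~0.22: low

# (2) Attained power / severity for "mu < mu0 + delta" = P(d(X) > d_obs; mu = mu0 + delta)
sev = norm.sf(d_obs - delta)              # ~0.96: high

print(f"(1) power at delta:            {pow_delta:.2f}")
print(f"(2) attained power (severity): {sev:.2f}")
```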

Spanos and I (Mayo and Spanos 2006) couldn’t find a term in the literature defined precisely this way, the way I’d defined it in Mayo (1996) and before. Note that this differs from what some have called “observed power” and I call “shpower” (see this post). Spanos and I called it the severity interpretation for acceptance (SIA); in SIST, it’s cashed out as SIN: the severity interpretation of negative results. With SIA and SIN, we consider the value of the observed statistic, rather than the cut-off for rejection or significance. I also call it attained power. This is a core concept that I claim testers should be using to interpret warranted discrepancies post-data.

The claim in (2) could also be made out by viewing the p-value as a random variable and calculating its distribution under various alternatives (Cox 2006, p. 25). This reasoning yields a core frequentist principle of evidence (FEV) in Mayo and Cox (2010, p. 256):

FEV: A moderate (i.e., non-small) p-value is evidence of the absence of a discrepancy δ from H0 only if there is a high probability the test would have given a worse fit with H0 (i.e., a smaller p-value) were a discrepancy δ to exist.[1]

It is only in the case of a negative result that severity for various inferences is in the same direction as power. In the case of significant results, with d(x) in excess of the cutoff, the opposite concern arises, namely, that the test may be too sensitive to warrant a claimed discrepancy. So severity is always relative to the particular inference being entertained: speaking of the “severity of a test” simpliciter is an incomplete statement in this account. These assessments enable sidestepping classic fallacies of tests that are either too sensitive or not sensitive enough.[2]
________________________________________

[1] The full version of our frequentist principle of evidence (FEV) corresponds to the interpretation of a small p-value:

x is evidence of a discrepancy δ from H0 iff, if H0 is a correct description of the mechanism generating x, then, with high probability, a less discordant result would have occurred.

[2] Severity (SEV) may be seen as a meta-statistical principle that follows the same logic as FEV reasoning within the formal statistical analysis.

By making a SEV assessment relevant to the inference under consideration, we obtain a measure where high (low) values always correspond to good (poor) evidential warrant.

Severity did not have to be defined this way, but I decided it was best to have a concept or measure where high values are always good, by contrast with type 1 and type 2 error probabilities. However, it means SEV has to be computed relative to what is being inferred. This requires appropriately swapping in the particular claim H whose severity one wants to assess.

NOTE: This discussion was part of what I dubbed the Neyman’s Nursery posts (NN1-NN5). This was the second, NN2. Why I used that term is a long story, which you can learn about by searching this blog.

REFERENCES:

Cohen, J. (1992), “A Power Primer,” Psychological Bulletin, 112(1): 155-159.

Cox, D. R. (2006), Principles of Statistical Inference, Cambridge: Cambridge University Press.

Mayo, D. (1996), Error and the Growth of Experimental Knowledge, Chicago: University of Chicago Press.

Mayo, D. and Spanos, A. (2006), “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science, 57: 323-357.

Mayo, D. and Cox, D. (2010), “Frequentist Statistics as a Theory of Inductive Inference,” in D. Mayo and A. Spanos (eds.), Error and Inference, Cambridge: Cambridge University Press, pp. 247-275.

Neyman, J. (1955), “The Problem of Inductive Inference,” Communications on Pure and Applied Mathematics, VIII: 13-46.

Neyman, J. (1957), “The Use of the Concept of Power in Agricultural Experimentation,” Journal of the Indian Society of Agricultural Statistics, IX: 9-17.


Continuing the blizzard of 26 power puzzles

 

The mayor of NYC offered $30 an hour to help shovel the ~30 inches of snow that fell last Sunday and Monday. From what I hear, it was a very effective program. Here’s a little power puzzle to very easily shovel through. [1]

Suppose you are reading about a result x  that is just statistically significant at level α (i.e., P-value = α) in a one-sided test T+ of the mean of a Normal distribution with n iid samples, and (for simplicity) known σ:   H0: µ ≤  0 against H1: µ >  0. I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’.  (i.e., there’s poor evidence that  µ > µ’ ). I am keeping symbols as simple as possible. *See point on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that  µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the (error statistical) philosophy, within which power and associated tests are defined?

(Note the qualification that would arise if you were only told the result was statistically significant at some level less than or equal to α, rather than, as I intend, that it is just significant at level α; this is discussed in a comment due to Michael Lew here [0].)

Allow that the test assumptions are adequately met (though usually this is what’s behind the problem). I have often said on this blog, and I repeat: the most misunderstood and abused (or unused) concept from frequentist statistics is that of a test’s power to reject the null hypothesis under the assumption that alternative µ’ is true: POW(µ’). I deliberately write it in this correct manner because it is faulty to speak of the power of a test without specifying the alternative against which it is computed. It will also get you into trouble if you define power as in the first premise of this post: the probability of correctly rejecting the null, which is both ambiguous and fails to specify the all-important conjectured alternative. That you compute power for several alternatives is not the slightest bit problematic; it’s what you want to do in order to assess the test’s capability to detect discrepancies. There’s a power function. If you knew the true parameter value, why would you be running an inquiry to make statistical inferences about it?

It must be kept in mind that inferences are going to be of the form µ > µ’ = µ0 + δ, or µ < µ’ = µ0 + δ, or the like. They are not to point values! (Not even to the point µ = M0, the observed mean.) Most simply, you may consider that the inference is in terms of the one-sided lower confidence bound (for various confidence levels), the dual of test T+.

POWER: POW(T+, µ’) = POW(Test T+ rejects H0; µ’) = Pr(M > M*; µ’), where M is the sample mean and M* is the cut-off for rejection at level α. (Since M is continuous, it doesn’t matter whether we write > or ≥.) I’m simplifying notation. Also, I’ll leave off the T+ and write POW(µ’).

In terms of P-values: POW(µ’) = Pr(P < p*; µ’) where P < p* corresponds to rejecting the null hypothesis at the given level.

Let σ = 10, n = 100, so σ/√n = 1.  Test T+ rejects H0 at the .025 level if M > 1.96(1). For simplicity, let the cut-off, M*, be 2.

Test T+ rejects H0 at the ~.025 level if M > 2.
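Here is a small sketch of this setup (Python with scipy; the rounding of the cutoff to 2 follows the text and is the only assumption beyond σ = 10, n = 100):

```python
# Power function for test T+ with sigma = 10, n = 100 (so SE = 1), cutoff M* = 2.
from scipy.stats import norm

sigma, n = 10, 100
se = sigma / n ** 0.5            # = 1
m_star = 2                       # simplified cutoff; the exact .025 cutoff is 1.96 * se

def POW(mu_prime):
    """POW(mu') = Pr(M > M*; mu'), with M ~ N(mu', se)."""
    return norm.sf(m_star, loc=mu_prime, scale=se)

print(round(POW(0), 3))          # ~0.023, i.e., roughly the .025 level, given the rounded cutoff
```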

CASE 1:  We need a µ’ such that POW(µ’) is low. The power against alternatives between the null and the cut-off M* will range from α to .5. Consider the power against the null:

1. POW(µ = 0) = α = .025.

Since the probability of M > 2, under the assumption that µ = 0, is low, the just significant result indicates µ > 0.  That is, since the power against µ = 0 is low, the statistically significant result is a good indication that µ > 0.

Equivalently, 0 is the lower bound of a .975 confidence interval.

2. For a second example of low power that does not use the null: we get power of .04 if µ’ = M* – 1.75(σ/√n), which in this case is 2 – 1.75 = .25. That is, POW(.25) = .04. [ii]

Equivalently, µ > .25 is the lower confidence bound at level .96 (this is the CI that is dual to test T+).
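A quick check of these CASE 1 numbers, and of the confidence-interval duality, under the same assumptions (a sketch; the just-significant observed mean is taken to equal the cutoff, M = 2):

```python
# Verifying the CASE 1 numbers and the dual lower confidence bounds.
from scipy.stats import norm

se, m_star, m_obs = 1, 2, 2      # SE, cutoff, and a just-significant observed mean

print(round(norm.sf(m_star, loc=0,    scale=se), 3))   # POW(0)   ~0.023 (roughly alpha)
print(round(norm.sf(m_star, loc=0.25, scale=se), 3))   # POW(.25) ~0.040

# Lower confidence bounds dual to T+ for the observed M = 2:
print(round(m_obs - norm.ppf(0.975) * se, 2))          # ~0.04, i.e., roughly 0 (level .975)
print(round(m_obs - norm.ppf(0.96)  * se, 2))          # ~0.25 (level .96)
```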

CASE 2:  We need a µ’ such that POW(µ’) is high. POW(M* + 1(σ/√n)) = .84.

3. That is, adding one (σ/ √n) unit to the cut-off M* takes us to an alternative against which the test has power = .84. So POW(T+, µ = 3) = .84. 

Should we say that the significant result is a good indication that µ > 3?  No, there’s a high probability (.84) you’d have gotten a larger difference than you did, were µ > 3.  

Pr(M > 2;  µ = 3 ) = Pr(Z > -1) = .84. It would be terrible evidence for µ > 3!
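A check of the CASE 2 numbers, plus the corresponding severity for the claim µ > 3 given the just-significant M = 2 (a sketch under the same assumptions):

```python
# POW(3) and the severity for "mu > 3" given a just-significant M = 2.
from scipy.stats import norm

se, m_star, m_obs = 1, 2, 2

pow_3 = norm.sf(m_star, loc=3, scale=se)     # Pr(M > 2; mu = 3) = Pr(Z > -1) ~0.84
sev_3 = norm.cdf(m_obs, loc=3, scale=se)     # Pr(M <= 2; mu = 3) ~0.16: poor warrant for mu > 3

print(round(pow_3, 3), round(sev_3, 3))      # 0.841 0.159
```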

[Figure: Blue curve is the null (µ = 0); red curve is one possible conjectured alternative, µ = 3. The green area is the power; the small turquoise area is α.]

Note that the evidence our result affords µ > µ’ gets worse and worse as we drag µ’ further and further to the right, even though in so doing we’re increasing the power.

As Stephen Senn points out (in my favorite of his guest posts), the alternative against which we set high power is the discrepancy from the null that “we should not like to miss”, Δ. But Δ is not the discrepancy we may infer from a significant result (in a test where POW(Δ) = .84), nor one that we believe obtains.
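The point can be seen at a glance by sweeping µ’ to the right (a sketch under the same assumptions; note that with a just-significant result, where the observed mean equals the cutoff, the severity for µ > µ’ is exactly 1 minus the power at µ’):

```python
# Power rises, but the severity for inferring mu > mu' (given just-significant M = 2) falls.
from scipy.stats import norm

se, m_obs, m_star = 1, 2, 2
for mu_prime in [0, 0.5, 1, 2, 3, 4]:
    pow_ = norm.sf(m_star, loc=mu_prime, scale=se)   # POW(mu')
    sev_ = norm.cdf(m_obs, loc=mu_prime, scale=se)   # SEV(mu > mu') = Pr(M <= 2; mu')
    print(f"mu' = {mu_prime:>3}:  POW = {pow_:.3f}   SEV(mu > mu') = {sev_:.3f}")
```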

So the correct answer is B.

Does A hold true if we happen to know (based on previous severe tests) that µ < µ’?

No, but it does allow some legitimate ways to mount complaints based on a significant result from a test with low power to detect a known discrepancy.

It does mean that if M* (the cut-off for a result just statistically significant at level α) is used as an estimate of µ, with no standard error given, then, since we know independently that µ < M*, the estimate M* is larger than µ. But that is a tautology, and it hides crucial information about the unreliability of the estimate. While it might be said that the observed result, which we’re assuming is M*, “exaggerates” µ if used as an estimate without reporting the SE, this more correctly describes a misuse of tests: a significance tester keen to estimate effect size upon finding significance would be directed to the lower limit of a confidence interval. Why is this highly unkosher for a significance tester? Despite knowing µ < M* (one of the givens of the example), µ is estimated as M*, as if there were no uncertainty.

If the study is in a field known to have lots of researcher flexibility, it might legitimately raise the question of whether the researchers cheated, reporting only the one impressive result after trying and trying again, or tampered with the discretionary points of the study to achieve nominal significance. This is a different issue, and doesn’t change my answer. More generally, it’s because the answer is B that the only way to raise the criticism legitimately is to challenge the assumptions of the test.

Why does the criticism arise illegitimately? There are a few reasons. In some circles it’s a direct result of trying to do a Bayesian computation, setting about to compute Pr(µ = µ’ | M0 = M*) using POW(µ’)/α as a kind of likelihood ratio in favor of µ’. Notice that supposing the probability of a type I error goes down as power increases is at odds with the trade-off that we know holds between these error rates. So this immediately indicates a different use of terms.
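For concreteness, here is a sketch of the computation being criticized (not endorsed), with hypothetical 1:1 prior odds thrown in purely for illustration. It treats power, the probability of the whole rejection region, as if it were the likelihood of the just-significant outcome actually observed, and its verdicts run in the opposite direction to the severity assessments above: the higher the power at µ’, the more “support” it awards to µ’.

```python
# The criticized move: POW(mu')/alpha used as a "likelihood ratio" for mu' vs the null.
alpha = 0.025
for pow_mu_prime in [0.04, 0.25, 0.84]:           # hypothetical powers at various mu'
    lr = pow_mu_prime / alpha                     # the purported likelihood ratio
    posterior_odds = lr * 1.0                     # hypothetical 1:1 prior odds
    post = posterior_odds / (1 + posterior_odds)  # the purported posterior for mu'
    print(f"POW(mu') = {pow_mu_prime:.2f}  ->  'LR' = {lr:5.1f},  'posterior' = {post:.2f}")
```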

[1] This is a modified reblog of an earlier post. 

*Point on language: “to detect alternative µ'” means, “produce a statistically significant result when µ = µ’.” It does not mean we infer µ’. Nor do we know the underlying µ’ after we see the data, obviously. The power of the test to detect µ’ just refers to the probability the test would produce a result that rings the significance alarm, if the data were generated from a world or experiment where µ = µ’.
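A quick Monte Carlo rendering of this point on language, under the same hypothetical test T+ (σ = 10, n = 100, cutoff M* = 2): the power to “detect µ’ = 3” is just the long-run frequency with which the significance alarm rings when the data are generated with µ = 3.

```python
# Simulated frequency of "ringing the significance alarm" when mu = mu' = 3.
import numpy as np

rng = np.random.default_rng(0)
sigma, n, m_star, mu_prime = 10, 100, 2.0, 3.0
trials = 20_000

means = rng.normal(loc=mu_prime, scale=sigma, size=(trials, n)).mean(axis=1)
print((means > m_star).mean())   # ~0.84, agreeing with the analytic POW(3) above
```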


A Blizzard of Power Puzzles Replicates in Meta-Research


I often say that the most misunderstood concept in error statistics is power. One week ago, stuck in the blizzard of 2026 in NYC (exciting, if also a bit unnerving, with airports closed for two and a half days and no certainty of when I might fly out), I began collecting the many power howlers I’ve discussed in the past, because some of them are being replicated in today’s meta-research about replication failure! Apparently, mistakes about statistical concepts replicate quite reliably, even when statistically significant effects do not. Others I find in medical reports of clinical trials of treatments I’m trying to evaluate in real life! Here’s one variant: a statistically significant result in a clinical trial with fairly high (e.g., .8) power to detect an impressive improvement δ’ is taken as good evidence that the improvement is as large as δ’. Often the high power of .8 is even used as a (posterior) probability of the hypothesis that the improvement is δ’. [0] If these do not immediately strike you as fallacious, compare:

  • If the house is fully ablaze, then very probably the fire alarm goes off.
  • If the fire alarm goes off, then very probably the house is fully ablaze.

The first bullet says the fire alarm has high power to detect the house being fully ablaze. The converse, the second bullet, does not follow.
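A toy Bayes calculation, with made-up numbers, shows how far apart the two bullets can be: even a highly “powerful” alarm mostly goes off for other reasons if full-blown blazes are rare.

```python
# Hypothetical numbers only: transposing the conditional in the fire-alarm analogy.
p_blaze = 0.001                    # assumed base rate of a house fully ablaze
p_alarm_given_blaze = 0.95         # the alarm's "power"
p_alarm_given_no_blaze = 0.05      # false alarms (burnt toast, steam, etc.)

p_alarm = p_alarm_given_blaze * p_blaze + p_alarm_given_no_blaze * (1 - p_blaze)
p_blaze_given_alarm = p_alarm_given_blaze * p_blaze / p_alarm
print(round(p_blaze_given_alarm, 3))   # ~0.019: far from "very probably fully ablaze"
```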

Today’s meta-statistical researchers are keen to point up the consequences of using statistical significance tests, figuring out why they lead to the various replication crises in science, and how they may be more honestly viewed. Yet they too use statistical analyses, and these can reflect philosophical and conceptual standpoints that replicate the same shortcomings that arise in classic criticisms of significance tests. A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions in order to get beyond what I call the “chestnuts” and “howlers” of tests, but these misunderstandings tend to crop up today at the meta-level. When power enters this meta-research, often as a kind of probability of replication, the same confusions unsurprisingly pop up. But I will not tackle any meta-research in this post. Instead, let’s go back to power howlers that arise in criticisms of tests. I had a blogpost long ago (from Oct. 2011) on Ziliak and McCloskey (2008) (Z & M) on power, following a review of their book by Aris Spanos (2008). They write:

“The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in fact false, that is, when (say) such and such a positive effect is true.”

So far so good, keeping in mind that “positive effect” refers to a parameter discrepancy, say δ, not an observed difference.

And the power of a test to detect that such and such a positive effect δ is true is equal to the probability of rejecting the null hypothesis of (say) zero effect when the null is in fact false, and a positive effect as large as δ is present.

Fine. Let this alternative be abbreviated H’(δ):

H’(δ): there is a positive (population) effect at least as large as δ.

Suppose the test rejects the null when it reaches a significance level of .01 (nothing turns on the small value chosen).

(1) The power of the test to detect H’(δ) = Pr(test rejects null at the .01 level| H’(δ) is true).

Say it is 0.85.

According to Z & M:

“[If] the power of a test is high, say, 0.85 or higher, then the scientist can be reasonably confident that at minimum the null hypothesis (of, again, zero effect if that is the null chosen) is false and that therefore his rejection of it is highly probably correct.” (Z & M, 132-3)

But this is not so. They are mistaking (1), the definition of power, for a claim that a posterior probability of .85 is given, either to some effect or specifically to H’(δ)! That is, (1) is being transformed into (1′):

(1’) Pr(H’(δ) is true | test rejects null at the .01 level) = .85!

(I am using the symbol for conditional probability “|” all the way through for ease in following the argument, even though, strictly speaking, the error statistician would use “;”, abbreviating “under the assumption that”). Or to put this in other words, they argue:

1. Pr(test rejects the null | H’(δ) is true) = 0.85.

2. Test rejects the null hypothesis.

Therefore, the rejection is probably correct, e.g., the probability H’ is true is 0.85.

Oops. Premises 1 and 2 are true, but the conclusion fallaciously replaces premise 1 with 1′.
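One way to see that (1′) simply does not follow from (1): even if one grants, for the sake of argument, a hypothetical prevalence of cases in which H’(δ) holds (a deliberately crude two-state sketch), the probability of H’(δ) given a rejection depends on that prevalence and on α, and need not be anywhere near .85.

```python
# Sketch: Pr(reject | H' true) = .85 does not fix Pr(H' true | reject).
alpha, power = 0.01, 0.85
for prevalence in [0.01, 0.1, 0.5]:      # hypothetical share of cases where H'(delta) holds
    p_reject = power * prevalence + alpha * (1 - prevalence)
    p_Hprime_given_reject = power * prevalence / p_reject
    print(f"prevalence {prevalence:.2f}:  Pr(H' | reject) = {p_Hprime_given_reject:.2f}")
    # prints ~0.46, ~0.90, ~0.99: the posterior tracks the hypothetical prior, not the power
```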

 

High power as high hurdle. As Aris Spanos (2008) points out, “They have it backwards”.  Their reasoning comes from thinking that the higher the power of the test from which statistical significance emerges, the higher the hurdle it has gotten over. Extracting from a Spanos comment on this blog in 2011:

“When [Ziliak and McCloskey] claim that: ‘What is relevant here for the statistical case is that refutations of the null are trivially easy to achieve if power is low enough or the sample size is large enough.’ (Z & M, p. 152), they exhibit [confusion] about the notion of power and its relationship to the sample size; their two instances of ‘easy rejection’ separated by ‘or’ contradict each other! Rejections of the null are not easy to achieve when the power is ‘low enough’. They are more difficult exactly because the test does not have adequate power (generic capacity) to detect discrepancies from the null; that stems from the very definition of power and optimal tests. [Their second claim] is correct for the wrong reason. Rejections are easy to achieve when the sample size n is large enough due to high not low power. This is because the power of a ‘decent’ (consistent) frequentist test increases monotonically with n!” (Spanos 2011) [i]

Ziliak and McCloskey (2008) tell us: “It is the history of Fisher significance testing. One erects little significance hurdles, six inches tall, and makes a great show of leaping over them, . . . If a test does a good job of uncovering efficacy, then the test has high power and the hurdles are high not low.” (ibid., p. 133). This explains why they suppose high power translates into high hurdles, but it is the opposite. The higher the hurdle required before rejecting the null, the more difficult it is to reject, and the lower the power. High hurdles correspond to insensitive tests, like insensitive fire alarms. It might be that using “sensitivity” rather than “power” would make this abundantly clear. We may coin: the high power = high hurdle (for rejecting the null) fallacy. A powerful test does give the null hypothesis a harder time in the sense that it’s more probable that discrepancies from it are detected. But this makes it easier to infer H1. To infer H1 with severity, H1 needs to be given a hard time.
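To see why high power goes with a low hurdle rather than a high one, consider how the raw cutoff and the power behave as the sample size grows (a sketch with σ = 10 and a hypothetical fixed discrepancy of interest δ = 1):

```python
# As n grows, the raw hurdle M* shrinks and the power at a fixed discrepancy grows:
# rejection becomes easier, not harder.
from scipy.stats import norm

sigma, delta = 10, 1.0
for n in [25, 100, 400, 1600]:
    se = sigma / n ** 0.5
    m_star = norm.ppf(0.975) * se            # raw cutoff ("hurdle") for ~.025 significance
    pow_delta = norm.sf(m_star, loc=delta, scale=se)
    print(f"n = {n:>4}:  hurdle M* = {m_star:.2f},  POW(delta = 1) = {pow_delta:.2f}")
```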

Ponder the consequences their construal would have for the required trade-off between type 1 and type 2 error probabilities. (Use the comments to explain what happens.) For a fuller discussion, see Excursion 5 Tour I of SIST (2018). [ii] [iii]

What power howlers have you found? Share them in the comments and I’ll add them to my blizzard. 

Spanos, A. (2008), Review of S. Ziliak and D. McCloskey’s The Cult of Statistical Significance, Erasmus Journal for Philosophy and Economics, 1(1): 154-164.

Ziliak, S. and McCloskey, D. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives, Ann Arbor: University of Michigan Press.

[0] Some meta-researchers, having brilliantly generated a superpopulation of treatments (from which this one is taken as a random sample), and finding these probabilities don’t hold, take this to show p-values exaggerate effects. I’ll come back to that case, which is a bit different from the one in today’s post.

[i] When it comes to raising the power by increasing sample size, Z & M often make true claims, so it’s odd when there’s a switch, as when they say “refutations of the null are trivially easy to achieve if power is low enough or the sample size is large enough”. (Z & M, p. 152) It is clear that “low” is not a typo here either (as I at first assumed), so it’s mysterious. 

[ii] Remember that a power computation is not the probability of data x under some alternative hypothesis; it’s the probability that data fall in the rejection region of a test under some alternative hypothesis. In terms of a test statistic d(X), it is Pr(test statistic d(X) is statistically significant | H’ true), at a given level of significance. So it’s the probability of getting any of the outcomes that would lead to statistical significance at the chosen level, under the assumption that alternative H’ is true. The alternative H’ used to compute power is a point in the alternative region. However, the inference that is made in tests is not to a point hypothesis but to an inequality, e.g., θ > θ’.

[iii] My rendering of their fallacy above sees it as a type of affirming the consequent. They are right that if inference is by way of a Bayes boost, then affirming the consequent is not a fallacy. A hypothesis H that entails (or renders probable) data x will get a “B-boost” from x, unless its probability is already 1. The trouble erupts when Z & M take an error statistical concept like power, and construe it Bayesianly. Even more confusing, they only do so some of the time.

The next installment of the blizzard of 26 power puzzles on this blog is here.

