ASA Task Force on Significance and Replicability

Comments on “The ASA p-value statement 10 years on” (ii)

Posted on March 26, 2026 by Mayo

Given how much I’ve blogged about the 2016 ASA p-value statement, the 2019 Executive Editor’s editorial in The American Statistician (TAS), the 2020 ASA (President’s) Task Force, and the various casualties of the related teeth pulling, I thought I should say something about the recent article by Robert Matthews in Significance (March 2026): “The ASA p-value statement 10 years on: An event of statistical significance?” He begins: “Ten years ago this month, the American Statistical Association (ASA) took the unprecedented step of issuing a statement on one of the most controversial issues in statistics: the use and abuse of p-values.” The Statement is here, 2016 ASA Statement on P-Values and Statistical Significance [1]. The Executive Director of the ASA, Ronald Wasserstein, invited me to be a “philosophical observer” at the meeting which gave rise to the 2016 statement. Although the 2016 ASA statement wasn’t radically controversial, at least as compared to the 2019 Executive Editor’s editorial, which I’ll get to in a minute, it was met with critical reactions on all sides. Stephen Senn provides a figure displaying relationships between reactions. Here’s how Matthews’ article begins: Continue reading →

Categories: abandon statistical significance, ASA Task Force on Significance and Replicability, P-values, significance tests, stat wars and their casualties | 26 Comments

My 2019 friendly amendments to that “abandon significance” editorial

Posted on April 5, 2024 by Mayo

It was 3 months before I decided to write a blogpost in response to Wasserstein, Schirm and Lazar (2019)’s editorial in The American Statistician in which they recommend that the concept of “statistical significance” be abandoned, hereafter, WSL 2019. (I titled it “Don’t Say What You don’t Mean”.) In that June 17, 2019 blogpost, pasted below, I proposed 3 “friendly amendments” to the language of that document. (There are 97 comments on that post!) The problem is that WSL 2019 presents several of the 6 principles from ASA I (the 2016 ASA statement on Statistical Significance) in a far stronger fashion so as to be inconsistent or at least in tension with some of them. I didn’t think they really meant what they said. I discussed these amendments with Ron Wasserstein, Executive Director of the ASA at the time. Had these friendly amendments been carried out, the document would not have caused as much of a problem, and people might focus more on the positive recommendations it includes about scientific integrity. The proposed ban on a key concept of statistics would still be problematic, resulting in the 2019 ASA President’s Task Force, but it would have helped the document. At the time, it was still not known whether WSL 2019 was intended as a continuation of the 2016 ASA policy document [ASA I]. That explains why I first referred to WSL 2019 in this blogpost as ASA II. Once it was revealed that it was not official policy at all (many months later), but only the recommendations of the 3 authors, I placed a “note” after each mention of ASA II. But given it caused sufficient confusion as to result in the then ASA president (Karen Kafadar) appointing an ASA Task Force on Statistical Significance and Replicability in 2019 (see here and here), and later, a disclaimer by the authors, in this reblog I refer to it as WSL 2019. You can search this blog for other posts on the 2019 Task Force: their report is here, and the disclaimer here. Continue reading →

Categories: 2016 ASA Statement on P-values, ASA Guide to P-values, ASA Task Force on Significance and Replicability | Leave a comment

Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)

Posted on June 15, 2022 by Mayo

Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and 2 co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05’”). It reads:

The editorial was written by the three editors acting as individuals and reflects their scientific views not an endorsed position of the American Statistical Association.

Continue reading →

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 20 Comments

January 11 Forum: “Statistical Significance Test Anxiety” : Benjamini, Mayo, Hand

Posted on February 24, 2022 by Mayo

Here are all the slides along with the video from the 11 January Phil Stat Forum with speakers: Deborah G. Mayo, Yoav Benjamini and moderator/discussant David Hand.

D. Mayo, Y. Benjamini, D. Hand

Continue reading →

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, P-values, statistical significance | 2 Comments

Nathan Schachtman: Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice (Guest Post)

Posted on January 18, 2022 by Mayo

Nathan Schachtman, Esq., J.D.
Legal Counsel for Scientific Challenges

Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice

The metaphor of law as an “empty vessel” is frequently invoked to describe the law generally, as well as pejoratively to describe lawyers. The metaphor rings true at least in describing how the factual content of legal judgments comes from outside the law. In many varieties of litigation, not only the facts and data, but the scientific and statistical inferences must be added to the “empty vessel” to obtain a correct and meaningful outcome. Continue reading →

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, PhilStat Law, Schachtman | 3 Comments

John Park: Poisoned Priors: Will You Drink from This Well? (Guest Post)

Posted on January 17, 2022 by Mayo

John Park, MD
Radiation Oncologist
Kansas City VA Medical Center

Poisoned Priors: Will You Drink from This Well?

As an oncologist specializing in radiation oncology, I find that “The Statistics Wars and Intellectual Conflicts of Interest”, as Prof. Mayo’s recent editorial is titled, addresses a topic of practical importance to me and my patients (Mayo, 2021). Some are flirting with Bayesian statistics to move on from statistical significance testing and the use of P-values. In fact, what many consider the world’s preeminent cancer center, MD Anderson, has a strong Bayesian group that completed 2 early phase Bayesian studies in radiation oncology that have been published in the most prestigious cancer journal, The Journal of Clinical Oncology (Liao et al., 2018 and Lin et al., 2020). This brings about the hotly contested issue of subjective priors, and much has been written about the ability to overcome this problem. Specifically in medicine, one thinks about Spiegelhalter’s classic 1994 paper, which describes reference, clinical, skeptical, and enthusiastic priors and also uses an example from radiation oncology to make its case (Spiegelhalter et al., 1994). This is nice and all in theory, but what if there is ample evidence that the subject matter experts have major conflicts of interest (COIs) and biases so that their priors cannot be trusted? A debate raging in oncology is whether non-invasive radiation therapy is as good as invasive surgery for early stage lung cancer patients. This is not a trivial question, as postoperative morbidity from surgery can range from 19–50% and 90-day mortality anywhere from 0–5% (Chang et al., 2021). Radiation therapy is highly attractive as there are numerous reports hinting at equal efficacy with far less morbidity. Unfortunately, 4 major clinical trials were unable to accrue patients for this important question. Why could they not enroll patients, you ask? Long story short, if a patient is referred to radiation oncology and treated with radiation, the surgeon loses out on the revenue, and vice versa. Dr. David Jones, a surgeon at Memorial Sloan Kettering, notes there was no “equipoise among enrolling investigators and medical specialties… Although the reasons are multiple… I believe the primary reason is financial” (Jones, 2015). I am not skirting responsibility for my field’s biases. Dr. Hanbo Chen, a radiation oncologist, notes in his meta-analysis of multiple publications looking at surgery vs radiation that overall survival was associated with the specialty of the first author who published the article (Chen et al., 2018). Perhaps the pen is mightier than the scalpel! Continue reading →

Categories: ASA Task Force on Significance and Replicability, Bayesian priors, PhilStat/Med, statistical significance tests | Tags: poisoned priors | 4 Comments

Philip Stark (guest post): commentary on “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo Editorial)

Posted on January 14, 2022 by Mayo

Philip B. Stark
Professor
Department of Statistics
University of California, Berkeley

I enjoyed Prof. Mayo’s comment in Conservation Biology (Mayo, 2021) very much, and agree enthusiastically with most of it. Here are my key takeaways and reflections.

Error probabilities (or error rates) are essential to consider. If you don’t give thought to what the data would be like if your theory is false, you are not doing science. Some applications really require a decision to be made. Does the drug go to market or not? Are the girders for the bridge strong enough, or not? Hence, banning “bright lines” is silly. Conversely, no threshold for significance, no matter how small, suffices to prove an empirical claim. In replication lies truth. Abandoning P-values exacerbates moral hazard for journal editors, although there has always been moral hazard in the gatekeeping function. Absent any objective assessment of evidence, publication decisions are even more subject to cronyism, “taste”, confirmation bias, etc. Throwing away P-values because many practitioners don’t know how to use them is perverse. It’s like banning scalpels because most people don’t know how to perform surgery. People who wish to perform surgery should be trained in the proper use of scalpels, and those who wish to use statistics should be trained in the proper use of P-values. Throwing out P-values is self-serving to statistical instruction, too: we’re making our lives easier by teaching less instead of teaching better. Continue reading →

Categories: ASA Task Force on Significance and Replicability, editorial, multiplicity, P-values | 6 Comments

The ASA controversy on P-values as an illustration of the difficulty of statistics

Posted on January 9, 2022 by Mayo

Christian Hennig
Professor
Department of Statistical Sciences
University of Bologna

The ASA controversy on P-values as an illustration of the difficulty of statistics

“I work on Multidimensional Scaling for more than 40 years, and the longer I work on it, the more I realise how much of it I don’t understand. This presentation is about my current state of not understanding.” (John Gower, world leading expert on Multidimensional Scaling, at a conference in 2009)

“The lecturer contradicts herself.” (Student feedback to an ex-colleague for teaching methods and then teaching what problems they have)

1 Limits of understanding

Statistical tests and P-values are widely used and widely misused. In 2016, the ASA issued a statement on significance and P-values with the intention of curbing misuse while acknowledging their proper definition and potential use. In my view the statement did a rather good job of saying things that are worth saying while trying to be acceptable to those who are generally critical of P-values as well as those who tend to defend their use. As was predictable, the statement did not settle the issue. A “2019 editorial” by some of the authors of the original statement (recommending “to abandon statistical significance”) and a 2021 ASA task force statement, much more positive on P-values, followed, showing the level of disagreement in the profession. Continue reading →

Categories: ASA Task Force on Significance and Replicability, Mayo editorial, P-values | 3 Comments

E. Ionides & Ya’acov Ritov (Guest Post) on Mayo’s editorial, “The Statistics Wars and Intellectual Conflicts of Interest”

Posted on January 8, 2022 by Mayo

Edward L. Ionides

Director of Undergraduate Programs and Professor,
Department of Statistics, University of Michigan

Ya’acov Ritov
Professor,
Department of Statistics, University of Michigan

Thanks for the clear presentation of the issues at stake in your recent Conservation Biology editorial (Mayo 2021). There is a need for such articles elaborating and contextualizing the ASA President’s Task Force statement on statistical significance (Benjamini et al, 2021). The Benjamini et al (2021) statement is sensible advice that avoids directly addressing the current debate. For better or worse, it has no references, and just speaks what looks to us like plain sense. However, it avoids addressing why there is a debate in the first place, and what justifications and misconceptions drive the different positions. Consequently, it may be ineffective at communicating to those swing voters who have sympathies with some of the insinuations in the Wasserstein & Lazar (2016) statement. We say “insinuations” here since we consider that their 2016 statement made an attack on p-values which was forceful, indirect and erroneous. Wasserstein & Lazar (2016) started with a constructive discussion about the uses and abuses of p-values before moving against them. This approach was good rhetoric: “I have come to praise p-values, not to bury them” to invert Shakespeare’s Antony. Good rhetoric does not always promote good science, but Wasserstein & Lazar (2016) successfully managed to frame and lead the debate, according to Google Scholar. We warned of the potential consequences of that article and its flaws (Ionides et al, 2017) and we refer the reader to our article for more explanation of these issues (it may be found below). Wasserstein, Schirm and Lazar (2019) made their position clearer, and therefore easier to confront. We are grateful to Benjamini et al (2021) and Mayo (2021) for rising to the debate. Rephrasing Churchill in support of their efforts, “Many forms of statistical methods have been tried, and will be tried in this world of sin and woe. No one pretends that the p-value is perfect or all-wise. Indeed (noting that its abuse has much responsibility for the replication crisis) it has been said that the p-value is the worst form of inference except all those other forms that have been tried from time to time”. Continue reading →

Categories: ASA Task Force on Significance and Replicability, editors, P-values, significance tests | 2 Comments

B. Haig on questionable editorial directives from Psychological Science (Guest Post)

Posted on January 7, 2022 by Mayo

Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand

What do editors of psychology journals think about tests of statistical significance? Questionable editorial directives from Psychological Science

Deborah Mayo’s (2021) recent editorial in Conservation Biology addresses the important issue of how journal editors should deal with strong disagreements about tests of statistical significance (ToSS). Her commentary speaks to applied fields, such as conservation science, but it is relevant to basic research, as well as other sciences, such as psychology. In this short guest commentary, I briefly remark on the role played by the prominent journal, Psychological Science (PS), regarding whether or not researchers should employ ToSS. PS is the flagship journal of the Association for Psychological Science, and two of its editors-in-chief have offered explicit, but questionable, advice on this matter. Continue reading →

Categories: ASA Task Force on Significance and Replicability, Brian Haig, editors, significance tests | Tags: statistics wars | 2 Comments

Invitation to discuss the ASA Task Force on Statistical Significance and Replication

Posted on July 30, 2021 by Mayo

The latest salvo in the statistics wars comes in the form of the publication of The ASA Task Force on Statistical Significance and Replicability, appointed by past ASA president Karen Kafadar in November/December 2019. (In the ‘before times’!) Its members are:

Linda Young (Co-Chair), Xuming He (Co-Chair), Yoav Benjamini, Dick De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xiao-Li Meng, Vijay Nair, Nancy Reid, Stephen Stigler, Stephen Vardeman, Chris Wikle, Tommy Wright, Karen Kafadar (Ex-officio). (Kafadar 2020)

The full report of this Task Force is in The Annals of Applied Statistics, and on my blogpost. It begins:

In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force… (Benjamini et al. 2021)

Continue reading →

Categories: 2016 ASA Statement on P-values, ASA Task Force on Significance and Replicability, JSM 2020, National Institute of Statistical Sciences (NISS), statistical significance tests | 3 Comments

Statisticians Rise Up To Defend (error statistical) Hypothesis Testing

Posted on June 28, 2021 by Mayo

What is the message conveyed when the board of a professional association X appoints a Task Force intended to dispel the supposition that a position advanced by the Executive Director of association X reflects the views of association X on a topic that members of X disagree on? What it says to me is that there is a serious break-down of communication amongst the leadership and membership of that association. So while I’m extremely glad that the ASA appointed the Task Force on Statistical Significance and Replicability in 2019, I’m very sorry that the main reason it was needed was to address concerns that an editorial put forward by the ASA Executive Director (and 2 others) “might be mistakenly interpreted as official ASA policy”. The 2021 Statement of the Task Force (Benjamini et al. 2021) explains:

In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force…

Continue reading →

Categories: ASA Task Force on Significance and Replicability, Schachtman, significance tests | 10 Comments

At long last! The ASA President’s Task Force Statement on Statistical Significance and Replicability

Posted on June 20, 2021 by Mayo

The ASA President’s Task Force Statement on Statistical Significance and Replicability has finally been published. It found a home in The Annals of Applied Statistics, after everyone else they looked to, including the ASA itself, refused to publish it. For background see this post. I’ll comment on it in a later post. There is also an Editorial: Statistical Significance, P-Values, and Replicability by Karen Kafadar. Continue reading →

Categories: ASA Task Force on Significance and Replicability | 11 Comments

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.