ASA Task Force on Significance and Replicability

Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)


Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and 2 co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05’”). It reads:

 

The editorial was written by the three editors acting as individuals and reflects their scientific views not an endorsed position of the American Statistical Association.

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 19 Comments

January 11 Forum: “Statistical Significance Test Anxiety”: Benjamini, Mayo, Hand

Here are all the slides along with the video from the 11 January Phil Stat Forum with speakers: Deborah G. Mayo, Yoav Benjamini and moderator/discussant David Hand.

D. Mayo, Y. Benjamini, D. Hand

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, P-values, statistical significance | 2 Comments

Nathan Schachtman: Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice (Guest Post)


Nathan Schachtman, Esq., J.D.
Legal Counsel for Scientific Challenges

Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice

The metaphor of law as an “empty vessel” is frequently invoked to describe the law generally, as well as pejoratively to describe lawyers. The metaphor rings true at least in describing how the factual content of legal judgments comes from outside the law. In many varieties of litigation, not only the facts and data, but the scientific and statistical inferences must be added to the “empty vessel” to obtain a correct and meaningful outcome. Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, PhilStat Law, Schachtman | 3 Comments

John Park: Poisoned Priors: Will You Drink from This Well? (Guest Post)


John Park, MD
Radiation Oncologist
Kansas City VA Medical Center

Poisoned Priors: Will You Drink from This Well?

As an oncologist specializing in radiation oncology, I find the subject of Prof. Mayo’s recent editorial, “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo, 2021), to be of practical importance to me and my patients. Some are flirting with Bayesian statistics as a way to move on from statistical significance testing and the use of P-values. In fact, what many consider the world’s preeminent cancer center, MD Anderson, has a strong Bayesian group that completed 2 early phase Bayesian studies in radiation oncology, published in the most prestigious cancer journal, The Journal of Clinical Oncology (Liao et al., 2018 and Lin et al., 2020). This brings up the hotly contested issue of subjective priors, and much has been written about the ability to overcome this problem. Specifically in medicine, one thinks of Spiegelhalter’s classic 1994 paper, which discusses reference, clinical, skeptical, and enthusiastic priors and also uses an example from radiation oncology to make its case (Spiegelhalter et al., 1994). This is nice and all in theory, but what if there is ample evidence that the subject matter experts have major conflicts of interest (COIs) and biases, so that their priors cannot be trusted?

A debate raging in oncology is whether non-invasive radiation therapy is as good as invasive surgery for early stage lung cancer patients. This is not a trivial question, as postoperative morbidity from surgery can range from 19–50% and 90-day mortality anywhere from 0–5% (Chang et al., 2021). Radiation therapy is highly attractive, as there are numerous reports hinting at equal efficacy with far less morbidity. Unfortunately, 4 major clinical trials were unable to accrue patients for this important question. Why could they not enroll patients, you ask? Long story short, if a patient is referred to radiation oncology and treated with radiation, the surgeon loses out on the revenue, and vice versa. Dr. David Jones, a surgeon at Memorial Sloan Kettering, notes there was no “equipoise among enrolling investigators and medical specialties… Although the reasons are multiple… I believe the primary reason is financial” (Jones, 2015). I am not skirting responsibility for my field’s biases. Dr. Hanbo Chen, a radiation oncologist, notes in his meta-analysis of multiple publications looking at surgery vs radiation that overall survival was associated with the specialty of the first author who published the article (Chen et al., 2018). Perhaps the pen is mightier than the scalpel! Continue reading

Categories: ASA Task Force on Significance and Replicability, Bayesian priors, PhilStat/Med, statistical significance tests | 4 Comments

Philip Stark (guest post): commentary on “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo Editorial)


Philip B. Stark
Professor
Department of Statistics
University of California, Berkeley

I enjoyed Prof. Mayo’s comment in Conservation Biology (Mayo, 2021) very much, and agree enthusiastically with most of it. Here are my key takeaways and reflections.

Error probabilities (or error rates) are essential to consider. If you don’t give thought to what the data would be like if your theory is false, you are not doing science.

Some applications really require a decision to be made. Does the drug go to market or not? Are the girders for the bridge strong enough, or not? Hence, banning “bright lines” is silly. Conversely, no threshold for significance, no matter how small, suffices to prove an empirical claim. In replication lies truth.

Abandoning P-values exacerbates moral hazard for journal editors, although there has always been moral hazard in the gatekeeping function. Absent any objective assessment of evidence, publication decisions are even more subject to cronyism, “taste”, confirmation bias, etc.

Throwing away P-values because many practitioners don’t know how to use them is perverse. It’s like banning scalpels because most people don’t know how to perform surgery. People who wish to perform surgery should be trained in the proper use of scalpels, and those who wish to use statistics should be trained in the proper use of P-values.

Throwing out P-values is self-serving to statistical instruction, too: we’re making our lives easier by teaching less instead of teaching better. Continue reading

Categories: ASA Task Force on Significance and Replicability, editorial, multiplicity, P-values | 6 Comments

The ASA controversy on P-values as an illustration of the difficulty of statistics


Christian Hennig
Professor
Department of Statistical Sciences
University of Bologna

The ASA controversy on P-values as an illustration of the difficulty of statistics

“I work on Multidimensional Scaling for more than 40 years, and the longer I work on it, the more I realise how much of it I don’t understand. This presentation is about my current state of not understanding.” (John Gower, world-leading expert on Multidimensional Scaling, at a conference in 2009)

“The lecturer contradicts herself.” (Student feedback to an ex-colleague for teaching methods and then teaching what problems they have)

1 Limits of understanding

Statistical tests and P-values are widely used and widely misused. In 2016, the ASA issued a statement on significance and P-values with the intention to curb misuse while acknowledging their proper definition and potential use. In my view the statement did a rather good job of saying things that are worth saying while trying to be acceptable to those who are generally critical of P-values as well as those who tend to defend their use. As was predictable, the statement did not settle the issue. A 2019 editorial by some of the authors of the original statement (recommending “to abandon statistical significance”) and a 2021 ASA task force statement, much more positive about P-values, followed, showing the level of disagreement in the profession. Continue reading

Categories: ASA Task Force on Significance and Replicability, Mayo editorial, P-values | 3 Comments

E. Ionides & Ya’acov Ritov (Guest Post) on Mayo’s editorial, “The Statistics Wars and Intellectual Conflicts of Interest”


Edward L. Ionides


Director of Undergraduate Programs and Professor,
Department of Statistics, University of Michigan

Ya’acov Ritov
Professor,
Department of Statistics, University of Michigan

 

Thanks for the clear presentation of the issues at stake in your recent Conservation Biology editorial (Mayo 2021). There is a need for such articles elaborating and contextualizing the ASA President’s Task Force statement on statistical significance (Benjamini et al., 2021). The Benjamini et al. (2021) statement is sensible advice that avoids directly addressing the current debate. For better or worse, it has no references, and just speaks what looks to us like plain sense. However, it avoids addressing why there is a debate in the first place, and what justifications and misconceptions drive the different positions. Consequently, it may be ineffective at communicating to those swing voters who have sympathies with some of the insinuations in the Wasserstein & Lazar (2016) statement. We say “insinuations” here since we consider that their 2016 statement made an attack on p-values which was forceful, indirect and erroneous. Wasserstein & Lazar (2016) started with a constructive discussion about the uses and abuses of p-values before moving against them. This approach was good rhetoric: “I have come to praise p-values, not to bury them” to invert Shakespeare’s Antony. Good rhetoric does not always promote good science, but Wasserstein & Lazar (2016) successfully managed to frame and lead the debate, according to Google Scholar. We warned of the potential consequences of that article and its flaws (Ionides et al., 2017) and we refer the reader to our article for more explanation of these issues (it may be found below). Wasserstein, Schirm and Lazar (2019) made their position clearer, and therefore easier to confront. We are grateful to Benjamini et al. (2021) and Mayo (2021) for rising to the debate. Rephrasing Churchill in support of their efforts, “Many forms of statistical methods have been tried, and will be tried in this world of sin and woe. No one pretends that the p-value is perfect or all-wise. Indeed (noting that its abuse has much responsibility for the replication crisis) it has been said that the p-value is the worst form of inference except all those other forms that have been tried from time to time”. Continue reading

Categories: ASA Task Force on Significance and Replicability, editors, P-values, significance tests | 2 Comments

B. Haig on questionable editorial directives from Psychological Science (Guest Post)


Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand

 

What do editors of psychology journals think about tests of statistical significance? Questionable editorial directives from Psychological Science

Deborah Mayo’s (2021) recent editorial in Conservation Biology addresses the important issue of how journal editors should deal with strong disagreements about tests of statistical significance (ToSS). Her commentary speaks to applied fields, such as conservation science, but it is relevant to basic research, as well as other sciences, such as psychology. In this short guest commentary, I briefly remark on the role played by the prominent journal, Psychological Science (PS), regarding whether or not researchers should employ ToSS. PS is the flagship journal of the Association for Psychological Science, and two of its editors-in-chief have offered explicit, but questionable, advice on this matter. Continue reading

Categories: ASA Task Force on Significance and Replicability, Brian Haig, editors, significance tests | 2 Comments

Invitation to discuss the ASA Task Force on Statistical Significance and Replication


The latest salvo in the statistics wars comes in the form of the publication of The ASA Task Force on Statistical Significance and Replicability, appointed by past ASA president Karen Kafadar in November/December 2019. (In the ‘before times’!) Its members are:

Linda Young (Co-Chair), Xuming He (Co-Chair), Yoav Benjamini, Dick De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xiao-Li Meng, Vijay Nair, Nancy Reid, Stephen Stigler, Stephen Vardeman, Chris Wikle, Tommy Wright, Karen Kafadar (Ex-officio). (Kafadar 2020)

The full report of this Task Force is in The Annals of Applied Statistics, and on my blogpost. It begins:

In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force… (Benjamini et al. 2021)

Continue reading

Categories: 2016 ASA Statement on P-values, ASA Task Force on Significance and Replicability, JSM 2020, National Institute of Statistical Sciences (NISS), statistical significance tests | 3 Comments

Statisticians Rise Up To Defend (error statistical) Hypothesis Testing


What is the message conveyed when the board of a professional association X appoints a Task Force intended to dispel the supposition that a position advanced by the Executive Director of association X reflects the views of association X on a topic that members of X disagree on? What it says to me is that there is a serious breakdown of communication amongst the leadership and membership of that association. So while I’m extremely glad that the ASA appointed the Task Force on Statistical Significance and Replicability in 2019, I’m very sorry that the main reason it was needed was to address concerns that an editorial put forward by the ASA Executive Director (and 2 others) “might be mistakenly interpreted as official ASA policy”. The 2021 Statement of the Task Force (Benjamini et al. 2021) explains:

In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force…

Continue reading

Categories: ASA Task Force on Significance and Replicability, Schachtman, significance tests

At long last! The ASA President’s Task Force Statement on Statistical Significance and Replicability

The ASA President’s Task Force Statement on Statistical Significance and Replicability has finally been published. It found a home in The Annals of Applied Statistics, after everyone else they looked to, including the ASA itself, refused to publish it. For background see this post. I’ll comment on it in a later post. There is also an Editorial: Statistical Significance, P-Values, and Replicability by Karen Kafadar. Continue reading

Categories: ASA Task Force on Significance and Replicability
