“Should Science Abandon Statistical Significance?” Session at AAAS Annual Meeting, Feb 18

Karen Kafadar, Yoav Benjamini, and Donald Macnaughton will be in a session:

Should Science Abandon Statistical Significance?

Friday, Feb 18 from 2-2:45 PM (EST) at the AAAS 2022 annual meeting.

The general program is here. To register, go to this page.


The concept of statistical significance is central in scientific research. However, the concept is often poorly understood and thus is often unfairly criticized. This presentation includes three independent but overlapping arguments about the usefulness of the concept of statistical significance to reliably detect “effects” in frontline scientific research data. We illustrate the arguments with examples of scientific importance from genomics, physics, and medicine. We explain how the concept of statistical significance provides a cost-efficient objective way to empower scientific research with evidence.


Categories: AAAS, Announcement, statistical significance

January 11 PhilStat Forum: Mayo: “The Stat Wars and Intellectual Conflicts of Interest”

Here are my slides on my editorial in Conservation Biology, “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo 2021), presented at the 11 January Phil Stat Forum with speakers Deborah G. Mayo and Yoav Benjamini, and moderator David Hand. (Benjamini’s slides and the full video are to come shortly.)

D. Mayo, Y. Benjamini, D. Hand



For more details on the focus and background readings, see this post on the Phil Stat Forum blog or this January 10 post.

Categories: editors

ENBIS Webinar: Statistical Significance and p-values

A video recording of yesterday’s event is available at:

European Network for Business and Industrial Statistics (ENBIS) Webinar:
Statistical Significance and p-values
Europe/Amsterdam (CET); 08:00-09:30 am (EST)

ENBIS will dedicate this webinar to the memory of Sir David Cox, who sadly passed away in January 2022.


Categories: Announcement, significance tests, Sir David Cox

“A [very informal] Conversation Between Sir David Cox & D.G. Mayo”

In June 2011, Sir David Cox agreed to a very informal ‘interview’ on the topics of the 2010 workshop that I co-ran at the London School of Economics (CPNSS), Statistical Science and Philosophy of Science, where he was a speaker. Soon after I began taping, Cox stopped me in order to show me how to do a proper interview. He proceeded to ask me questions, beginning with:

COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?

MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.


Categories: Birnbaum, Likelihood Principle, Sir David Cox, StatSci meets PhilSci

Sir David Cox: An intellectual interview by Nancy Reid

Hinkley, Reid & Cox

Here’s an in-depth interview of Sir David Cox by Nancy Reid that brings out a rare intellectual understanding and appreciation of some of Cox’s work. Only someone truly in the know could have managed to elicit these fascinating reflections. The interview took place in October 1993 and was published in 1994.

Nancy Reid (1994). A Conversation with Sir David Cox, Statistical Science 9(3): 439-455.

Categories: Sir David Cox

An interview with Sir David Cox by “Statistics Views” (upon turning 90)

Sir David Cox

Sir David Cox: July 15, 1924 – January 18, 2022

The original Statistics Views interview is here:

“I would like to think of myself as a scientist, who happens largely to specialise in the use of statistics”– An interview with Sir David Cox


  • Author: Statistics Views
  • Date: 24 Jan 2014

Sir David Cox is arguably one of the world’s leading living statisticians. He has made pioneering and important contributions to numerous areas of statistics and applied probability over the years, of which perhaps the best known is the proportional hazards model, which is widely used in the analysis of survival data. The Cox point process was named after him.

Categories: Sir David Cox

Sir David Cox: Significance tests: rethinking the controversy (September 5, 2018 RSS keynote)

Sir David Cox speaking at the RSS meeting in a session: “Significance Tests: Rethinking the Controversy” on 5 September 2018.


Categories: Sir David Cox, statistical significance tests

Sir David Cox

July 15, 1924-January 18, 2022


Categories: Error Statistics

Nathan Schachtman: Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice (Guest Post)


Nathan Schachtman,  Esq., J.D.
Legal Counsel for Scientific Challenges

Of Significance, Error, Confidence, and Confusion – In the Law and In Statistical Practice

The metaphor of law as an “empty vessel” is frequently invoked to describe the law generally, as well as pejoratively to describe lawyers. The metaphor rings true at least in describing how the factual content of legal judgments comes from outside the law. In many varieties of litigation, not only the facts and data, but the scientific and statistical inferences must be added to the “empty vessel” to obtain a correct and meaningful outcome.

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, PhilStat Law, Schachtman

John Park: Poisoned Priors: Will You Drink from This Well? (Guest Post)


John Park, MD
Radiation Oncologist
Kansas City VA Medical Center

Poisoned Priors: Will You Drink from This Well?

As an oncologist specializing in radiation oncology, I find “The Statistics Wars and Intellectual Conflicts of Interest”, as Prof. Mayo’s recent editorial is titled, to be of practical importance to me and my patients (Mayo, 2021). Some are flirting with Bayesian statistics as a way to move on from statistical significance testing and the use of P-values. In fact, what many consider the world’s preeminent cancer center, MD Anderson, has a strong Bayesian group that completed two early-phase Bayesian studies in radiation oncology, published in the most prestigious cancer journal, The Journal of Clinical Oncology (Liao et al., 2018 and Lin et al., 2020). This raises the hotly contested issue of subjective priors, and much ado has been written about the ability to overcome this problem. Specifically in medicine, one thinks of Spiegelhalter’s classic 1994 paper mentioning reference, clinical, skeptical, or enthusiastic priors, which also uses an example from radiation oncology to make its case (Spiegelhalter et al., 1994). This is nice and all in theory, but what if there is ample evidence that the subject matter experts have major conflicts of interest (COIs) and biases, so that their priors cannot be trusted? A debate raging in oncology is whether non-invasive radiation therapy is as good as invasive surgery for early-stage lung cancer patients. This is not a trivial question, as postoperative morbidity from surgery can range from 19-50% and 90-day mortality anywhere from 0-5% (Chang et al., 2021). Radiation therapy is highly attractive, as there are numerous reports hinting at equal efficacy with far less morbidity. Unfortunately, 4 major clinical trials were unable to accrue patients for this important question. Why could they not enroll patients, you ask? Long story short: if a patient is referred to radiation oncology and treated with radiation, the surgeon loses out on the revenue, and vice versa. Dr. David Jones, a surgeon at Memorial Sloan Kettering, notes there was no “equipoise among enrolling investigators and medical specialties… Although the reasons are multiple… I believe the primary reason is financial” (Jones, 2015). I am not skirting responsibility for my field’s biases. Dr. Hanbo Chen, a radiation oncologist, notes in his meta-analysis of multiple publications comparing surgery with radiation that overall survival was associated with the specialty of the first author who published the article (Chen et al., 2018). Perhaps the pen is mightier than the scalpel!

Categories: ASA Task Force on Significance and Replicability, Bayesian priors, PhilStat/Med, statistical significance tests

Brian Dennis: Journal Editors Be Warned:  Statistics Won’t Be Contained (Guest Post)


Brian Dennis

Professor Emeritus
Dept Fish and Wildlife Sciences,
Dept Mathematics and Statistical Science
University of Idaho


Journal Editors Be Warned:  Statistics Won’t Be Contained

I heartily second Professor Mayo’s call, in a recent issue of Conservation Biology, for science journals to tread lightly on prescribing statistical methods (Mayo 2021). Such prescriptions are not likely to be constructive; the issues involved are too vast.

The science of ecology has long relied on innovative statistical thinking. Fisher himself, inventor of P-values and a considerable portion of other statistical methods used by generations of ecologists, helped ecologists quantify patterns of biodiversity (Fisher et al. 1943) and understand how genetics and evolution were connected (Fisher 1930). G. E. Hutchinson, the “founder of modern ecology” (and my professional grandfather), early on helped build the tradition of heavy consumption of mathematics and statistics in ecological research (Slack 2010).

Categories: ecology, editors, Likelihood Principle, Royall

Philip Stark (guest post): commentary on “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo Editorial)


Philip B. Stark
Department of Statistics
University of California, Berkeley

I enjoyed Prof. Mayo’s comment in Conservation Biology (Mayo, 2021) very much, and agree enthusiastically with most of it. Here are my key takeaways and reflections.

Error probabilities (or error rates) are essential to consider. If you don’t give thought to what the data would be like if your theory is false, you are not doing science. Some applications really require a decision to be made. Does the drug go to market or not? Are the girders for the bridge strong enough, or not? Hence, banning “bright lines” is silly. Conversely, no threshold for significance, no matter how small, suffices to prove an empirical claim. In replication lies truth.

Abandoning P-values exacerbates moral hazard for journal editors, although there has always been moral hazard in the gatekeeping function. Absent any objective assessment of evidence, publication decisions are even more subject to cronyism, “taste”, confirmation bias, etc.

Throwing away P-values because many practitioners don’t know how to use them is perverse. It’s like banning scalpels because most people don’t know how to perform surgery. People who wish to perform surgery should be trained in the proper use of scalpels, and those who wish to use statistics should be trained in the proper use of P-values. Throwing out P-values is self-serving to statistical instruction, too: we’re making our lives easier by teaching less instead of teaching better.

Categories: ASA Task Force on Significance and Replicability, editorial, multiplicity, P-values

Kent Staley: Commentary on “The statistics wars and intellectual conflicts of interest” (Guest Post)


Kent Staley

Department of Philosophy
Saint Louis University


Commentary on “The statistics wars and intellectual conflicts of interest” (Mayo editorial)

In her recent editorial for Conservation Biology, Deborah Mayo argues that journal editors “should avoid taking sides” regarding “heated disagreements about statistical significance tests.” In particular, they should not impose bans, suggested by combatants in the “statistics wars”, on statistical methods advocated by the opposing side, such as Wasserstein et al.’s (2019) proposed ban on declarations of statistical significance and on the use of p-value thresholds. Were journal editors to adopt such proposals, Mayo argues, they would be acting under a conflict of interest (COI) of a special kind: an “intellectual” conflict of interest.

Conflicts of interest are worrisome because of the potential for bias. Researchers will no doubt be all too familiar with the institutional/bureaucratic requirement of declaring financial interests. Whether such disclosures provide substantive protections against bias or simply satisfy a “CYA” requirement of administrators, the rationale is that assessment of research outcomes can incorporate information relevant to the question of whether the investigators have arrived at a conclusion that overstates (or even fabricates) the support for a claim, when the acceptance of that claim would financially benefit them. This in turn ought to reduce the temptation of investigators to engage in such inflation or fabrication of support. The idea obviously applies quite naturally to editorial decisions as well as research conclusions.

Categories: conflicts of interest, editors, intellectual COI, significance tests, statistical tests

Yudi Pawitan: Behavioral aspects in the statistical significance war-game (Guest Post)



Yudi Pawitan
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet, Stockholm


Behavioral aspects in the statistical significance war-game

I remember with fondness the good old days when the only ‘statistical war’-game was fought between the Bayesian and the frequentist. It was simpler – except when the likelihood principle is thrown in, always guaranteed to confound the frequentist – and the participants were for the most part collegial. Moreover, there was a feeling that it was a philosophical debate. Even though the Bayesian-frequentist war is not fully settled, we can see areas of consensus, for example in objective Bayesianism or in conditional inference. However, on the P-value and statistical significance front, the war looks less simple, as it is about statistical praxis; it is no longer Bayesian vs frequentist, there is no consensus in sight, and the implications are wide, affecting the day-to-day use of statistics. Typically, a persistent controversy between otherwise sensible and knowledgeable people – thus excluding anti-vaxxers and conspiracy theorists – might indicate that we are missing some common perspectives or perhaps the big picture. In complex issues there can be genuinely distinct aspects about which different players disagree and, at some point, agree to disagree. I am not sure we have reached that point yet, with each side still working to persuade the other of the faults of its position. For now, I can only concur with Mayo’s (2021) appeal that at least the umpires – journal editors – recognize (a) the issue at hand and (b) that genuine debates are still ongoing, so it is not yet time to take sides.

Categories: Error Statistics

January 11: Phil Stat Forum (remote): Statistical Significance Test Anxiety

Special Session of the (remote)
Phil Stat Forum:

11 January 2022

“Statistical Significance Test Anxiety”

TIME: 15:00-17:00 (London, GMT); 10:00-12:00 (EST)

Presenters: Deborah Mayo (Virginia Tech) &
Yoav Benjamini (Tel Aviv University)

Moderator: David Hand (Imperial College London)

Deborah Mayo       Yoav Benjamini        David Hand


Categories: Announcement, David Hand, Phil Stat Forum, significance tests, Yoav Benjamini

The ASA controversy on P-values as an illustration of the difficulty of statistics


Christian Hennig
Department of Statistical Sciences
University of Bologna

The ASA controversy on P-values as an illustration of the difficulty of statistics

“I work on Multidimensional Scaling for more than 40 years, and the longer I work on it, the more I realise how much of it I don’t understand. This presentation is about my current state of not understanding.” (John Gower, world-leading expert on Multidimensional Scaling, at a conference in 2009)

“The lecturer contradicts herself.” (Student feedback to an ex-colleague, who taught methods and then taught the problems those methods have)

1 Limits of understanding

Statistical tests and P-values are widely used and widely misused. In 2016, the ASA issued a statement on significance and P-values with the intention of curbing misuse while acknowledging their proper definition and potential use. In my view the statement did a rather good job of saying things that are worth saying while trying to be acceptable both to those who are generally critical of P-values and to those who tend to defend their use. As was predictable, the statement did not settle the issue. A “2019 editorial” by some of the authors of the original statement (recommending to “abandon statistical significance”) and a 2021 ASA task force statement, much more positive on P-values, followed, showing the level of disagreement in the profession.

Categories: ASA Task Force on Significance and Replicability, Mayo editorial, P-values

E. Ionides & Ya’acov Ritov (Guest Post) on Mayo’s editorial, “The Statistics Wars and Intellectual Conflicts of Interest”


Edward L. Ionides


Director of Undergraduate Programs and Professor,
Department of Statistics, University of Michigan

Ya’acov Ritov, Professor
Department of Statistics, University of Michigan


Thanks for the clear presentation of the issues at stake in your recent Conservation Biology editorial (Mayo 2021). There is a need for such articles elaborating and contextualizing the ASA President’s Task Force statement on statistical significance (Benjamini et al., 2021). The Benjamini et al. (2021) statement is sensible advice that avoids directly addressing the current debate. For better or worse, it has no references, and just speaks what looks to us like plain sense. However, it avoids addressing why there is a debate in the first place, and what justifications and misconceptions drive the different positions. Consequently, it may be ineffective at communicating with those swing voters who have sympathies with some of the insinuations in the Wasserstein & Lazar (2016) statement. We say “insinuations” here since we consider that their 2016 statement made an attack on p-values which was forceful, indirect, and erroneous. Wasserstein & Lazar (2016) started with a constructive discussion about the uses and abuses of p-values before moving against them. This approach was good rhetoric: “I have come to praise p-values, not to bury them”, to invert Shakespeare’s Antony. Good rhetoric does not always promote good science, but Wasserstein & Lazar (2016) successfully managed to frame and lead the debate, according to Google Scholar. We warned of the potential consequences of that article and its flaws (Ionides et al., 2017), and we refer the reader to our article for more explanation of these issues (it may be found below). Wasserstein, Schirm and Lazar (2019) made their position clearer, and therefore easier to confront. We are grateful to Benjamini et al. (2021) and Mayo (2021) for rising to the debate. Rephrasing Churchill in support of their efforts: “Many forms of statistical methods have been tried, and will be tried in this world of sin and woe. No one pretends that the p-value is perfect or all-wise. Indeed (noting that its abuse has much responsibility for the replication crisis) it has been said that the p-value is the worst form of inference except all those other forms that have been tried from time to time.”

Categories: ASA Task Force on Significance and Replicability, editors, P-values, significance tests

B. Haig on questionable editorial directives from Psychological Science (Guest Post)


Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand


What do editors of psychology journals think about tests of statistical significance? Questionable editorial directives from Psychological Science

Deborah Mayo’s (2021) recent editorial in Conservation Biology addresses the important issue of how journal editors should deal with strong disagreements about tests of statistical significance (ToSS). Her commentary speaks to applied fields, such as conservation science, but it is also relevant to basic research and to other sciences, such as psychology. In this short guest commentary, I briefly remark on the role played by the prominent journal Psychological Science (PS) regarding whether or not researchers should employ ToSS. PS is the flagship journal of the Association for Psychological Science, and two of its editors-in-chief have offered explicit, but questionable, advice on this matter.

Categories: ASA Task Force on Significance and Replicability, Brian Haig, editors, significance tests

D. Lakens (Guest Post): Averting journal editors from making fools of themselves


Daniël Lakens

Associate Professor
Human Technology Interaction
Eindhoven University of Technology

Averting journal editors from making fools of themselves

In a recent editorial, Mayo (2021) warns journal editors to avoid calls for author guidelines to reflect a particular statistical philosophy, and not to go beyond merely enforcing the proper use of significance tests. That such a warning is needed at all should embarrass anyone working in statistics. And yet, a mere three weeks after Mayo’s editorial was published, the need for such warnings was reinforced when a co-editorial by journal editors from the International Society of Physiotherapy (Elkins et al., 2021), titled “Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors”, stated: “[This editorial] also advises researchers that some physiotherapy journals that are members of the International Society of Physiotherapy Journal Editors (ISPJE) will be expecting manuscripts to use estimation methods instead of null hypothesis statistical tests.”

Categories: D. Lakens, significance tests

Midnight With Birnbaum (Remote, Virtual Happy New Year 2021)!


For the second year in a row, unlike the previous 9 years that I’ve been blogging, it’s not feasible to actually revisit that spot in the road, looking to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. Because of the extended pandemic, I am not going out this New Year’s Eve again, so the best I can hope for is a Zoom link of the sort I received last year, not long before midnight, that will link me to a hypothetical party with him. (The pic on the left is the only blurry image I have of the club I’m taken to.) I just keep watching my email to see if a Zoom link arrives. My book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) doesn’t include the argument from my article in Statistical Science (“On the Birnbaum Argument for the Strong Likelihood Principle”), but you can read it at that link along with commentaries by A. P. Dawid, Michael Evans, Martin and Liu, D. A. S. Fraser (who sadly passed away in 2021), Jan Hannig, and Jan Bjornstad, but there’s much in it that I’d like to discuss with him. The (Strong) Likelihood Principle (LP or SLP), whether or not it is named, remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics and statistical significance testing in general.

Categories: Birnbaum, Birnbaum Brakes, strong likelihood principle
