2018 marked 60 years since the famous weighing machine example from Sir David Cox (1958). It is now 61. It’s one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my (still) new book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018). It’s especially relevant to take this up now, just before we leave 2019, for reasons that will be revealed over the next day or two. For a sneak preview of those reasons, see the “note to the reader” at the end of this post. So, let’s go back to it, with an excerpt from SIST (pp. 170-173).
I’m reblogging a post from Christmas past–exactly 7 years ago. Guess what I gave as the number 1 (of 13) howler, or well-worn criticism, of statistical significance tests, haunting us back in 2012–all of which are put to rest in Mayo and Spanos 2011? Yes, it’s the frightening allegation that statistical significance tests forbid using any background knowledge! The researcher is imagined to start with a “blank slate” in each inquiry (no memories of fallacies past), and then unthinkingly apply a purely formal, automatic, accept-reject machine. What’s newly frightening (in 2019) is the credulity with which this apparition is now being met (by some). I make some new remarks below the post from Christmas past:
les stats, c’est moi
When it comes to the statistics wars, leaders of rival tribes sometimes sound as if they believed “les stats, c’est moi”. So, rather than say they would like to supplement some well-known tenets (e.g., “a statistically significant effect may not be substantively important”) with a new rule that advances their particular preferred language or statistical philosophy, they may simply blurt out: “we take that step here!” followed by whatever rule of language or statistical philosophy they happen to prefer (as if they have just added the new rule to the existing, uncontested tenets). Karen Kafadar, in her last official (December) report as President of the American Statistical Association (ASA), expresses her determination to call out this problem at the ASA itself. (She raised it first in her June article, discussed in my last post.)
Mayo writing to Kafadar
I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner: “Statistics and Unintended Consequences”:
- “I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”
I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day).
Professor Andrea Saltelli
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB, Norway),
Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona
What can we learn from the debate on statistical significance?
The statistical community is in the midst of a crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification. Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science.
cure by committee
Everything is impeach and remove these days! Should that hold also for the concept of statistical significance and P-value thresholds? There’s an active campaign that says yes, but I aver it is doing more harm than good. In my last post, I said I would count the ways it is detrimental until I became “too disconsolate to continue”. There I showed why the new movement, launched by Executive Director of the ASA (American Statistical Association), Ronald Wasserstein (in what I dub ASA II(note)), is self-defeating: it instantiates and encourages the human-all-too-human tendency to exploit researcher flexibility, rewards, and openings for bias in research (F, R & B Hypothesis). That was reason #1. Just reviewing it already fills me with such dismay, that I fear I will become too disconsolate to continue before even getting to reason #2. So let me just quickly jot down reasons #2, 3, 4, and 5 (without full arguments) before I expire.
“Before we stood on the edge of the precipice, now we have taken a great step forward”
What’s self-defeating about pursuing statistical reforms in the manner taken by the American Statistical Association (ASA) in 2019? In case you’re not up on the latest in significance testing wars, the 2016 ASA Statement on P-Values and Statistical Significance, ASA I, arguably, was a reasonably consensual statement on the need to avoid some well-known abuses of P-values–notably if you compute P-values, ignoring selective reporting, multiple testing, or stopping when the data look good, the computed P-value will be invalid. (Principle 4, ASA I) But then Ron Wasserstein, executive director of the ASA, and co-editors, decided they weren’t happy with their own 2016 statement because it “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned” altogether. In their new statement–ASA II(note)–they announced: “We take that step here….Statistically significant –don’t say it and don’t use it”.
Why do I say it is a mis-take to have taken the supposed next “great step forward”? Why do I count it as unsuccessful as a piece of statistical science policy? In what ways does it make the situation worse? Let me count the ways. The first is in this post. Others will come in following posts, until I become too disconsolate to continue.[i]
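The point attributed to Principle 4 of ASA I, that a P-value computed while ignoring "stopping when the data look good" is invalid, can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, none of which come from the post: data are drawn from a standard normal with the null hypothesis true, the test is a two-sided z-test with known variance at the nominal 0.05 level, and the hypothetical researcher checks the result after every observation from n = 10 up to n = 100. The function names and parameter values are illustrative choices.

```python
import math
import random


def z_stat(total, n):
    """z statistic for H0: mean = 0, assuming known sigma = 1."""
    return total / math.sqrt(n)


def simulate(n_sims=2000, n_min=10, n_max=100, crit=1.96, seed=1):
    """Estimate rejection rates under a true null for two procedures.

    Returns (fixed_rate, peeking_rate):
      fixed_rate   - test once, at the preplanned sample size n_max
      peeking_rate - test after every observation from n_min to n_max
                     and declare significance if any look crosses crit
    """
    rng = random.Random(seed)
    fixed_rejections = 0
    peeking_rejections = 0
    for _ in range(n_sims):
        total = 0.0
        crossed = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # H0 is true: mean 0, sd 1
            if n >= n_min and abs(z_stat(total, n)) > crit:
                crossed = True  # researcher would have stopped here
        if abs(z_stat(total, n_max)) > crit:
            fixed_rejections += 1
        if crossed:
            peeking_rejections += 1
    return fixed_rejections / n_sims, peeking_rejections / n_sims


fixed_rate, peeking_rate = simulate()
print(f"fixed-n rejection rate under H0:    {fixed_rate:.3f}")
print(f"optional-stopping rate under H0:    {peeking_rate:.3f}")
```

Under these assumptions the fixed-sample procedure should reject a true null at close to the nominal 5% rate, while the stop-when-it-looks-good procedure rejects noticeably more often, so a reported "P < 0.05" from the second procedure does not have the error probability it advertises.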
This article on our Summer Seminar in Philosophy of Statistics came out on Monday in Virginia Tech News Daily magazine.
October 28, 2019
From universities around the world, participants in a summer session gathered to discuss the merits of the philosophy of statistics. Co-director Deborah Mayo, left, hosted an evening for them at her home.
In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP), I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed).
This week marks one year since the general availability of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s how it begins (Excursion 1 Tour 1 (1.1)). Material from the preface is here. I will sporadically give some “one year later” reflections in the comments. I invite readers to ask me any questions pertaining to the Tour.
The journey begins..(1.1)
I’m talking about a specific, extra type of integrity that is [beyond] not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. (Feynman 1974/1985, p. 387)
It is easy to lie with statistics. Or so the cliché goes. It is also very difficult to uncover these lies without statistical methods – at least of the right kind. Self-correcting statistical methods are needed, and, with minimal technical fanfare, that’s what I aim to illuminate. Since Darrell Huff wrote How to Lie with Statistics in 1954, ways of lying with statistics are so well worn as to have emerged in reverberating slogans:
- Association is not causation.
- Statistical significance is not substantive significance.
- No evidence of risk is not evidence of no risk.
- If you torture the data enough, they will confess.
Mayo banging head
If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did.
The October 2019 issue of the European Journal of Clinical Investigation came out today. It includes the PERSPECTIVE article by Tom Hardwicke and John Ioannidis, an invited editorial by Gelman and one by me:
Petitions in scientific argumentation: Dissecting the request to retire statistical significance, by Tom Hardwicke and John Ioannidis
When we make recommendations for scientific practice, we are (at best) acting as social scientists, by Andrew Gelman
P-value thresholds: Forfeit at your peril, by Deborah Mayo
I blogged excerpts from my preprint, and some related posts, here.
All agree to the disagreement on the statistical and metastatistical issues:
A key recognition among those who write on the statistical crisis in science is that the pressure to publish attention-getting articles can incentivize researchers to produce eye-catching but inadequately scrutinized claims. We may see much the same sensationalism in broadcasting metastatistical research, especially if it takes the form of scapegoating or banning statistical significance. A lot of excitement was generated recently when Ron Wasserstein, Executive Director of the American Statistical Association (ASA), and co-editors A. Schirm and N. Lazar, updated(note) the 2016 ASA Statement on P-Values and Statistical Significance (ASA I). In their 2019 interpretation, ASA I “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned,” and in their new statement (ASA II) announced: “We take that step here….’statistically significant’ –don’t say it and don’t use it”. To herald the ASA II(note), and the special issue “Moving to a world beyond ‘p < 0.05’”, the journal Nature requisitioned a commentary from Amrhein, Greenland and McShane “Retire Statistical Significance” (AGM). With over 800 signatories, the commentary received the imposing title “Scientists rise up against significance tests”!
I came across this post on Gelman’s blog today:
Exchange with Deborah Mayo on abandoning statistical significance
It was straight out of blog comments and email correspondence back when the ASA, and significant others, were rising up against the concept of statistical significance. Here it is:
Nathan Schachtman (who was a special invited speaker at our recent Summer Seminar in Phil Stat) put up a post on his law blog the other day (“Palavering About P-values”) on an article by a statistics professor at Stanford, Helena Kraemer. “Palavering” is an interesting word choice of Schachtman’s. Its range of meanings is relevant here [i]; in my title, I intend both, in turn. You can read Schachtman’s full post here; it begins like this:
The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[ASA II 2019] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.
Admittedly, their recent statement, which I refer to as ASA II, has seemed to open the floodgates to some very zany remarks about P-values, their meaning and role in statistical testing. Continuing with Schachtman’s post: