Error Statistics Philosophy

5-year Review: P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)

Posted on May 10, 2024 by Mayo

I continue my 5-year review of some highlights from the “abandon significance” movement from 2019. This post was first published on this blog on November 30, 2019. It was based on a call by then American Statistical Association President, Karen Kafadar, which sparked a counter-movement. I will soon begin sharing a few invited guest posts reflecting on current thinking either on the episode or on statistical methodology more generally. I may continue to post such reflections over the summer, as they come in, so let me know if you’d like to contribute something. Share your thoughts in the comments.

Mayo writing to Kafadar

I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner: “Statistics and Unintended Consequences”:

“I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”

I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day).

Recently, at chapter meetings, conferences, and other events, I’ve had the good fortune to meet many of our members, many of whom feel queasy about the effects of differing views on p-values expressed in the March 2019 supplement of The American Statistician (TAS). The guest editors—Ronald Wasserstein, Allen Schirm, and Nicole Lazar—introduced the ASA Statement on P-Values (2016) by stating the obvious: “Let us be clear. Nothing in the ASA statement is new.” Indeed, the six principles are well-known to statisticians. The guest editors continued, “We hoped that a statement from the world’s largest professional association of statisticians would open a fresh discussion and draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.”…

Wait a minute. I’m confused about who is speaking. The statements “Let us be clear…” and “We hoped that a statement from the world’s largest professional association…” come from the 2016 ASA Statement on P-values. I abbreviate this as ASA I (Wasserstein and Lazar 2016). The March 2019 editorial that Kafadar says is making many members “feel queasy,” is the update (Wasserstein, Schirm, and Lazar 2019). I abbreviate it as ASA II [i].^(note)

A healthy debate about statistical approaches can lead to better methods. But, just as Wilks and his colleagues discovered, unintended consequences may have arisen: Nonstatisticians (the target of the issue) may be confused about what to do. Worse, “by breaking free from the bonds of statistical significance” as the editors suggest and several authors urge, researchers may read the call to “abandon statistical significance” as “abandon statistical methods altogether.” …

But we may need more. How exactly are researchers supposed to implement this “new concept” of statistical thinking? Without specifics, questions such as “Why is getting rid of p-values so hard?” may lead some of our scientific colleagues to hear the message as, “Abandon p-values”—despite the guest editors’ statement: “We are not recommending that the calculation and use of continuous p-values be discontinued.”

Brad Efron once said, “Those who ignore statistics are condemned to re-invent it.” In his commentary (“It’s not the p-value’s fault”) following the 2016 ASA Statement on P-Values, Yoav Benjamini wrote, “The ASA Board statement about the p-values may be read as discouraging the use of p-values because they can be misused, while the other approaches offered there might be misused in much the same way.” Indeed, p-values (and all statistical methods in general) can be misused. (So may cars and computers and cell phones and alcohol. Even words in the English language get misused!) But banishing them will not prevent misuse; analysts will simply find other ways to document a point—perhaps better ways, but perhaps less reliable ones. And, as Benjamini further writes, p-values have stood the test of time in part because they offer “a first line of defense against being fooled by randomness, separating signal from noise, because the models it requires are simpler than any other statistical tool needs”—especially now that Efron’s bootstrap has become a familiar tool in all branches of science for characterizing uncertainty in statistical estimates. [Benjamini is commenting on ASA I.]

… It is reassuring that “Nature is not seeking to change how it considers statistical evaluation of papers at this time,” but this line is buried in its March 20 editorial, titled “It’s Time to Talk About Ditching Statistical Significance.” Which sentence do you think will be more memorable? We can wait to see if other journals follow BASP’s lead and then respond. But then we’re back to “reactive” versus “proactive” mode (see February’s column), which is how we got here in the first place.

… Indeed, the ASA has a professional responsibility to ensure good science is conducted—and statistical inference is an essential part of good science. Given the confusion in the scientific community (to which the ASA’s peer-reviewed 2019 TAS supplement may have unintentionally contributed), we cannot afford to sit back. After all, that’s what started us down the “abuse of p-values” path.

Is it unintentional? [ii]

…Tukey wrote years ago about Bayesian methods: “It is relatively clear that discarding Bayesian techniques would be a real mistake; trying to use them everywhere, however, would in my judgment, be a considerably greater mistake.” In the present context, perhaps he might have said: “It is relatively clear that trusting or dismissing results based on a single p-value would be a real mistake; discarding p-values entirely, however, would in my judgment, be a considerably greater mistake.” We should take responsibility for the situation in which we find ourselves today (and during the past decades) to ensure that our well-researched and theoretically sound statistical methodology is neither abused nor dismissed categorically. I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether. Please send me your ideas!

You can read the full June President’s Corner.

On Fri, Nov 8, 2019 at 2:09 PM Deborah Mayo <mayod@vt.edu> wrote:

Dear Professor Kafadar;

Your article in the President’s Corner of the ASA for June 2019 was sent to me by someone who had read my “P-value Thresholds: Forfeit at your Peril” editorial, invited by John Ioannidis. I find your sentiment welcome and I’m responding to your call for suggestions.

For starters, when representatives of the ASA issue articles criticizing P-values and significance tests, recommending their supplementation or replacement by others, three very simple principles should be followed:

The elements of tests should be presented in an accurate, fair and at least reasonably generous manner, rather than presenting mainly abuses of the methods;
The latest accepted methods should be included, not just crude nil null hypothesis tests. How these newer methods get around the often-repeated problems should be mentioned.
Problems facing the better-known alternatives, recommended as replacements or supplements to significance tests, should be discussed. Such an evaluation should recognize that the role of statistical falsification is distinct from (while complementary to) using probability to quantify degrees of confirmation, support, plausibility or belief in a statistical hypothesis or model.

Here’s what I recommend ASA do now in order to correct the distorted picture that is now widespread and growing: Run a conference akin to the one Wasserstein ran on “A World Beyond ‘P < 0.05’” except that it would be on evaluating some competing methods for statistical inference: Comparative Methods of Statistical Inference: Problems and Prospects.

The workshop would consist of serious critical discussions on Bayes Factors, confidence intervals[iii], Likelihoodist methods, other Bayesian approaches (subjective, default non-subjective, empirical), particularly in relation to today’s replication crisis. …

Growth of the use of these alternative methods has been sufficiently widespread to have garnered discussions on well-known problems…. The conference I’m describing will easily attract the leading statisticians in the world. …

Sincerely,
D. Mayo

Please share your comments on this blogpost.

************************************

[i] My reference to ASA II^(note) refers just to the portion of the editorial encompassing their general recommendations: don’t say significance or significant, oust P-value thresholds. (It mostly encompasses the first 10 pages.) It begins with a review of 4 of the 6 principles from ASA I, even though they are stated in more extreme terms than in ASA I. (As I point out in my blogpost, the result is to give us principles that are in tension with the original 6.) Note my new qualification in [ii]*

[ii]*As soon as I saw the 2019 document, I queried Wasserstein as to the relationship between ASA I and II^(note). It was never clarified. I hope now that it will be, with some kind of disclaimer. That will help, but merely noting that it never came to a Board vote will not quell the confusion now rattling some ASA members. The ASA’s P-value campaign to editors to revise their author guidelines asks them to take account of both ASA I and II^(note). In carrying out the P-value campaign, at which he is highly effective, Ron Wasserstein obviously* wears his Executive Director’s hat. See The ASA’s P-value Project: Why It’s Doing More Harm than Good. So, until some kind of clarification is issued by the ASA, I’ve hit upon this solution.

The ASA P-value Project existed before the 2016 ASA I. The only difference in today’s P-value Project–since the March 20, 2019 editorial by Wasserstein et al– is that the ASA Executive Director (in talks, presentations, correspondence) recommends ASA I and the general stipulations of ASA II^(note)–even though that’s not a policy document. I will now call it the 2019 ASA P-value Project II. It also includes the rather stronger principles in ASA II^(note). Even many who entirely agree with the “don’t say significance” and “don’t use P-value thresholds” recommendations have concurred with my “friendly amendments” to ASA II^(note) (including, for example, Greenland, Hurlbert, and others). See my post from June 17, 2019.

You merely have to look at the comments to that blog. If Wasserstein would make those slight revisions, the 2019 P-value Project II wouldn’t contain the inconsistencies, or at least “tensions” that it now does, assuming that it retains ASA I. The 2019 ASA P-value Project II sanctions making the recommendations in ASA II^(note), even though ASA II^(note) is not an ASA policy statement.

However, I don’t see that those made queasy by ASA II^(note) would be any less upset with the reality of the ASA P-value Project II.

[iii] Confidence intervals (CIs) clearly aren’t “alternative measures of evidence” in relation to statistical significance tests. The same man, Neyman, developed tests (with Pearson) and CIs, even earlier ~1930. They were developed as duals, or inversions, of tests. Yet the advocates of CIs–the CI Crusaders, S. Hurlbert calls them–are some of today’s harshest and most ungenerous critics of tests. For these crusaders, it has to be “CIs only”. Supplementing p-values with CIs isn’t good enough. Now look what’s happened to CIs in the latest guidelines of the NEJM. You can readily find them by searching NEJM on this blog. (My own favored measure, severity, improves on CIs, moves away from the fixed confidence level, and provides a different assessment corresponding to each point in the CI.)
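
The duality mentioned in [iii] can be sketched numerically. What follows is only an illustration under simplifying assumptions that are not in the original post: a Normal mean with known standard deviation, a two-sided z-test, and made-up numbers. The function names are hypothetical. The point it shows is that the 95% CI consists of exactly those hypothesized means that the corresponding test would not reject at the 0.05 level.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, from the Python standard library


def z_test_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, assuming known sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Usual two-sided CI for the mean, assuming known sigma."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def not_rejected(xbar, mu0, sigma, n, level=0.95):
    """True if the two-sided test of H0: mu = mu0 does not reject at 1 - level."""
    return z_test_p_value(xbar, mu0, sigma, n) >= 1 - level


if __name__ == "__main__":
    # Illustrative (made-up) data summary: sample mean 10.3, sigma 2, n = 100.
    xbar, sigma, n = 10.3, 2.0, 100
    lower, upper = z_confidence_interval(xbar, sigma, n)
    print(f"95% CI: ({lower:.3f}, {upper:.3f})")
    for mu0 in (9.8, 10.0, 10.5, 10.8):
        inside = lower <= mu0 <= upper
        print(mu0, "in CI:", inside, "| not rejected:", not_rejected(xbar, mu0, sigma, n))
```

For every hypothesized value tried, “in CI” and “not rejected” agree, which is the sense in which the interval is an inversion of the test rather than a rival to it.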

*Or is it not obvious? I think it is, because he is invited and speaks, writes, and corresponds in that capacity.

Wasserstein, R. & Lazar, N. (2016) [ASA I], “The ASA’s Statement on p-Values: Context, Process, and Purpose”, The American Statistician 70(2): 129-133.

Wasserstein, R., Schirm, A. and Lazar, N. (2019) [ASA II^(note)] “Moving to a World Beyond ‘p < 0.05’”, The American Statistician 73(S1): 1-19: Editorial.

Related posts on ASA II^(note):

June 17, 2019: “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean” (Some Recommendations)
July 12, 2019: B. Haig: The ASA’s 2019 update on P-values and significance (ASA II^(note)) (Guest Post)
July 19, 2019: The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?
September 19, 2019: (Excerpts from) ‘P-Value Thresholds: Forfeit at Your Peril’ (free access). The article by Hardwicke and Ioannidis (2019), and the editorials by Gelman and by me are linked on this post.
Nov 4, 2019. On some Self-defeating aspects of the ASA’s 2019 recommendations of statistical significance tests
Nov 22. The ASA’s P-value Project: Why It’s Doing More Harm than Good.

Related book (excerpts from posts on this blog are collected here)

Mayo, (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, SIST (2018, CUP).

5-year review: Hardwicke and Ioannidis, Gelman, and Mayo: P-values: Petitions, Practice, and Perils

Posted on May 4, 2024 by Mayo

Soon after the Wasserstein et al (2019) “don’t say significance” editorial, John Ioannidis invited Andrew Gelman and me to write editorials from our different perspectives on an associated editorial that Nature invited. It was written by Amrhein, Greenland and McShane (AGM, 2019). Prior to the publication of AGM 2019, people were given the opportunity to add their names to the Nature article.

A campaign followed that aimed at the collection of signatures in what was called a ‘petition’ on the widely popular blogsite of Andrew Gelman. Ultimately, 854 scientists signed the petition and the list of their names was published along with commentary. (Hardwicke and Ioannidis, 2019, p. 2)

Tom Hardwicke and John Ioannidis (2019) took advantage of the opportunity “to perform a survey of the signatories to understand how and why they signed the endorsement” (ibid.). This post, reblogged from September 25, 2019, includes all 3 articles: the survey by Hardwicke and Ioannidis, and the editorials by Gelman and me. They appeared in the European Journal of Clinical Investigation (2019). I’m still interested in reader responses (in the comments) to the question I pose.

********************

The October 2019 issue of the European Journal of Clinical Investigation came out today. It includes the PERSPECTIVE article by Tom Hardwicke and John Ioannidis, an invited editorial by Gelman and one by me:

Petitions in scientific argumentation: Dissecting the request to retire statistical significance, by Tom Hardwicke and John Ioannidis

When we make recommendations for scientific practice, we are (at best) acting as social scientists, by Andrew Gelman

P-value thresholds: Forfeit at your peril, by Deborah Mayo

I blogged excerpts from my preprint, and some related posts, here.

All agree to the disagreement on the statistical and metastatistical issues:

“Very different views have been expressed, and consensus is distinctly lacking among experts (eg see 21 heterogeneous commentaries accompanying the American Statistical Association’s 2016 Statement on P‐Values [ASA I])” Hardwicke and Ioannidis.
“The foundations and the practice of statistics are in turmoil, with corresponding threads of argument in biology, economics, political science, psychology, public health, and other fields that rely on quantitative research in the presence of variation and uncertainty. Lots of people (myself included) have strong opinions on what should not be done, without always being clear on the best remedy” Gelman.
“The 43 papers in the special issue ‘Moving to a world beyond ‘p < 0.05’’ offer a cacophony of competing reforms” Mayo.

Despite the admitted disparate views, ASA representatives come out, in 2019, forcefully on the side of: Don’t use P-value thresholds (“at all”) in interpreting data, and Never describe results as attaining “statistical significance at level p”. Should the ASA, as an umbrella group, be striving to provide a relatively neutral forum for open, pressure-free, discussion of different methods–their pros and cons? This is a leading question, true. As an outsider, I’m interested to know what both insiders and outsiders think.[i]

[i] It’s hard to imagine the American Philosophical Association coming out with a recommendation against one way of doing philosophy, but of course the situation with statistics is very different. (I do recall a push for “pluralism” in philosophy, which has taken on many meanings, and which I’m not up on.)

Links to ASA I and II^(note):

Wasserstein, R. and N. Lazar. 2016 ASA Statement on P-Values and Statistical Significance (ASA I).

Wasserstein, R., Schirm, A. and N. Lazar. “Moving to a world beyond ‘p < 0.05’” (ASA II)^(note)


5-year Review: B. Haig: [TAS] 2019 update on P-values and significance (ASA II) (Guest Post)

Posted on April 23, 2024 by Mayo

This is the guest post by Brian Haig on July 12, 2019 in response to the “abandon statistical significance” editorial in The American Statistician (TAS) by Wasserstein, Schirm, and Lazar (WSL 2019). In the post it is referred to as ASA II, with a note added once we learned that it is actually not a continuation of the 2016 ASA policy statement. (I decided to leave it that way, as otherwise the context seems lost. But in the title to this post, I refer to the journal TAS.) Brian lists some of the benefits that were to result from abandoning statistical significance. I welcome your constructive thoughts in the comments.

Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand

The American Statistical Association’s (ASA)^(note) recent effort to advise the statistical and scientific communities on how they should think about statistics in research is ambitious in scope. It is concerned with an initial attempt to depict what empirical research might look like in “a world beyond p<0.05” (The American Statistician, 2019, 73, S1,1-401). Quite surprisingly, the main recommendation of the lead editorial article in the Special Issue of The American Statistician devoted to this topic (Wasserstein, Schirm, & Lazar, 2019; hereafter, ASA II^(note)) is that “it is time to stop using the term ‘statistically significant’ entirely”. (p.2) ASA II^(note) acknowledges the controversial nature of this directive and anticipates that it will be subject to critical examination. Indeed, in a recent post, Deborah Mayo began her evaluation of ASA II^(note) by making constructive amendments (reblogged) to three recommendations that appear early in the document (‘Error Statistics Philosophy’, June 17, 2019). These amendments have received numerous endorsements, and I record mine here. In this short commentary, I briefly state a number of general reservations that I have about ASA II^(note).

1. The proposal that we should stop using the expression “statistical significance” is given a weak justification.

ASA II^(note) proposes a superficial linguistic reform that is unlikely to overcome the widespread misconceptions and misuse of the concept of significance testing. A more reasonable, and common-sense, strategy would be to diagnose the reasons for the misconceptions and misuse and take ameliorative action through the provision of better statistics education, much as ASA I did with p values. Interestingly, ASA II^(note) references Mayo’s recent book, Statistical Inference as Severe Testing (2018), when mentioning the “statistics wars”. However, it refrains from considering the fact that her error-statistical perspective provides an informed justification for continuing to use tests of significance, along with the expression, “statistically significant”. Further, ASA II^(note) reports cases where some of the Special Issue authors thought that use of a p-value threshold might be acceptable. However, it makes no effort to consider how these cases might challenge their main recommendation.

2. The claimed benefits of abandoning talk of statistical significance are hopeful conjectures.

ASA II^(note) makes a number of claims about the benefits that it thinks will follow from abandoning talk of statistical significance. It says, “researchers will see their results more easily replicated – and, even when not, they will better understand why”. “[We] will begin to see fewer false alarms [and] fewer overlooked discoveries …”. And, “As ‘statistical significance’ is used less, statistical thinking will be used more.” (p.1) I do not believe that any of these claims are likely to follow from retirement of the expression, “statistical significance”. Unfortunately, no justification is provided for the plausibility of any of the alleged benefits. To take two of these claims: First, removal of the common expression, “significance testing”, will make little difference to the success rate of replications. It is well known that successful replications depend on a number of important factors, including research design, data quality, effect size, and study power, along with the multiple criteria often invoked in ascertaining replication success. Second, it is just implausible to suggest that refraining from talk about statistical significance will appreciably help overcome mechanical decision-making in statistical practice, and lead to a greater engagement with statistical thinking. Such an outcome will require, among other things, the implementation of science education reforms that centre on the conceptual foundations of statistical inference.

3. ASA II’s^(note) main recommendation is not a majority view.

ASA II^(note) bases its main recommendation to stop using the language of “statistical significance” in good part on its review of the articles in the Special Issue. However, an inspection of the Special Issue reveals that this recommendation is at variance with the views of many of the 40-odd articles it contains. Those articles range widely in the topics they cover and in their attitudes to the usefulness of tests of significance. By my reckoning, only two of the articles advocate banning talk of significance testing. To be fair, ASA II^(note) acknowledges the diversity of views held about the nature of tests of significance. However, I think that this diversity should have prompted it to take proper account of the fact that its recommendation is only one of a number of alternative views about significance testing. At the very least, ASA II^(note) should have tempered its strong recommendation not to speak of statistical significance any more.

4. The claim for continuity between ASA I and ASA II^(note) is misleading.

There is no evidence in ASA I (Wasserstein & Lazar, 2016) for the assertion made in ASA II^(note) that the earlier document stopped just short of recommending that claims of “statistical significance” should be eliminated. In fact, ASA II^(note) marks a clear departure from ASA I, which was essentially concerned with how to better understand and use p-values. There is nothing in the earlier document to suggest that abandoning talk of statistical significance might be the next appropriate step forward in the ASA’s efforts to guide our statistical thinking.

5. Nothing is said about scientific method, and little is said about science.

The announcement of the ASA’s 2017 Symposium on Statistical Inference stated that the Symposium would “focus on specific approaches for advancing scientific methods in the 21st century”. However, the Symposium, and the resulting Special Issue of The American Statistician, showed little interest in matters to do with scientific method. This is regrettable because the myriad insights about scientific inquiry contained in contemporary scientific methodology have the potential to greatly enrich statistical science. The post-p < 0.05 world depicted by ASA II^(note) is not an informed scientific world. It is an important truism that statistical inference plays a major role in scientific reasoning. However, for this role to be properly conveyed, ASA II^(note) would have to employ an informative conception of the nature of scientific inquiry.

6. Scientists who speak of statistical significance do embrace uncertainty.

I think that it is uncharitable, indeed incorrect, of ASA II^(note) to depict many researchers who use the language of significance testing as being engaged in a quest for certainty. John Dewey, Charles Peirce, and Karl Popper taught us quite some time ago that we are fallible, error-prone creatures, and that we must embrace uncertainty. Further, despite their limitations, our science education efforts frequently instruct learners to think of uncertainty as an appropriate epistemic attitude to hold in science. This fact, combined with the oft-made claim that statistics employs ideas about probability in order to quantify uncertainty, requires from ASA II^(note) a factually-based justification for its claim that many scientists who employ tests of significance do so in a quest for certainty.

Under the heading, “Has the American Statistical Association Gone Post-Modern?”, the legal scholar, Nathan Schachtman, recently stated:

The ASA may claim to be agnostic in the face of the contradictory recommendations, but there is one thing we know for sure: over-reaching litigants and their expert witnesses will exploit the real or apparent chaos in the ASA’s approach. The lack of coherent, consistent guidance will launch a thousand litigation ships, with no epistemic compass. (‘Schachtman’s Law’, March 24, 2019)

I suggest that, with appropriate adjustment, the same can fairly be said about researchers and statisticians, who might look to ASA II^(note) as an informative guide to a better understanding of tests of significance, and the many misconceptions about them that need to be corrected.

References

Haig, B. D. (2019). Stats: Don’t retire significance testing. Nature, 569, 487.

Mayo, D. G. (2019). The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean (Some Recommendations)(ii), blog post on Error Statistics Philosophy Blog, June 17, 2019.

Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. New York, NY: Cambridge University Press.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Editorial: Moving to a world beyond “p<0.05”. The American Statistician, 73, S1, 1-19.


5-year review: The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring? (i)

Posted on April 19, 2024 by Mayo

In a July 19, 2019 post I discussed The New England Journal of Medicine’s response to Wasserstein’s (2019) call for journals to change their guidelines in reaction to the “abandon significance” drive. The NEJM said “no thanks” [A]. However, confidence intervals (CIs) got hurt in the mix. In this reblog, I kept the reference to “ASA II” with a note, because that best conveys the context of the discussion at the time. Switching it to WSL (2019) just didn’t read right. I invite your comments.


5-year review: Don’t let the tail wag the dog by being overly influenced by flawed statistical inferences

Posted on April 17, 2024 by Mayo

On June 1, 2019, I posted portions of an article [i], “There is Still a Place for Significance Testing in Clinical Trials,” in Clinical Trials responding to the 2019 call to abandon significance. I reblog it here. While very short, it effectively responds to the 2019 movement (by some) to abandon the concept of statistical significance [ii]. I have recently been involved in researching drug trials for a condition of a family member, and I can say that I’m extremely grateful that they are still reporting error statistical assessments of new treatments, and using carefully designed statistical significance tests with thresholds. Without them, I think we’d be lost in a sea of potential treatments and clinical trials. Please share any of your own experiences in the comments. The emphasis in this excerpt is mine:

Much hand-wringing has been stimulated by the reflection that reports of clinical studies often misinterpret and misrepresent the findings of the statistical analyses. Recent proposals to address these concerns have included abandoning p-values and much of the traditional classical approach to statistical inference, or dropping the concept of statistical significance while still allowing some place for p-values. How should we in the clinical trials community respond to these concerns? Responses may vary from bemusement, pity for our colleagues working in the wilderness outside the relatively protected environment of clinical trials, to unease about the implications for those of us engaged in clinical trials….


My 2019 friendly amendments to that “abandon significance” editorial

Posted on April 5, 2024 by Mayo

It was 3 months before I decided to write a blogpost in response to the editorial by Wasserstein, Schirm and Lazar (2019) in The American Statistician, in which they recommend that the concept of “statistical significance” be abandoned (hereafter, WSL 2019). (I titled it “Don’t Say What You Don’t Mean”.) In that June 17, 2019 blogpost, pasted below, I proposed 3 “friendly amendments” to the language of that document. (There are 97 comments on that post!) The problem is that WSL 2019 presents several of the 6 principles from ASA I (the 2016 ASA statement on Statistical Significance) in a far stronger fashion so as to be inconsistent or at least in tension with some of them. I didn’t think they really meant what they said. I discussed these amendments with Ron Wasserstein, Executive Director of the ASA at the time. Had these friendly amendments been carried out, the document would not have caused as much of a problem, and people might have focused more on the positive recommendations it includes about scientific integrity. The proposed ban on a key concept of statistics would still be problematic, resulting in the 2019 ASA President’s Task Force, but it would have helped the document. At the time, it was still not known whether WSL 2019 was intended as a continuation of the 2016 ASA policy document [ASA I]. That explains why I first referred to WSL 2019 in this blogpost as ASA II. Once it was revealed that it was not official policy at all (many months later), but only the recommendations of the 3 authors, I placed a “note” after each mention of ASA II. But given it caused sufficient confusion as to result in the then ASA president (Karen Kafadar) appointing an ASA Task Force on Statistical Significance and Replicability in 2019 (see here and here), and later, a disclaimer by the authors, in this reblog I refer to it as WSL 2019. You can search this blog for other posts on the 2019 Task Force: their report is here, and the disclaimer here.

Categories: 2016 ASA Statement on P-values, ASA Guide to P-values, ASA Task Force on Significance and Replicability | Leave a comment

5 years ago today, March 20, 2019: the Start of “Abandon Significance”

Posted on March 20, 2024 by Mayo

Five years ago on this day, a news correspondent at NPR, Richard Harris, published this article, the same day as Wasserstein et al. (2019), “Moving to a world beyond ‘p < 0.05’”, TAS, and Amrhein et al. (2019), “Comment: Retire statistical significance”, Nature. I was one of several people Harris interviewed for his article. He starts by talking of flip-flops regarding the healthfulness of eggs.

Statisticians say it may not be wise to put all your eggs in the significance basket.

A recent study that questioned the healthfulness of eggs raised a perpetual question: Why do studies, as has been the case with health research involving eggs, so often flip-flop from one answer to another? Continue reading →

Categories: stat wars and their casualties, statistical significance | Leave a comment

Preregistration, promises and pitfalls, continued v2

Posted on March 17, 2024 by Mayo

In my last post, I sketched some first remarks I would have made had I been able to travel to London to fulfill my invitation to speak at a Royal Society conference, March 4 and 5, 2024, on “the promises and pitfalls of preregistration.” This is a continuation. It’s a welcome consequence of today’s statistical crisis of replication that some social sciences are taking a page from medical trials and calling for preregistration of sampling protocols and full reporting. In 2018, Brian Nosek and others wrote of the “Preregistration Revolution”, as part of open science initiatives. Continue reading →

Categories: Bayesian/frequentist, Likelihood Principle, preregistration, Severity | 3 Comments

Promises and Pitfalls of Preregistration: A Royal Society conference I was to speak at

Posted on March 6, 2024 by Mayo

I had been invited to speak at a Royal Society meeting, held March 4 and 5, 2024, on “the promises and pitfalls of preregistration”—a topic in which I’m keenly interested. The meeting was organized by Dr Tom Hardwicke, Professor Marcus Munafò, Dr Sophia Crüwell, Professor Dorothy Bishop FRS FMedSci, and Professor Eric-Jan Wagenmakers. Unfortunately, I was unable to travel to London, so I had to decline attending a few months ago. But, I thought I might jot down some remarks here. Continue reading →

Categories: predesignation | 4 Comments

Happy Birthday R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

Posted on February 17, 2024 by Mayo

17 Feb 1890-29 July 1962

Today is R.A. Fisher’s birthday! I am reblogging what I call the “Triad”–an exchange between Fisher, Neyman and Pearson (N-P) published 20 years after the Fisher-Neyman break-up. While my favorite is still the reply by E.S. Pearson, which alone should have shattered Fisher’s allegations that N-P “reinterpret” tests of significance as “some kind of acceptance procedure”, all three are chock full of gems for different reasons. They are short and worth rereading. Neyman’s article pulls back the cover on what is really behind Fisher’s over-the-top polemics, what with Russian 5-year plans and commercialism in the U.S. Not only is Fisher jealous that N-P tests came to overshadow “his” tests, he is furious at Neyman for driving home the fact that Fisher’s fiducial approach had been shown to be inconsistent (by others). The flaw is illustrated by Neyman in his portion of the triad. Details may be found in my book, SIST (2018) especially pp 388-392 linked to here. It speaks to a common fallacy seen every day in interpreting confidence intervals. As for Neyman’s “behaviorism”, Pearson’s last sentence is revealing.

HAPPY BIRTHDAY R.A. FISHER! Continue reading →

Categories: E.S. Pearson, Fisher, Neyman, phil/history of stat | 1 Comment

Conference: Is Philosophy Useful for Science, and/or Vice Versa? (Jan 30- Feb 2, 2024)

Posted on January 31, 2024 by Mayo

I will be giving an online talk on Friday, Feb 2, 4:30-5:45 NYC time, at a conference you can watch on Zoom this week (Jan 30-Feb 2): Is Philosophy Useful for Science, and/or Vice Versa? It’s taking place in-person and online at Chapman University. My talk is: “The importance of philosophy of science for Statistical Science and vice versa”. I’ll touch on a current paper I’m writing that (finally) gets back to “Bayesian conceptions of severity” (in contrast to error statistical severity), as begun in the post on van Dongen, Sprenger, and Wagenmakers (2022). Continue reading →

Categories: Announcement | Leave a comment

Friends of David R. Cox (2022)

Posted on January 10, 2024 by Mayo

$8,765.53 X 2

I want to extend my warmest thanks to all who became Friends of David R. Cox in 2022. Your generous donations to the David R. Cox Foundations of Statistics Award are honoring the contributions of David R. Cox, and promoting the importance of statistical foundations in the American Statistical Association (ASA):

Karim Abadir, Heather Battey, Yoav Benjamini, Stuart Bevan, Alex Blocker, John Bibby, Lynne Billard, Sheila M. Bird, William Browning, John Byrd, Nancy Cartwright, Michael P. Cohen, Noel Cressie, Robert Crouchley, Gary R. Cutter, Anthony C. Davison, Bianca De Stavola, Edgar Dobriban, Christl Donnelly, Vern Farewell, Samuel Fletcher, David Firth, David Freeborn, Andrew Gelman, David J. Hand, Sylvia Halasz, Frank E. Harrell, Maria-Eglee Perez Hernandez, Klaus Hinkelmann, Michelle Jackson, Patricia A. Jacobs, Harold Jaffe, Bimal Jain, Christiana Kartsonaki, Robert and Loretta Kass, Jesse Krijthe, Daniel Lakens, Ji-Hyun Lee, Lei Liu, Francisco Louzada, Donald Macnaughton, Giovanni M. Marchetti, Kanti V. Mardia, Peter McCullagh, Xiao-Li Meng, Jean Miller, Georges A. A Monette, Pavlos Msaouel, David Oakes, David Oliver, Yusuke Ono, John Park, Jose G. Ramirez, Nancy Reid, James L. Rosenberger, Richard J. Samworth, Stephen J. Senn, Dylan S. Small, David M. Smith, Aris Spanos, Alex Sutherland, John Tomenson, Tengyao Wang, Ronald L. Wasserstein, Gideon Weiss, Manyu Wong, Henry L. Wyneken, Henry Wynn

We exceeded our increased goal (raised from $5,000 to $7,500) last year, raising $8,765.53, all of which is being matched! * Continue reading →

Categories: David R. Cox Foundations of Statistics Award | Leave a comment

Midnight With Birnbaum: Happy New Year 2024!

Posted on December 31, 2023 by Mayo

For three of the last four years, it was not feasible to actually revisit that spot in the road, looking to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. Even last year was iffy. But this year I will, and I’m about to leave at 9pm. (The pic on the left is the only blurry image I have of the club I’m taken to.) My book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) doesn’t include the argument from my article in Statistical Science (“On the Birnbaum Argument for the Strong Likelihood Principle”), but you can read it at that link along with commentaries by A. P. Dawid, Michael Evans, Martin and Liu, D. A. S. Fraser, Jan Hannig, and Jan Bjornstad. David Cox, who very sadly died in January 2022, is the one who encouraged me to write and publish it. (The first David R. Cox Foundations of Statistics Prize will be awarded at the JSM 2023.) Not only does the (Strong) Likelihood Principle (LP or SLP) remain at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics and of error statistics in general, but a decade after my 2014 paper, it is more central than ever–even if it is often unrecognized. Continue reading →

Categories: Birnbaum | Leave a comment

A weekend to binge read the (Strong) Likelihood Principle

Posted on December 30, 2023 by Mayo

If you read my 2023 paper on Cox’s philosophy of statistics, you’ll have come across Cox’s famous “weighing machine” example, which is thought to have caused “a subtle earthquake” in foundations of statistics. If you’re curious as to why that is, you’ll be interested to know that each year, on New Year’s Eve, I return to the conundrum. This post gives some background, and collects the essential links. Continue reading →

Categories: Likelihood Principle | Leave a comment

Princeton talk: Statistical Inference as Severe Testing: Beyond Performance and Probabilism

Posted on December 27, 2023 by Mayo

On November 14, I gave a talk at the Seminar in Advanced Research Methods for the Department of Psychology, Princeton University.

“Statistical Inference as Severe Testing: Beyond Probabilism and Performance”

The video of my talk is below along with the slides. It reminds me to return to a paper, half-written, replying to a paper on “A Bayesian Perspective on Severity” (van Dongen, Sprenger, and Wagenmakers 2022). These authors claim that Bayesians can satisfy severity “regardless of whether the test has been conducted in a severe or less severe fashion”, but what they mean is that data can be much more probable on hypothesis H₁ than on H₀; that is, the Bayes factor can be high. However, “severity” can be satisfied in their comparative (subjective) Bayesian sense even for claims that are poorly probed in the error statistical sense (slides 55-6). Share your comments. Continue reading →

Categories: Severity, Severity vs Posterior Probabilities | Leave a comment

1 Year Ago Today: “The Statistics Wars and Their Casualties” workshop

Posted on December 8, 2023 by Mayo

It’s been 1 year (December 8, 2022) since our workshop, The Statistics Wars and Their Casualties! There were four sessions, held over 4 days. Below are the videos and slides from all four sessions of the Workshop. The first two sessions were held on September 22 & 23, 2022. Session 1 speakers were: Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland). Session 2 speakers were: Daniël Lakens (Eindhoven University of Technology), Christian Hennig (University of Bologna), Yoav Benjamini (Tel Aviv University). The last two sessions were held on December 1 and 8. Session 3 speakers were: Daniele Fanelli (London School of Economics and Political Science), Stephan Guttinger (University of Exeter), and David Hand (Imperial College London). Session 4 speakers were: Jon Williamson (University of Kent), Margherita Harris (London School of Economics and Political Science), Aris Spanos (Virginia Tech), and Uri Simonsohn (Esade Ramon Llull University).

Abstracts can be found here and the schedule here. Some participant related publications are on this page. Continue reading →

Categories: Philosophy of Statistics, statistical significance, The Statistics Wars and Their Casualties | Leave a comment

New Paper: “Sir David Cox’s Statistical Philosophy and its Relevance to Today’s Statistical Controversies” (JSM Proceedings)

Posted on October 31, 2023 by Mayo

After some wrestling with the Zenodo system of uploading, my paper “Sir David Cox’s Statistical Philosophy and its Relevance to Today’s Statistical Controversies” is now published (open access) in the JSM 2023 Proceedings (link).

Continue reading →

Categories: JSM 2023 proceedings | 3 Comments

Oct 26 Update: If you want to add your name to the list of Friends of Sir David Cox: Matching funds are extended to Dec 1, 2023

Posted on October 24, 2023 by Mayo

I’m not comfortable in the role of fundraiser, but I am comfortable in the role of promoting the importance of statistical foundations, and that’s how I see the David R. Cox Foundations of Statistics Award. Thus, I’m sharing the news the ASA sent out yesterday that we’re only around $300 away from our goal, and so the matching period has been extended until Dec 1. (We have $4,698.75 towards the $5,000.) Thus, I’m sharing the new news, updating what the ASA sent out yesterday: The fund is just at $6,000*, but all donations to the award until November 30, midnight (ESA) will still be matched unless $7,500 is reached before that. There is a “thermometer” on the donation page so that donors will know when we have reached that goal; however, on its first day on the job, the thermometer is malfunctioning slightly, failing to include around $500. It should be fixed tomorrow (this is outside my control). For gifts of $50 and above, you will be included in the following list of those recognized as “Friends of David R. Cox”: Continue reading →

Categories: David R. Cox Foundations of Statistics Award | Leave a comment

Philosophy of Scientific Experiment: 40+ years on

Posted on September 26, 2023 by Mayo

Some time, around the 1980s, philosophers of science turned their attention to scientific experiments in a way that contrasted with the reigning approaches to philosophy of science. My colleague, Wendy Parker, and I decided to embark on an experiment of our own, aimed at elucidating some central themes of this evolving movement, sometimes referred to as the ‘new experimentalism.’ It was to begin tomorrow, but due to unexpected weather conditions, I’ll be traveling back then, and find myself with an additional afternoon in New York City. So I’ll take this opportunity to begin my reflections, with the expectation of later incorporating Wendy’s insights, and refining my own. The philosophical concepts and ideas stemming from the philosophy of experiment provide powerful tools for making progress on fundamental problems of how we find things out in the face of limitations of data, models, and methods. The time is ripe for a comprehensive examination of this field, but our “experiment on experiment” here will be the bare beginnings of themes that come to mind. Please suggest corrections and additions in the comments. Continue reading →

Categories: experiment, new experimentalism | 7 Comments

JSM Slides on David R. Cox’s Foundations of Statistics

Posted on September 15, 2023 by Jean

I’m posting two sets of slides on David R. Cox’s Foundations of Statistics given at the JSM 2023. First, Nancy Reid, inaugural recipient of the David R. Cox Foundations of Statistics Award, gave a talk: “The Importance of Foundations,” on August 9, and her slides are below. Second are the slides to my contributed talk, “David Cox’s Statistical Philosophy,” given on August 8. For the next two weeks (up until September 30, 2023), donations to the Award are being matched (until $5,000 is reached). You can become a “friend of David Cox” by donating $50. We’re only a little over halfway there as of now. Please see information about the award below our slides. Continue reading →

Categories: David R. Cox Foundations of Statistics Award | Leave a comment

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
