Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)

Posted on June 15, 2022 by Mayo

Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and 2 co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05′”). It reads:

The editorial was written by the three editors acting as individuals and reflects their scientific views not an an endorsed position of the American Statistical Association.

The person who informed me, who does not wish to be named, also told me there had been a symposium on June 3 to discuss these issues: Scientific Reproducibility and Statistical Significance Symposium . Are the two events related? Perhaps the disclaimer was announced at that forum? I really don’t know. The description of the Symposium reads:

About this event

All researchers agree that the scientific method of experimentation and replication must be preserved. To date, null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys. Although the statistics community has been talking the misuse of significance testing, and some inroads have been made into statistics education, the conversation among statisticians has not reached the scientific community on a large scale.

The aim of the symposium on scientific inquiry and statistical significance is four-fold:

1. To disseminate the stance of the American Statistical Association on the appropriate use of results from null hypothesis significance testing for analysis of studies from social science, science, engineering, and humanities.

[June 20 update: The Symposium organizer shifts on this first goal in her description.
See my comment]

2. To offer alternatives to such testing

3. To discuss changes to publication policies that would benefit both individual scientists and science writ large.

4. To discuss methods of educating current and future scientists in appropriate methods for gathering and analyzing data.

The first question that comes to my mind is this: if “all researchers agree that …null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys” then why is one of the 4 aims of the symposium: to offer alternatives to such [statistical significance] testing”? It seems peculiar to say that all researchers agree a method is best for accomplishing a central, if limited, task for which scientists look to statistics, so let’s find alternatives to replace it. [1, 1a, 1b]

And what about the first goal “To disseminate the stance of the American Statistical Association on the appropriate use of results from null hypothesis significance testing…”? An ASA Task Force just recently put forward a statement on statistical significance and replicability (Benjamini et al., 2021), so it would make sense for the dissemination of the ASA stance to be an affirmation of the. 2021 Task Force Statement on Statistical Significance and Replicability. But that would be a policy shift by the ASA, if I understand it correctly, and the symposium program shows no sign that is what is meant. Aside from Kafadar, ex officio, no members of this Task Force are on the program. (Don’t confuse the 2021 ASA Task Force statement with the 2016 ASA Statement on p-values–which is an ASA policy statement). This is all very confusing. For a bit of the background, I’ve pasted a few relevant links from this blog below (searching this blog will find others).

I look forward to listening to the recording of the meeting when it is available.

So is the disclaimer 3+ years after Wasserstein et al., 2019 too little too late? Granted, had such a disclaimer been added to the editorial in December 2019, there would not have been a need for Karen Kafadar (then ASA President) to appoint the task force on statistical significance and replicability in 2019. [2]

Concerned that WSL 2019 might be taken as a continuation of the 2016 ASA Statement, in 2019 the Board of the ASA appointed a President’s Task Force on Statistical Significance and Replicability. It was put in the odd position of needing to “address concerns that [the Executive Director’s editorial, WSL 2019] might be mistakenly interpreted as official ASA policy” (Benjamini et al., 2021). From Mayo and Hand (2022):

But I don’t think it’s possible to neutralize by disclaimer the effect of Wasserstein et al., 2019. [See note 3] Besides, in my view, what is really needed are revisions to some of the claims made within it (and I’m referring just to the first couple of pages, not the discussion of the articles by various others in introducing the special issue). If such revisions were made, people could appreciate some of the many useful suggestions in Wasserstein et al., 2019. Early on, I proposed several simple revisions in private communication with Wasserstein, hoping to improve the document. Some are in the following blogpost:

June 17, 2019: “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean” (Some Recommendations)(ii)

I doubt the authors (Wasserstein, Schirm and Lazar) really and truly mean many of the claims alleged to warrant “abandoning” statistical significance (in fact, some are at odds with the 2016 ASA statement on p-values)–which, to repeat, is ASA policy, but is quire controversial nevertheless. Or, if they do, much more in the way of warranting them is needed (see Mayo and Hand 2022).

Please share your constructive thoughts on this in the comments.

Notes:

[1] Anyone who reads this blog knows I don’t like the “null hypothesis significance test” label, as it wrongly assumes a very artificial context with a point null with no alternatives. Worse, the label NHST is often used to describe a flawed methodology that commits all the central fallacies: construes a p-value as a posterior, takes statistical as substantive, supposes a small p-value means a large effect size, etc.

[1a] Added June 16, 2022. To be clear, I think that both p-values, and the standard N-P (accept/reject) testing methodology call for reformulation and reinterpretation. I have proposed such reformulations based on a severe testing philosophy. Here, error probabilities are used to qualify how well (and poorly) tested claims are. Fallacies of statistical significance and insignificance are avoided. However, the “alternatives” that are put forward to replace statistical significance assume notions of evidence and inference that are often at odds with the error statistical goals of significance tests.

[1b] Added June 20. Aside from the organizer changing the description of goal #1 (in the ASA Connect discussion) we also learn that the 2016 ASA Statement on P-values was not a statement on null hypothesis significance testing. Hmm. See my comment.

[2] Kafadar provided the ASA Board with over 40 examples showing the mistaken reference to Wasserstein et al., 2019 as ASA policy. She was part of a JSM panel I organized on the general topic in 2020. My slides are in this post. Kafadar’s slides are here. Stan Young, Yaacov Ritov, and Larry Wasserman were also part of the session.

[3] As ASA Executive Director, Wasserstein is an official ASA spokesperson. So announcing his view (on a matter hotly disputed by ASA members) has a strong impact on others even if he demurs: “I’m just wearing my hat as an individual”. The only solution, I argue, is for officials not to take sides on this type of issue–something I qualify in my 2021/22 Editorial in Conservation Biology.

Some blogposts of relevance for background are:

March 25, 2019: “Diary for Statistical War Correspondents on the Latest Ban on Speech.”

July 19, 2019: “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring? (i)”

September 19, 2019: “(Excerpts from) ‘P-Value Thresholds: Forfeit at Your Peril’ (free access).” The article by Hardwicke and Ioannidis (2019), and the editorials by Gelman and by me are linked on this post.

November 4, 2019: “On some Self-defeating aspects of the ASA’s 2019 recommendations of statistical significance tests”

November 14, 2019: “The ASA’s P-value Project: Why it’s Doing More Harm than Good (cont from 11/4/19)”

November 30, 2019: “P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)”

December 13, 2019: “’Les stats, c’est moi’: We take that step here! (Adopt our fav word or phil stat!)(iii)”

August 4, 2020: “August 6: JSM 2020 Panel on P-values & ‘Statistical significance’”

December 13, 2020: “The Statistics Debate (NISS) in Transcript Form”

January 9, 2021: “Why hasn’t the ASA Board revealed the recommendations of its new task force on statistical significance and replicability?”

June 20, 2021: “At Long Last! The ASA President’s Task Force Statement on Statistical Significance and Replicability”

June 28, 2021: “Statisticians Rise Up To Defend (error statistical) Hypothesis Testing”

July 30, 2021: “Invitation to discuss the ASA Task Force on Statistical Significance and Replication”

December 18, 2021: “The Statistics Wars and Intellectual Conflicts of Interest” (Link to Mayo’s Editorial of the same name in Conservation Biology)

February 24, 2022: “January 11 Forum: “Statistical Significance Test Anxiety”: Benjamini, Mayo, Hand”.

May 15, 2022: “3 commentaries on my editorial are being published in Conservation Biology“

[This post includes links to all of the 12 commentaries on my editorial from Jan 5.]

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 20 Comments

20 thoughts on “Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)”

June 17, 2022

Mayo

It’s interesting to note the roles significance tests are playing in understanding and developing theories about long Covid.
https://www.medicalnewstoday.com/articles/long-covid-risk-latest-data-on-three-variants#Slight-differences-in-variants

Reply
June 20, 2022

Nathan A Schachtman

Mayo,

Well, first, the corrigendum to the Wasserstein 2019 editorial is a welcomed event, although as you note, late in the day. Having been snuckered by its status, and having watched others describe it as an ASA position, I wonder how it has taken the ASA this long to correct the misimpression.

And the symposium you linked to, interestingly, got very little promotion from the ASA. I did not get any emails from the ASA about the event, and I am on its mailing list. Did anyone else?

The contradictions among the stated goals of the conference seem blatant. Do they disclose some cognitive dissonance at the ASA, or among some of the ASA’s leadership. Or are we witnessing the advocacy campaign, previously seen in emails to journal editors along with an ASA’s officer’s call for abandoning significance testing, simply taking another form?

Nathan Schachtman

Reply

June 20, 2022

Mayo

Nathan: I’ll have to remember to use “corrigendum”. I hadn’t heard of the symposium either, neither had a number of other people who are ASA members. My anonymous informer says it was organized by Monnie McGee, and announced in ASA Connect, the ASA members’ email list. I know that I sometimes miss generic announcements because we get dozens of them, but several people who I expect would have been informed, including some members of the ASA Task Force on Statistical Significance were unaware of it. If you find the link, please share it, I haven’t had time to search.

Reply

June 20, 2022

Mayo

I did find it. See the next comment for some puzzling revelations:

Reply

June 20, 2022

Mayo

After Schachtman’s comment I decided not to be lazy, but to look up the “announcement” of the symposium in ASA Connect, which is a forum to share comments among ASA members:
What I found in the comments about the goal of the Symposium is puzzling.

First, here’s the linK
https://community.amstat.org/communities/community-home/digestviewer/viewthread?GroupId=2653&MessageKey=94487f85-ff33-4d1d-b0ad-c6d61ecf2f41&CommunityKey=6b2d607a-e31f-4f19-8357-020a8631b999

The first registration link given turns out to be wrong:

A commentator noted the mistake, and in a reply, the correct link was given. However, the earlier incorrect link was not fixed. So you’d have to follow the back and forth discussion to find it.

In another comment, David Hoaglin asked:
“Does the ASA actually have an official stance on the appropriate use of results from null-hypothesis significance testing?

Will the Symposium discuss the report of the ASA President’s Task Force on Statistical Significance and Replicability (Amstat News, August 2021, pages 5 and 6)?”
————————
Monnie McGee’s reply is interesting:

“Thank you for your question. The ASA does not have an official position on NHST. The 2016 ASA statement on p-values and statistical significance is the ASA’s only official position on such matters, and it does not address NHST. Allow me to restate the objectives of the symposium to correct my mistake:

The aim of the symposium on scientific inquiry and statistical significance is four-fold
1. To discuss the appropriate use of results from NHST.
2. To offer alternatives to such testing
3. To discuss changes to publication policies that would benefit both individual scientists and science writ large.
4. To discuss methods of educating current and future scientists in appropriate methods for gathering and analyzing data.”

——————————
So the 2016 ASA Statement on P-values is not a position on NHST, although it does concern “such matters”. The Symposium, on the other hand, has as its first goal:
“To disseminate the stance of the American Statistical Association on the appropriate use of results from null hypothesis significance testing for analysis of studies from social science, science, engineering, and humanities”.
No, wait, in her reply the first goal is changed to:
“To discuss the appropriate use of results from NHST”.

But the official conference site still has the first goal as dissemination of the ASA stance on the appropriate use of results from NHST.

And according to McGee, the 2016 ASA statement on p-values is not on NHST. And this was supposed to clear things up?
They should define these terms.

Reply
June 20, 2022

Nathan A Schachtman

This is all so murky. I just went back to read the description of the event at

https://www.eventbrite.com/e/scientific-reproducibility-and-statistical-significance-symposium-tickets-327521494607

So, we are to believe that the aim of the symposium on scientific inquiry and statistical significance is four-fold:

For goal 1. What is the ASA’s stance to be disseminated? Is it the ASA 2016 p-value. Then it is done. Mark it disseminated and let’s move on.

For goal 2. Alternatives. Well, it is not like the ASA is itself dishing up alternatives. The 2016 statement notes Bayes factors, etc. Is the point to tell attendees that they do not need to use tests for random error, with or without thresholds?

For goal 3. Ah, now we are getting to the heart of the matter. Isn’t this just about furthering covert advocacy – such as the email campaign to lobby journal editors (including NEJM and JAMA), and the dressing up of an editorial to appear as if it were an ASA position?

For goal 4. “Appropriate” methods for gathering and analyzing data. Hmm. They give PhDs in this, no? This would seem to be an “appropriate” topic for an ASA seminar, but the description of goal 4 is pretty vague.

The juxtaposition of these 4 certainly makes me suspect more covert advocacy. The changing explanation is a bit concerning as well. I am not sure that the 2016 statement has nothing to say about NHST. After all, principle 3 of 6 states that:

“3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

As stated, the principle seems unexceptional for the evaluation of health effects studies for causal inference. The statement implies that passing a threshold may well be necessary, only not sufficient. Researches would of course want more than random error to be ruled out; they would want to assure internal and external validity of the studies relied upon.

As a parent, and as a cross-examiner, I have found the best way to tell when someone is dissembling is that the explanation and the details keep changing.

Reply

June 20, 2022

Mayo

Nathan:
Yes it appears to be more covert advocacy. The breakout choice looks to be Trafimow or Bayes. The Altman et al paper https://www.tandfonline.com/doi/abs/10.1080/19466315.2022.2038258 (couldn’t get all of it) and the others rehearse the old criticisms. It’s sad that granting Kafadar’s call for a conference leads to more displays of intellectual conflicts of interest, with no sign of the ASA President’s Task Force whose report they dismissed! At least they should have given them their say. I hope readers will search for the ASA Task Force.

Reply

June 20, 2022

john byrd

Woven throughout these goals is the idea that scientists need to put more control of their work in the hands of the statistician. What I see– and someone mentioned cognitive dissonance– does not lead me to believe that the professionals putting this program together are people to place my trust in. I always come back to the same concern when I see these calls for reform. They toss out the existence of “alternative approaches” as though they are trouble-free methods such as Bayes Factors that will free us from methodological concerns. There is no critical consideration of the many problems with Bayesian and Likelihood approaches that will have to be wrangled with every bit as much as p-value concerns. (If people will cherry-pick for p-values, what makes you think they can form a subjective prior that is objectively sound?) The way these debates have been carried out within the ASA community does not invoke a lot of confidence. The situation would be vastly different had these events always included respected defenders of significance testing in the mix.

Reply

June 20, 2022

Mayo

Hi John, great to hear from you. one has to ask just why they’re so afraid of including respected defenders of error statistical methods–even the leading researchers in the ASA Presidential Task Force on Statistical Significance. Readers: The ASA President’s task force was put together by then president Karen Kafadar because Wasserstein wouldn’t issue the disclaimer. The ASA Board voted the committee in despite the opposition by the Executive Director Wasserstein. His opposition was because it would include respected defenders of frequentist methods (at odds with what he thinks he endorses). They’re so afraid of giving a voice to those who disagree, because their arguments would collapse. Now we have a symposium which seems to serve as a kind of disclaimer of the disclaimer, claiming as it does, to focus on the ASA’s sense of null hypothesis testing which is to abandon it, kill it.
You’re absolutely right “If people will cherry-pick for p-values, what makes you think they can form a subjective prior that is objectively sound?” They can’t. Likewise, they don’t allow people who don’t share their low belief in the value of significance tests speak at your symposium that was supposed to clear the air and have Wasserstein show penitence. You get a kind of disclaimer of the disclaimer, in goal one that says you will get the ASA stance after all.

Reply

June 21, 2022

rkenett

I wrote before on these topics:
1. Position papers by professional organizations seems logical but is problematic. See question 8 in https://www.linkedin.com/pulse/ten-questions-statistics-data-science-2020-beyond-ron-s-kenett/
2. The brouhaha in the statistics community did not necessarily transfer to the scientific landscape. The challenges raised by COVID-19 where totally different https://rdcu.be/cLMKZ
3. The Bethesda event Mayo refers to was mostly an inbreeding event https://blogisbis.wordpress.com/2017/10/24/to-p-or-not-to-p-my-thoughts-on-the-asa-symposium-on-statistical-inference/. The recent follow up ASA conference kept undercover seems incestious..

An attempt at the applies stats big picture is available in a recently published article:
https://chemistry-europe.onlinelibrary.wiley.com/doi/10.1002/ansa.202000159?fbclid=IwAR2sodBOxmkoptbbux__Rw-x8aYlTpjEFAsy6R0pijahn9dQPiPHDdGNZKw

Reply

June 22, 2022

hwyneken

Ron – I agree that the ASA shouldn’t try to push statistics in a Bayesian or Frequentist direction. They go with different philosophies of science and neither is really right or wrong. IMO it should be up to the journal if they want to impose some standards or if they are open to both. I am not really into saying “whatever works” on a personal level. I’m substantially more interested in frequentism at this point in my life. However, I’d rather directly persuade my colleagues of this instead of getting official approval for my POV.

Reply

June 21, 2022

Andrew Gelman

“To date, null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys.”

Huh? Sounds like propaganda to me, not science. Good for them that they know “to date . . . the best way.” The rest of us have to muddle through not being so sure about what is “best.”

In all seriousness, what’s scary to me is not so much this attitude about the value of null hypothesis significance testing—I get that all sorts of methods can be useful in different settings—but rather that whoever wrote that paragraph was so inappropriately certain that this is “to date . . . the best way.” Not “one way” or even “our preferred way,” but “the best.” This is, in my opinion, a bad attitude that is not helpful for scientific inquiry.

Reply

June 21, 2022

Mayo

Andrew:
I’m really glad that you raised that point. As I say in the post:

The first question that comes to my mind is this: if “all researchers agree that …null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys” then why is one of the 4 aims of the symposium: to offer alternatives to such [statistical significance] testing?

Since they are offering alleged alternatives, it would follow they don’t think null hypothesis significance testing is the best way (for this goal) after all. So they appear to be saying what they don’t mean and contradicting themselves.

There are more things they know and will disseminate. The goals 3 and 4 of the symposium assume they know the publication policies that will benefit science (this refers to Trafimow’s presentation) and they know the appropriate education policies for current and future scientists, respectively.
Please see my comment beginning:
“After Schachtman’s comment I decided not to be lazy, but to look up the “announcement” of the symposium in ASA Connect, which is a forum to share comments among ASA members:
What I found in the comments about the goal of the Symposium is puzzling.

First, here’s the linK
https://community.amstat.org/communities/community-home/digestviewer/viewthread?GroupId=2653&MessageKey=94487f85-ff33-4d1d-b0ad-c6d61ecf2f41&CommunityKey=6b2d607a-e31f-4f19-8357-020a8631b999
errorstatistics.com/2022/06/15/too-little-too-late-the-dont-say-significance-editorial-gets-a-disclaimer/comment-page-1/#comment-222280″

In one comment (the Symposium was announced in a comment forum for ASA members—if anyone can point to a broader announcement I’d be glad to know), David Hoaglin asked the organizer, Monnnie McGee:

“Will the Symposium discuss the report of the ASA President’s Task Force on Statistical Significance and Replicability (Amstat News, August 2021, pages 5 and 6)?”

No answer was given.

Hoaglin also asked “Does the ASA actually have an official stance on the appropriate use of results from null-hypothesis significance testing?”—alluding to goal #1 of the forum. Monnie McGee’s reply is that goal #1 was a mistake of the objectives of the symposium, and she will now restate goal #1:

“Thank you for your question. The ASA does not have an official position on NHST. The 2016 ASA statement on p-values and statistical significance is the ASA’s only official position on such matters, and it does not address NHST. Allow me to restate the objectives of the symposium to correct my mistake” (McGee)
She corrected her “mistake” by changing the first goal of the symposium to

“1. To discuss the appropriate use of results from NHST.”

But the official symposium retains the original goal:

‘To disseminate the stance of the American Statistical Association on the appropriate use of results from null hypothesis significance’.

So even though the ASA doesn’t have an official position on NHST, it does have a “stance” and this is to be disseminated at the symposium.

Reply

June 22, 2022

Andrew Gelman

Deborah:

Good point. Perhaps a better term than “propaganda” would be “b.s.” With propaganda you’re trying to convince people or just establish as default some questionable belief. With b.s. you’re just saying something that kinda sounds good without worrying too much about the meaning. In this case, they may have intended the statement, “To date, null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys,” to be an unobjectionable mom-and-apple-pie kind of happy talk, without ever thinking about the literal meaning of the phrases “the best way” and “random error.”

Reply

June 22, 2022

Mayo

Andrew:
There’s plenty of propaganda & plenty of b.s. And what is it “to assure that random error is accounted for”. Is that accomplished by reporting p-values? But the 2016 ASA Statement was about p-values, yet McGee says the ASA has no official statement about null hypothesis significance testing. ASA seems to have gotten itself into a tangled web by accepting the 2016 statement on p-values as official policy, then seeming to extend that policy in the Wasserstein et al. 2019 editorial (“don’t say significance, don’t use thresholds”) which gave rise to the 2019 (Presidential) Task Force on Statistical Significance (to correct the impression of the Wasserstein et al 2019 editorial), but then their statement was blocked from becoming ASA policy (though it was published in 2021), and now there’s a disclaimer placed in Wasserstein et al 2019 with this associated Symposium which gives the ASA “stance” on null hypothesis significance testing, which looks to be the one in Wasserstein et al 2019, thereby serving as a kind of disclaimer of the disclaimer.

Reply

June 22, 2022

Andrew Gelman

Deborah:

I went to the webpage of the meeting with that weird B.S. statement, and I wasn’t quite sure who had sponsored it, beyond that it was at Southern Methodist University. I did not see any ASA connection. I wonder what person or committee wrote that horrible description, “To date, null hypothesis significance testing is the best way to assure that random error is accounted for in the analysis of scientific experiments and surveys.” Unseemly for statisticians to act with an inappropriate air of certainty.

Reply

June 24, 2022

Mayo

Andrew:
I understand the symposium was organized by Monnie McGee, who had been a student of Kathy Ensor who is ASA President. So I’m guessing Ensor or McGee wrote it. But it’s a good question as to whether there is any official connection between ASA and the symposium. My guess is that the ASA sponsored it, but in that case, it would say that somewhere.
The slides/recording available are here:

https://smu.box.com/s/56t0ujhu928x7s2uncsz10opip718kud

Reply

June 22, 2022

Mayo

This was the the statement issued by the ASA President’s Task Force Statement on Statistical Significance and Replicability. While a majority of the ASA Board voted to adopt it, more was needed for it to be deemed official. But the ASA should at least have reported the results of its own committee, but it would not. It was published eventually in the Annals of Applied Stat in 2021.

ASA President’s Task Force Statement on Statistical Significance and Replicability

the past decade, the sciences have experienced elevated concerns about replicability of study results. An important aspect of replicability is the use of statistical methods for framing conclusions. In 2019, the president of the American Statistical Association established a task force to address concerns a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force, and the ASA invited us to publicize it. Its purpose is two-fold: to clarify that the use of p-values and significance testing, properly applied and interpreted, are important tools that should not be abandoned and to briefly set out some principles of sound statistical inference that may be useful to the scientific community.

P-values are valid statistical measures that provide convenient conventions for communicating the uncertainty inherent in quantitative results. Indeed, p-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature. They are important tools that have advanced science through their proper application.

Much of the controversy surrounding statistical significance can be dispelled through a better appreciation of uncertainty, variability, multiplicity, and replicability. The following general principles underlie the appropriate use of p-values and the reporting of statistical significance and apply more broadly to good statistical practice.

Capturing the uncertainty associated with statistical summaries is critical. Different measures of uncertainty can complement one another; no single measure serves all purposes. The sources of variation the summaries address should be described in scientific articles and reports. Where possible, those sources of variation that have not been addressed should also be identified.

Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data. Setting aside the possibility of fraud, important sources of replicability problems include poor study design and conduct, insufficient data, lack of attention to model choice without a full appreciation of the implications of that choice, inadequate description of the analytical and computational procedures, and selection of results to report. Selective reporting, even the highlighting of a few persuasive results among those reported, may lead to a distorted view of the evidence. In some settings, this problem may be mitigated by adjusting for multiplicity. Controlling and accounting for uncertainty begins with the design of the study and measurement process and continues through each phase of the analysis to the reporting of results. Even in well-designed, carefully executed studies, inherent uncertainty remains, and the statistical analysis should account properly for this uncertainty.

The theoretical basis of statistical science offers several general strategies for dealing with uncertainty. P-values, confidence intervals, and prediction intervals are typically associated with the frequentist approach. Bayes factors, posterior probability distributions, and credible intervals are commonly used in the Bayesian approach. These are some among many statistical methods useful for reflecting uncertainty.

Thresholds are helpful when actions are required. Comparing p-values to a significance level can be useful, though p-values themselves provide valuable information. P-values and statistical significance should be understood as assessments of observations or effects relative to sampling variation, and not necessarily as measures of practical significance. If thresholds are deemed necessary as part of decision-making, they should be explicitly defined based on study goals, considering the consequences of incorrect decisions. Conventions vary by discipline and purpose of analyses.

In summary, p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data. Analyzing data and summarizing results are often more complex than is sometimes popularly conveyed. Although all scientific methods have limitations, the proper application of statistical methods is essential for interpreting the results of data analyses and enhancing the replicability of scientific results

Reply

June 30, 2022

rkenett

OK – what is the implication of this to applied statistics? For an example and review checklist see https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/ansa.202000159

Reply

June 23, 2022

Mayo

Here’s the link to the talks from the symposium, slides and available recordings:

https://smu.box.com/s/56t0ujhu928x7s2uncsz10opip718kud

Reply

Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)

Post navigation

20 thoughts on “Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)”

Leave a reply to Andrew Gelman Cancel reply

The Statistics Wars & Their Casualties

Blog links (references)

Reviews of Statistical Inference as Severe Testing (SIST)

Interviews & Debates on PhilStat (2020)

Interviews on PhilStat (2019)

LSE PH500 Research Seminar (May 21-June 25, 2020): Controversies in Phil Stat

Summer Seminar 2019 (article)

Top Posts & Pages

Conferences & Workshops

RMM Special Topic

Mayo & Spanos, Error Statistics

Follow Blog via Email

My Websites

Recent Posts: PhilStatWars

The Statistics Wars and Their Casualties Videos & Slides from Sessions 1 & 2

THE STATISTICS WARS AND THEIR CASUALTIES VIDEOS & SLIDES FROM SESSIONS 3 & 4

Final session: The Statistics Wars and Their Casualties: 8 December, Session 4

SCHEDULE: The Statistics Wars and Their Casualties: 1 Dec & 8 Dec: Sessions 3 & 4

WORKSHOP

LOG IN/OUT

Archives

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.

Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)

Related

Post navigation

20 thoughts on “Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)”

Leave a reply to Andrew Gelman Cancel reply

The Statistics Wars & Their Casualties

Blog links (references)

Reviews of Statistical Inference as Severe Testing (SIST)

Interviews & Debates on PhilStat (2020)

Interviews on PhilStat (2019)

LSE PH500 Research Seminar (May 21-June 25, 2020): Controversies in Phil Stat

Summer Seminar 2019 (article)

Top Posts & Pages

Conferences & Workshops

RMM Special Topic

Mayo & Spanos, Error Statistics

Follow Blog via Email

My Websites

Recent Posts: PhilStatWars

The Statistics Wars and Their Casualties Videos & Slides from Sessions 1 & 2

THE STATISTICS WARS AND THEIR CASUALTIES VIDEOS & SLIDES FROM SESSIONS 3 & 4

Final session: The Statistics Wars and Their Casualties: 8 December, Session 4

SCHEDULE: The Statistics Wars and Their Casualties: 1 Dec & 8 Dec: Sessions 3 & 4

WORKSHOP

LOG IN/OUT

Archives

© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.