D. Lakens (Guest Post): Averting journal editors from making fools of themselves


Daniël Lakens

Associate Professor
Human Technology Interaction
Eindhoven University of Technology


In a recent editorial, Mayo (2021) warns journal editors to avoid calls for author guidelines to reflect a particular statistical philosophy, and not to go beyond merely enforcing the proper use of significance tests. That such a warning is needed at all should embarrass anyone working in statistics. And yet, a mere three weeks after Mayo’s editorial was published, the need for such warnings was reinforced when a co-editorial by journal editors from the International Society of Physiotherapy Journal Editors (Elkins et al., 2021) titled “Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors” stated: “[This editorial] also advises researchers that some physiotherapy journals that are members of the International Society of Physiotherapy Journal Editors (ISPJE) will be expecting manuscripts to use estimation methods instead of null hypothesis statistical tests.”

This co-editorial by journal editors in the field of physiotherapy shows the incompetence that typically underlies bans of p-values – because let’s be honest, it is always the p-value and associated significance tests that are banned, even when empirical research has shown confidence intervals or Bayes factors are misused and misinterpreted as much, or more (Fricker et al., 2019; Hoekstra et al., 2014; Wong et al., 2021). In the co-editorial, the no-doubt well-intentioned physiotherapy journal editors recommend “Estimation as an alternative approach for statistical inference”. At first glance, one might think this means the editors are recommending estimation as an alternative approach to statistical tests. In other words, we would expect to see questions that are answered by effect size estimates, and not by dichotomous claims about the presence or absence of effects. But then the editors write the following (page 3):

“The estimate and its confidence interval should be compared against the ‘smallest worthwhile effect’ of the intervention on that outcome in that population. The smallest worthwhile effect is the smallest benefit from an intervention that patients feel outweighs its costs, risk and other inconveniences. If the estimate and the ends of its confidence interval are all more favourable than the smallest worthwhile effect, then the treatment effect can be interpreted as typically considered worthwhile by patients in that clinical population.”

This is confused advice, at best. The description of the statistical inference the editors want researchers to make is a dichotomous claim. It is made based on whether a confidence interval excludes the smallest effect size of interest. This procedure is mathematically identical to using p < alpha. The question whether a treatment effect is worthwhile or not is logically answered by a dichotomous ‘yes’ or ‘no’. An estimate of the effect size does not tell one whether the effect should be regarded as random noise around a true effect size of zero, or a non-zero effect.
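The editorial gives no numerical example of this equivalence, so the following sketch is mine: it uses a normal approximation and made-up numbers, with `ci_excludes_sesoi` and `one_sided_p` as hypothetical helper names. It shows that “the whole two-sided 95% CI lies above the smallest worthwhile effect” is exactly the decision rule “one-sided p < 0.025 for H0: the true effect does not exceed the smallest worthwhile effect”.

```python
from statistics import NormalDist

def ci_excludes_sesoi(est, se, sesoi, level=0.95):
    """True if the entire two-sided CI lies above the smallest worthwhile effect."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% CI
    return est - z * se > sesoi

def one_sided_p(est, se, sesoi):
    """One-sided p-value for H0: true effect <= sesoi (normal approximation)."""
    return 1 - NormalDist().cdf((est - sesoi) / se)

# The two decision rules always agree: CI above SESOI <=> p < alpha/2 = 0.025
for est, se, sesoi in [(0.6, 0.2, 0.1), (0.3, 0.2, 0.1), (0.5, 0.1, 0.2)]:
    assert ci_excludes_sesoi(est, se, sesoi) == (one_sided_p(est, se, sesoi) < 0.025)
```

Because the normal upper-tail probability is strictly decreasing, the CI check and the p-value check can never disagree: they are the same significance test in different clothes.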

The editors should clearly have followed Mayo’s (2021) advice to not go beyond enforcing proper use of significance tests. Estimation and significance testing answer two different questions. Estimation can’t, as the physiotherapists hope, replace significance tests. The conflict between the two approaches becomes apparent when we ask ourselves how researchers who want to publish in these physiotherapy journals should deal with situations where they would lower the alpha level to correct for multiple comparisons or sequential analyses. Are authors required to report a 99% confidence interval in cases where they would have used a Bonferroni correction when examining 5 independent test results, because they would otherwise have divided the 5% alpha by five? Or should they ignore error rates, and make claims based on a 95% confidence interval, even when this would lead to many more articles claiming treatments are beneficial than we currently find acceptable? Related applied questions that researchers who want to publish in physiotherapy journals face are which confidence interval they should report to begin with (as a 95% confidence interval is based on the idea that a maximum of a 5% error rate is deemed acceptable when making dichotomous claims, but a desired accuracy requires a different justification), as well as questions about sample size justifications (will editors accept papers with any sample size, or do they still expect an a priori power analysis based on low Type 1 and Type 2 error rates when making claims about effect sizes?).
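To make the multiplicity question concrete, here is a small illustrative sketch (the trial numbers are hypothetical, not from any cited study): the same estimate can exclude a smallest worthwhile effect of zero at the 95% level but not at the 99% level that a Bonferroni correction over five tests implies, so an “estimation” conclusion still hinges on a justified error rate.

```python
from statistics import NormalDist

def ci(est, se, level):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est - z * se, est + z * se

est, se, sesoi = 0.30, 0.13, 0.0      # hypothetical trial: estimate, SE, SESOI

lo95, _ = ci(est, se, 0.95)           # per-test alpha = 0.05
lo99, _ = ci(est, se, 1 - 0.05 / 5)   # Bonferroni over 5 tests -> 99% CI

print(lo95 > sesoi)  # True: "worthwhile" at the uncorrected level
print(lo99 > sesoi)  # False: not once the error rate is corrected
```

The verdict flips purely because of the confidence level chosen, which is exactly the alpha-justification question the editors’ guidelines leave unanswered.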

As Mayo (2021) writes, “The key function of statistical tests is to constrain the human tendency to selectively favor views they believe in.” Fricker and colleagues (2019) show how removing p-values and significance testing in the journal Basic and Applied Social Psychology has led to the publication of articles in which claims are made that have a much higher probability of being wrong than was the case before p-values were banned, but without transparently communicating this high error rate. Anyone who reads physiotherapy journals that follow the guidelines of journal editors to use ‘estimation’ needs to be prepared for the same development in those journals. As Mayo (2021) notes in her editorial, banning proper uses of thresholds in significance tests makes it “harder to hold data dredgers culpable for reporting a nominally small p value obtained through data dredging”.

The statistical philosophy of estimation is not designed to answer questions about the presence or absence of a beneficial effect. That a large group of journal editors thinks it can shows how rational thought often takes a backseat when journal editors start to make recommendations about how to improve statistical inferences.

What can journal editors require to avert incoherent recommendations that force researchers to use approaches that do not answer the questions they are asking? The answer is simple: They should require a coherent approach to statistical inferences, anchored in an epistemology, that answers the question a researcher is interested in. The task of journals is to evaluate the quality of the work that is submitted, not to dictate the questions researchers ask. Of course, a journal can state that it believes only work in which no scientific claims are made, or in which claims are made without any control of the rate at which those claims are wrong, meets its definition of ‘high quality’ – I would look forward to the arguments for such a viewpoint, and doubt they would be convincing. Let’s hope Mayo’s (2021) editorial prevents similar groups of journal editors from making fools of themselves in the future.

See Brian Haig’s commentary next.


  • Elkins, M. R., Pinto, R. Z., Verhagen, A., Grygorowicz, M., Söderlund, A., Guemann, M., Gómez-Conesa, A., Blanton, S., Brismée, J.-M., Ardern, C., Agarwal, S., Jette, A., Karstens, S., Harms, M., Verheyden, G., & Sheikh, U. (2021). Statistical inference through estimation: Recommendations from the International Society of Physiotherapy Journal Editors. Journal of Physiotherapy. https://doi.org/10.1016/j.jphys.2021.12.001
  • Fricker, R. D., Burke, K., Han, X., & Woodall, W. H. (2019). Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban. The American Statistician, 73(sup1), 374–384. https://doi.org/10.1080/00031305.2018.1537892
  • Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://doi.org/10.3758/s13423-013-0572-3
  • Mayo, D. (2021). The Statistics Wars and Intellectual Conflicts of Interest. Conservation Biology.
  • Wong, T. K., Kiers, H., & Tendeiro, J. (2021). On the Potential Mismatch between the Function of the Bayes Factor and Researchers’ Expectations.

All commentaries on Mayo’s (2021) editorial as of Jan 31, 2022 (more to come*)

Ionides and Ritov

*Let me know if you wish to write one



4 thoughts on “D. Lakens (Guest Post): Averting journal editors from making fools of themselves”

  1. Daniel: Thank you so much for your commentary on my editorial. I had never heard of the Journal of Physiotherapy. The Executive Director of the ASA writes to every journal asking them to alter their guidelines to take account of both the 2016 ASA Statement on Statistical Significance—which is an ASA policy statement—and the Executive Director’s 2019 editorial, which is not. I don’t know how many journals have gone along, but there is obviously an incentive to be on the side of the Executive Director of so large an association.
    I checked out the Elkins editorial and was rather surprised (not that I haven’t seen the same false claims repeated verbatim). You are right, sadly, they are in danger of making fools of themselves if even the basic definition of p-values in their article is not corrected immediately.
    I offered to help. It may be too late. However, several of their other claims are endorsed by the movement against statistical significance, so you could say they are only dutifully reporting some of the howlers repeated endlessly by them–e.g., p-values cannot provide evidence, evidence must be a posterior, must be comparative (which, notice, a posterior is not), and much else besides.

    I thought the field of physical therapy appeals to clinical trials, RCTs and preregistration.

    I should emphasize that neither my editorial nor the presentation I will give on January 11 has in mind editors who go this far astray, but your commentary adds fuel to the sad conclusion that, in some fields, there’s a danger that the blind are leading others into becoming blinded. The pro-confidence interval CI estimation people ought to feel morally obliged to at least correct the premises others use to come out in favor of CI estimation.

  2. rkenett

    Daniel – thank you for the great post which emphasizes what happens when the discussion is restricted to a corner of the room. Indeed, as emphasized by Mayo, editors should not enforce a specific “statistical philosophy”. Their concern should focus on the information quality of the submission and the generalizability of the claims within. The problem is to find reviewers that can properly assess this. We made a proposal for how to help reviewers do that in https://content.iospress.com/articles/statistical-journal-of-the-iaos/sji967. The prevalent selective inference phenomenon also applies here, with statisticians selectively focusing on what has traditionally been their comfort zone.

  3. Pingback: B. Haig on questionable editorial directives from Psychological Science (Guest Post) | Error Statistics Philosophy

  4. Pingback: Paul Daniell & Yu-li Ko commentaries on Mayo’s ConBio Editorial | Error Statistics Philosophy
