The First 2023 Act of Stat Activist Watch: Statistics ‘for the people’

One of the central roles I proposed for “stat activists” (after our recent workshop, The Statistics Wars and Their Casualties) is to critically scrutinize mistaken claims about leading statistical methods, especially when such claims are put forward as permissible viewpoints to help “the people” assess methods in an unbiased manner. The first act of 2023 under this umbrella concerns an article put forward as “statistics for the people” in a journal of radiation oncology. We are talking here about recommendations for analyzing data for treating cancer! Although it is put forward as a fair-minded, or at least an informative, comparison of Bayesian vs frequentist methods, I find it to be little more than an advertisement for subjective Bayesian methods set against a caricature of frequentist error statistical methods. The journal’s “statistics for the people” section would benefit from a full-blown article on frequentist error statistical methods, not just the letter of ours they recently published, but I’m grateful to Chowdhry and other colleagues who joined me in this effort. You will find our letter below, followed by the authors’ response. You can also find a link to their original “statistics for the people” article in the references. Let me admit right off that my criticisms are a bit stronger than those of my co-authors.

Two quick additional points I would make to the authors about their paper and response:

  1. The application of Bayes’ rule in their example of diagnostic screening, to compute the probability of Covid given a positive test, is just an application of conditional probability to events. It is fully carried out by frequentist means. There’s nothing really “Bayesian” about (frequentist!) diagnostic screening, yet it is a main example relied on to argue against frequentist probability.
  2. There’s no such thing as an uninformative prior–this was given up on over a decade ago.
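
To make point 1 concrete, here is a minimal sketch in Python. The prevalence, sensitivity, and specificity values are hypothetical, chosen only for illustration (they are not taken from the article), and the function names are likewise illustrative. The sketch computes the probability of disease given a positive test twice: once from the conditional probability formula, and once as a plain relative frequency in a hypothetical screened population.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive) from the definition of conditional probability."""
    p_pos_and_disease = sensitivity * prevalence
    p_pos_and_healthy = (1.0 - specificity) * (1.0 - prevalence)
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)


def fraction_diseased_among_positives(population, prevalence, sensitivity, specificity):
    """The same quantity, read as a relative frequency among those testing positive."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, for illustration only (not from the article).
PREVALENCE, SENSITIVITY, SPECIFICITY = 0.05, 0.90, 0.95

by_formula = prob_disease_given_positive(PREVALENCE, SENSITIVITY, SPECIFICITY)
by_counts = fraction_diseased_among_positives(10_000, PREVALENCE, SENSITIVITY, SPECIFICITY)

print(f"P(disease | positive) by formula:  {by_formula:.3f}")
print(f"Fraction diseased among positives: {by_counts:.3f}")
```

With these inputs both routes give about 0.486. Every input is a frequency of events (how often the disease occurs, how often the test errs), and no degree-of-belief prior over a hypothesis enters the calculation.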

I would never have come across an article in radiation oncology were it not for exchanges between members of a session I was in on “why we disagree” in statistical analysis in that field. I hereby invite all readers and the nearly 1000 registrants from our workshop to alert us throughout the year to interesting items under the stat activist banner.

Our letter: Bayesian Versus Frequentist Statistics: In Regard to Fornacon-Wood et al. (PDF of letter)

To the Editor:

We appreciate the authors bringing attention to controversies surrounding the use of Bayesian and frequentist statistics.1 [PDF of paper] There are many benefits to frequentist statistics and disadvantages of Bayesian statistics which were not discussed in the referenced article. We write this accompanying letter to aim for a more balanced presentation of Bayesian and frequentist statistics.

With frequentist statistical significance tests, we can learn whether the data indicate there is a genuine effect or difference in a statistical analysis, as they have the ability to control type I and type II error probabilities.2 Posteriors and Bayes factors do not ensure that the method rarely reports one treatment is better or worse than the other erroneously. A well-known threat to reliable results stems from the ease of using high-powered methods to data-dredge and try to hunt for impressive-looking results that fail to replicate with new data. However, the Bayesian assessment is not altered by things like stopping rules, at least not without violating inference by Bayes theorem.3 The frequentist account,4 by contrast, is required to take account of such selection effects in reporting error probabilities. Another caution for those unfamiliar with practical Bayesian research is that estimation of a prior distribution is nontrivial. The priors they discuss are subjective degrees of belief, but there is considerable disagreement about which beliefs are warranted, even among experts. Furthermore, should conclusions differ if the prior is chosen by a radiation oncologist or a surgeon?5 These considerations are some of the reasons why most phase 3 studies in oncology rely on frequentist designs. The article equates frequentist methods with simple null hypothesis testing without alternatives, thereby overlooking hypothesis testing methods that control both type I and II errors. The frequentist takes account of type II errors and the corresponding notion of power. If a test has high power to detect a meaningful effect size, then failing to detect a statistically significant difference is evidence against a meaningful effect. Therefore, a P value that is not small is informative.

The authors write that frequentist methods do not use background information, but this is to ignore the field of experimental design and all of the work that goes into specifying the test (eg, sample size, statistical power) and critically evaluating the connection between statistical and substantive results. An effect that corresponds to a clinically meaningful effect, or effect sizes well warranted from previous studies, would clearly influence the design.

Although their article engenders important discussion, these differences between frequentist and Bayesian methods may help readers understand why so many researchers around the world still prefer the frequentist approach.

  • Amit K. Chowdhry, MD, PhD
    Department of Radiation Oncology
    University of Rochester Medical Center
    Rochester, New York
  • Deborah Mayo
    Department of Philosophy
    Virginia Tech
    Blacksburg, Virginia
  • Stephanie L. Pugh, PhD
    NRG Oncology Statistical and Data Management Center
    American College of Radiology
    Philadelphia, Pennsylvania
  • John Park, MD
    Department of Radiation Oncology
    Kansas City VA Medical Center
    Kansas City, Missouri
  • Clifton David Fuller, MD
    Department of Radiation Oncology
    MD Anderson Cancer Center
    Houston, Texas
  • John Kang, MD, PhD
    Department of Radiation Oncology
    University of Washington
    Seattle, Washington

References

  1. Fornacon-Wood I, Mistry H, Johnson-Hart C, et al. Understanding the differences between Bayesian and frequentist statistics. Int J Radiat Oncol Biol Phys 2022;112:1076-1082. (PDF)
  2. Mayo DG. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge, UK: Cambridge University Press; 2018.
  3. Ryan EG, Brock K, Gates S, Slade D. Do we need to adjust for interim analyses in a Bayesian adaptive trial design? BMC Med Res Methodol 2020;20:150.
  4. Jennison C, Turnbull BW. Group Sequential Methods With Applications to Clinical Trials. Boca Raton, FL: CRC Press; 1999.
  5. Staley K, Park J. Comment on Mayo’s “The statistics wars and intellectual conflicts of interest”. Conserv Biol 2022;36:e13861.
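
The letter's point about stopping rules can be illustrated with a small simulation. This is only a sketch under stated assumptions, none of which come from the letter: normally distributed outcomes with known unit variance, a true effect of exactly zero, a two-sided z test at the nominal 0.05 level, and equally sized batches of data with a test after each batch. The function name, the batch size, and the number of looks are all hypothetical choices.

```python
import math
import random


def type1_error_rate(n_looks, batch_size=20, n_trials=20000, z_crit=1.96, seed=0):
    """Estimate how often a true null is rejected when the data are tested
    after every batch and the trial stops at the first 'significant' look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        running_sum, n = 0.0, 0
        for _ in range(n_looks):
            # The sum of batch_size independent N(0, 1) outcomes is N(0, batch_size).
            running_sum += rng.gauss(0.0, math.sqrt(batch_size))
            n += batch_size
            z = running_sum / math.sqrt(n)
            if abs(z) > z_crit:
                rejections += 1
                break  # stop the trial at the first nominally significant look
    return rejections / n_trials


single_look = type1_error_rate(n_looks=1)
five_looks = type1_error_rate(n_looks=5)

print(f"Estimated type I error, one look:   {single_look:.3f}")
print(f"Estimated type I error, five looks: {five_looks:.3f}")
```

Under these assumptions the single-look rate should land near the nominal 0.05, while testing at each of five looks without adjustment pushes it well above that. Group sequential designs of the kind covered in reference 4 adjust the critical values so that the overall error rate stays at the stated level.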

Fornacon-Wood Reply: In Reply to Chowdhry et al. (PDF of letter)

To the Editor:

We thank the authors for their response to our “statistics for the people” article that aimed to introduce perhaps unfamiliar readers to Bayesian statistics and some potential advantages of their use. We agree that frequentist statistics are a useful and widespread statistical analytical approach, and we are not aiming to revisit the frequentist versus Bayesian arguments that have been well articulated in the literature. However, there are a couple of points we would like to make.

First, we acknowledge that the majority of phase 3 studies use frequentist designs, and this has the advantage of facilitating meta-analyses using established techniques. However, we would argue that the reason such frequentist designs are so prevalent is likely to have as much to do with convention (from funders/regulators as well as from researchers themselves), the relative exposure of the 2 approaches in educational materials, and the historic difficulties in calculating Bayesian posteriors as it does with the arguments the authors make.

Second, although we agree with Chowdhry et al that there are many challenges associated with the estimation of prior probability distributions, we note that similar arguments apply to effect size estimation, which they cite as a strength of the Neyman-Pearson/null hypothesis significance testing approach (ie, the use of power calculations to limit the risk of type II errors). We would also re-enforce the point we make in the article about the importance of testing the influence of the prior (represented as the divergent beliefs of the hypothetical radiation oncologist and surgeon in the communication by Chowdhry et al) in the analysis results. If the data are strong enough, the posterior distributions will be in close enough agreement to convince both parties. As we noted, it is also possible to undertake Bayesian analyses without prior information, using an uninformative prior, in which case the analysis is driven directly by the data, as for a frequentist calculation. As an aside, there is continued debate about the relative merits and deficiencies of the different frequentist approaches to significance testing, particularly around the widespread use of the hybrid Neyman-Pearson/null hypothesis significance testing approach.
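
The prior-sensitivity check described in the reply can be sketched with a conjugate Beta-binomial model. Everything specific here is an assumption made for illustration and appears in neither letter: the binary response outcome, the Beta(8, 2) and Beta(2, 8) priors standing in for the hypothetical oncologist and surgeon, and the data counts. The sketch also compares two priors that are each commonly labeled "uninformative", the uniform Beta(1, 1) and the Jeffreys Beta(0.5, 0.5).

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a response rate under a Beta(prior_a, prior_b) prior
    with a binomial likelihood (standard conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical priors standing in for two experts with divergent beliefs.
PRIORS = {
    "oncologist": (8.0, 2.0),  # optimistic about the response rate
    "surgeon": (2.0, 8.0),     # skeptical about the response rate
}


def expert_gap(successes, trials):
    """Absolute difference between the two experts' posterior means."""
    means = [posterior_mean(a, b, successes, trials) for a, b in PRIORS.values()]
    return abs(means[0] - means[1])


gap_small_study = expert_gap(successes=6, trials=10)      # weak data
gap_large_study = expert_gap(successes=600, trials=1000)  # strong data

# Two priors often called "uninformative", applied to 0 responses in 3 patients.
uniform_estimate = posterior_mean(1.0, 1.0, successes=0, trials=3)
jeffreys_estimate = posterior_mean(0.5, 0.5, successes=0, trials=3)

print(f"Expert gap, n=10:   {gap_small_study:.3f}")
print(f"Expert gap, n=1000: {gap_large_study:.3f}")
print(f"Uniform prior estimate:  {uniform_estimate:.3f}")
print(f"Jeffreys prior estimate: {jeffreys_estimate:.3f}")
```

In this toy setup the two experts' posterior means differ by 0.3 after 10 patients and by less than 0.01 after 1000, which matches the reply's claim about strong data. With only 3 patients, though, the two "uninformative" priors give 0.2 and 0.125, so the choice among them is not neutral when data are sparse.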

********************************************************

Please share your constructive remarks in the comments to this post.

Categories: stat activist watch 2023, statistical significance tests | 2 Comments

2 thoughts on “The First 2023 Act of Stat Activist Watch: Statistics ‘for the people’”

  1. Referring to the response of your Bayesian opponents: problem is, there is no such thing as an “uninformative prior”. Different people have proposed different definitions and they lead to different priors. The more unknown parameters there are (both interest parameters and nuisance parameters) the more differences arise.

    So-called “uninformative priors” are actually very informative, they are usually chosen on formal convenience grounds, because they take away the necessity of actually thinking about what prior information there is.

    The Bayesian solution tells us what one person should think given their evaluation of their prior knowledge and their data, but it does not tell us how to communicate what the data says to others; others who might well have other prior information.
