Monthly Archives: January 2023

2023 Syllabus for Philosophy of Inductive-Statistical Inference

PHIL 6014 (crn: 20919): Spring 2023 

Philosophy of Inductive-Statistical Inference
(This is an IN-PERSON class*)
Wed 4:00-6:30 pm, McBryde 22
*There may be opportunities for zooming halfway through the semester.

Syllabus: First Installment (PDF)

D. Mayo (2018), Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST), CUP (electronic and paper copies provided to those taking the class; proofs are at errorstatistics.com; see below).
Articles from the Captain’s Bibliography (links to new articles will be provided). Other useful information can be found in the SIST Abstracts & Keywords and this post with SIST Excerpts & Mementos.

Date | Themes/Readings
1. 1/18
Introduction to the Course:
How to tell what’s true about statistical inference

(1/18/23 SLIDES here)

Reading: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST): Preface, Excursion 1 Tour I 1.1-1.3, 9-29

MISC: Souvenir A; SIST Abstracts & Keywords for all excursions and tours
2. 1/25
Q #2
 
Error Probing Tools vs Comparative Evidence: Likelihood & Probability
What counts as cheating?
Intro to Logic: arguments validity & soundness

(1/25/23 SLIDES here)

Reading: SIST: Excursion 1 Tour II 1.4-1.5, 30-55
Session #2 Questions: (PDF)

MISC: NOTES on Excursion 1, SIST: Souvenirs B, C & D, Logic Primer (PDF)
3. 2/1
   Q #3
UPDATED
Induction and Confirmation: PhilStat & Formal Epistemology
The Traditional Problem of Induction
Is Probability a Good Measure of Confirmation? Tacking Paradox

(2/1/23 SLIDES here)

Reading: SIST: Excursion 2, Tour I: 2.1-2.2, 59-74
Hacking “The Basic Rules of Probability” Hand Out (PDF)
UPDATED: Session #3 Questions: (PDF)

MISC: Excursion 2 Tour I Blurb & notes
4. 2/8 &
5. 2/15
Assign 1 2/15 
Falsification, Science vs Pseudoscience, Induction
Statistical Crises of Replication in Psychology & other sciences

Popper, severity and novelty, array of problems and models
Fallacies of rejection, Duhem’s problem; solving induction now

Reading for 2/8: Popper, Ch 1 from Conjectures and Refutations (PDF), Popper Test
Reading for 2/15: SIST: Excursion 2, Tour II: 2.3-2.7, pages TBA
Optional for 2/15: Gelman & Loken (2014)

ASSIGNMENT 1 (due 2/15) (PDF)

MISC: SIST Souvenirs (E), (F), (G), (H)
Excursion 2 Tour II Blurb & notes
 Fisher Birthday: February 17: Celebration of N-F wars
6. 2/22 & 7. 3/1
Ingenious and Severe Tests: Fisher, Neyman-Pearson, Cox: Concepts of Tests
Reading: SIST: Excursion 3 Tour I: 3.1-3.3, 119-163 (trade-offs 328-330)

The Triad: Fisher (1955), Pearson (1955); Neyman (1956)
The 1919 eclipse tests; Fisherian and N-P Tests;
Frequentist principle of evidence: FEV

Apps for statistical testing 
MISC: Excursion 3 Tour I Blurb & notes
SPRING BREAK Statistical Exercises While Sunning (March 4-12)

The following is very tentative, and will depend on student interests.
8. 3/15 Assign 2
Confidence & Fiducial Intervals and Deeper Concepts: Higgs Discovery
9. 3/22
Objectivity in Science: Objectivity in Error Statistics & Bayesian Philosophies
10. 3/29 Short essay
Bayes factors and Bayes/Fisher Disagreement, Jeffreys-Lindley Paradox
11. 4/5
Biasing Selection Effects, P-Hacking, Data Dredging, etc.
12. 4/12
Assign 3
Negative Results: Power vs Severity
13. 4/19
Should Statistical Significance Tests be Abandoned, Retired, or Replaced?
Other: TBA
14. 4/26
Severity, Sensitivity, Safety: PhilStat and Classical Epistemology
15. 5/3
Current Reforms and Stat Activism: Practicing Our Skills
Final Paper
Categories: Announcement, new course

I’m teaching a New Intro to PhilStat Course Starting Wednesday:

Ship StatInfasst (Statistical Inference as Severe Testing: SIST) will set sail on Wednesday, January 18, when I begin a weekly seminar on the Philosophy of Inductive-Statistical Inference. I’m planning to write a new edition and/or companion to SIST (Mayo 2018, CUP), so it will be good to retrace the journey. I’m not requiring a statistics or philosophy background. All materials will be on this blog, and around halfway through there may be an opportunity to zoom, if there’s interest.


Categories: Announcement, new course

The First 2023 Act of Stat Activist Watch: Statistics ‘for the people’

One of the central roles I proposed for “stat activists” (after our recent workshop, The Statistics Wars and Their Casualties) is to critically scrutinize mistaken claims about leading statistical methods, especially when such claims are put forward as permissible viewpoints to help “the people” assess methods in an unbiased manner. The first act of 2023 under this umbrella concerns an article put forward as “statistics for the people” in a journal of radiation oncology. We are talking here about recommendations for analyzing data for treating cancer! Although presented as a fair-minded, or at least an informative, comparison of Bayesian vs frequentist methods, I find it to be little more than an advertisement for subjective Bayesian methods, set against a caricature of frequentist error statistical methods. The journal’s “statistics for the people” section would benefit from a full-blown article on frequentist error statistical methods, not just the letter of ours they recently published, but I’m grateful to Chowdhry and the other colleagues who joined me in this effort. You will find our letter below, followed by the authors’ response. You can also find a link to their original “statistics for the people” article in the references. Let me admit right off that my criticisms are a bit stronger than my co-authors’.

Two quick additional points I would make to the authors in relation to their paper and response:

  1. The application of Bayes rule in their example of diagnostic screening, to compute the probability of Covid given a positive test, is just an application of conditional probability to events. It is fully carried out by frequentist means. There’s nothing really “Bayesian” about (frequentist!) diagnostic screening, yet it is a main example relied on to argue against frequentist probability.
  2. There’s no such thing as an uninformative prior–this was given up on over a decade ago.
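To make point 1 concrete, here is a minimal sketch, using hypothetical prevalence, sensitivity, and specificity (not figures from the article under discussion), showing that the screening computation is just conditional probability over event frequencies:

```python
# Diagnostic screening as conditional probability of events.
# All numbers are hypothetical, for illustration only.
prevalence = 0.02   # P(Covid) in the screened population
sensitivity = 0.90  # P(test positive | Covid)
specificity = 0.95  # P(test negative | no Covid)

# P(test positive) by the law of total probability
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# "Bayes rule" here is just the definition of conditional probability:
# P(Covid | positive) = P(positive | Covid) * P(Covid) / P(positive)
ppv = sensitivity * prevalence / p_positive
print(f"P(Covid | positive test) = {ppv:.3f}")  # prints 0.269
```

The same number falls out of plain counting: among 10,000 screened people, about 200 have Covid, of whom 180 test positive, while 490 of the 9,800 without Covid also test positive; 180 of the 670 positives, about 27%, are true positives. Nothing in the calculation requires a degree-of-belief interpretation of probability.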

I would never have come across an article in radiation oncology if it were not for exchanges among members of a session I was in on “why we disagree” in statistical analysis in that field. I hereby invite all readers, and the nearly 1000 registrants from our workshop, to alert us throughout the year to interesting items that fall under the stat activist banner.

Our letter: Bayesian Versus Frequentist Statistics: In Regard to Fornacon-Wood et al. (PDF of letter)

To the Editor:

We appreciate the authors bringing attention to controversies surrounding the use of Bayesian and frequentist statistics.1 [PDF of paper] There are many benefits to frequentist statistics and disadvantages of Bayesian statistics which were not discussed in the referenced article. We write this accompanying letter to aim for a more balanced presentation of Bayesian and frequentist statistics.

With frequentist statistical significance tests, we can learn whether the data indicate there is a genuine effect or difference in a statistical analysis, as they have the ability to control type I and type II error probabilities.2 Posteriors and Bayes factors do not ensure that the method rarely reports erroneously that one treatment is better or worse than the other. A well-known threat to reliable results stems from the ease of using high powered methods to data-dredge and hunt for impressive-looking results that fail to replicate with new data. However, the Bayesian assessment is not altered by things like stopping rules, at least not without violating inference by Bayes theorem.3 The frequentist account,4 by contrast, is required to take account of such selection effects in reporting error probabilities. Another caution for those unfamiliar with practical Bayesian research is that estimation of a prior distribution is nontrivial. The priors they discuss are subjective degrees of belief, but there is considerable disagreement about which beliefs are warranted, even among experts. Furthermore, should conclusions differ if the prior is chosen by a radiation oncologist or a surgeon?5 These considerations are some of the reasons why most phase 3 studies in oncology rely on frequentist designs. The article equates frequentist methods with simple null hypothesis testing without alternatives, thereby overlooking hypothesis testing methods that control both type I and II errors. The frequentist takes account of type II errors and the corresponding notion of power. If a test has high power to detect a meaningful effect size, then failing to detect a statistically significant difference is evidence against a meaningful effect. Therefore, a P value that is not small is informative.

The authors write that frequentist methods do not use background information, but this is to ignore the field of experimental design and all of the work that goes into specifying the test (eg, sample size, statistical power) and critically evaluating the connection between statistical and substantive results. An effect that corresponds to a clinically meaningful effect, or effect sizes well warranted from previous studies, would clearly influence the design.

Although their article engenders important discussion, these differences between frequentist and Bayesian methods may help readers understand why so many researchers around the world still prefer the frequentist approach.

  • Amit K. Chowdhry, MD, PhD
    Department of Radiation Oncology
    University of Rochester Medical Center
    Rochester, New York
  • Deborah Mayo,
    Department of Philosophy
    Virginia Tech
    Blacksburg, Virginia
  • Stephanie L. Pugh, PhD
    NRG Oncology Statistical and Data Management Center
    American College of Radiology
    Philadelphia, Pennsylvania
  • John Park, MD
    Department of Radiation Oncology
    Kansas City VA Medical Center
    Kansas City, Missouri
  • Clifton David Fuller, MD,
    Department of Radiation Oncology
    MD Anderson Cancer Center
    Houston, Texas
  • John Kang, MD, PhD
    Department of Radiation Oncology
    University of Washington
    Seattle, Washington

References

  1. Fornacon-Wood I, Mistry H, Johnson-Hart C, et al. Understanding the differences between Bayesian and frequentist statistics. Int J Radiat Oncol Biol Phys 2022;112:1076-1082. (PDF)
  2. Mayo DG. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge, UK: Cambridge University Press; 2018.
  3. Ryan EG, Brock K, Gates S, Slade D. Do we need to adjust for interim analyses in a Bayesian adaptive trial design? BMC Med Res Methodol 2020;20:150.
  4. Jennison C, Turnbull BW. Group Sequential Methods With Applications to Clinical Trials. Boca Raton, FL: CRC Press; 1999.
  5. Staley K, Park J. Comment on Mayo’s “The statistics wars and intellectual conflicts of interest”. Conserv Biol 2022;36:e13861.


Fornacon-Wood Reply: In Reply to Chowdhry et al. (PDF of letter)

To the Editor:

We thank the authors for their response to our “statistics for the people” article, which aimed to introduce perhaps unfamiliar readers to Bayesian statistics and some potential advantages of their use. We agree that frequentist statistics are a useful and widespread statistical analytical approach, and we are not aiming to revisit the frequentist versus Bayesian arguments that have been well articulated in the literature. However, there are a couple of points we would like to make.

First, we acknowledge that the majority of phase 3 studies use frequentist designs, and this has the advantage of facilitating meta-analyses using established techniques. However, we would argue that the reason such frequentist designs are so prevalent is likely to have as much to do with convention (from funders/regulators as well as from researchers themselves), the relative exposure of the 2 approaches in educational materials, and the historic difficulties in calculating Bayesian posteriors as it does with the arguments the authors make.

Second, although we agree with Chowdhry et al that there are many challenges associated with the estimation of prior probability distributions, we note that similar arguments apply to effect size estimation, which they cite as a strength of the Neyman-Pearson/null hypothesis significance testing approach (ie, the use of power calculations to limit the risk of type II errors). We would also reinforce the point we make in the article about the importance of testing the influence of the prior (represented as the divergent beliefs of the hypothetical radiation oncologist and surgeon in the communication by Chowdhry et al) on the analysis results. If the data are strong enough, the posterior distributions will be in close enough agreement to convince both parties. As we noted, it is also possible to undertake Bayesian analyses without prior information, using an uninformative prior, in which case the analysis is driven directly by the data, as for a frequentist calculation. As an aside, there is continued debate about the relative merits and deficiencies of the different frequentist approaches to significance testing, particularly around the widespread use of the hybrid Neyman-Pearson/null hypothesis significance testing approach.

********************************************************

Please share your constructive remarks in the comments to this post.


Categories: stat activist watch 2023, statistical significance tests

Blog at WordPress.com.