On November 14, I gave a talk at the Seminar in Advanced Research Methods for the Department of Psychology, Princeton University.
“Statistical Inference as Severe Testing: Beyond Probabilism and Performance”
The video of my talk is below, along with the slides. It reminds me to return to a half-written reply to a paper, “A Bayesian Perspective on Severity” (van Dongen, Sprenger, & Wagenmakers, 2022). These authors claim that Bayesians can satisfy severity “regardless of whether the test has been conducted in a severe or less severe fashion”, but what they mean is that the data can be much more probable on hypothesis H1 than on H0: the Bayes factor can be high. However, “severity” can be satisfied in their comparative (subjective) Bayesian sense even for claims that are poorly probed in the error statistical sense (slides 55-56). Share your comments.
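The contrast can be made concrete with a short simulation (my own illustrative sketch, not from the talk or from van Dongen et al.; the effect size, thresholds, and trial counts are arbitrary choices). Under optional stopping, the type I error rate of a nominal 5% test balloons, violating severity in the error statistical sense; yet the Bayes factor, being a function of the likelihoods of the observed data alone, is the same however the data were obtained.

```python
import math
import random

random.seed(1)

def z_stat(xs):
    """z statistic for H0: mu = 0 with known sigma = 1."""
    n = len(xs)
    return sum(xs) / n * math.sqrt(n)

def bayes_factor(xs, mu1=0.5):
    """Likelihood ratio of N(mu1, 1) to N(0, 1) for the observed data.

    It depends only on the data actually observed, so it is unchanged
    by the stopping rule that produced them (the likelihood principle).
    """
    return math.exp(sum(mu1 * x - mu1 ** 2 / 2 for x in xs))

def optional_stopping_rejects(n_max=100):
    """Sample under H0, testing after each draw; stop at 'significance'."""
    xs = []
    for _ in range(n_max):
        xs.append(random.gauss(0, 1))  # data generated under H0
        if len(xs) >= 2 and abs(z_stat(xs)) > 1.96:
            return True
    return False

trials = 2000
rate = sum(optional_stopping_rejects() for _ in range(trials)) / trials
print(f"type I error rate under optional stopping: {rate:.2f}")
# well above the nominal 0.05: the procedure probes H0 poorly,
# even though the Bayes factor for the final data is untouched
```

The point of the sketch is only that the two quantities answer different questions: the Bayes factor compares how well the final data fit H1 versus H0, while the error statistical assessment asks how often the procedure would erroneously declare an effect.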
ABSTRACT: I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. The severe-testing requirement leads to reformulating statistical significance tests to avoid familiar criticisms and abuses. While high-profile failures of replication in the social and biological sciences stem from biasing selection effects—data dredging, multiple testing, optional stopping—some reforms and proposed alternatives to statistical significance tests conflict with the error control that is required to satisfy severity. I discuss recent arguments to redefine, abandon, or replace statistical significance.
Below is Princeton's recording of the talk, followed by my slides.
RECORDING OF TALK:
MY SLIDES: