Monthly Archives: November 2025

First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]

November Cruise

The example I use here to illustrate formal severity comes in for criticism in a paper to which I reply in a 2025 BJPS paper linked to here. Use the comments for queries.

Exhibit (i) N-P Methods as Severe Tests: First Look (Water Plant Accident) 

There’s been an accident at a water plant where our ship is docked, and the cooling system had to be repaired. It is meant to ensure that the mean temperature of discharged water stays below the temperature that threatens the ecosystem, perhaps not much beyond 150 degrees Fahrenheit. There were 100 water measurements taken at randomly selected times and the sample mean x̄ computed, each with a known standard deviation σ = 10. When the cooling system is effective, each measurement is like observing X ~ N(150, 10²). Because of this variability, we expect different 100-fold water samples to lead to different values of X̄, but we can deduce its distribution. If each X ~ N(μ = 150, 10²) then X̄ is also Normal with μ = 150, but the standard deviation of X̄ is only σ/√n = 10/√100 = 1. So X̄ ~ N(μ = 150, 1). Continue reading
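For readers who like to check the arithmetic, here is a minimal sketch of the step from X ~ N(150, 10²) to X̄ ~ N(150, 1). The values μ = 150, σ = 10 and n = 100 come from the exhibit; the function names, the seed, and the 5000 repetitions are my own illustrative choices, not anything in SIST. It computes σ/√n directly and then simulates many 100-fold samples to see that the sample means do cluster around 150 with a spread of about 1.

```python
import math
import random
import statistics

MU = 150.0     # mean temperature when the cooling system is effective (from the exhibit)
SIGMA = 10.0   # known standard deviation of a single measurement (from the exhibit)
N = 100        # measurements per sample (from the exhibit)


def standard_error(sigma, n):
    """Standard deviation of the sample mean of n independent measurements."""
    return sigma / math.sqrt(n)


def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2); return each sample's mean.

    `repetitions` and `seed` are illustrative assumptions, not part of the exhibit.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repetitions)
    ]


se = standard_error(SIGMA, N)  # 10 / sqrt(100) = 1.0
means = simulate_sample_means(MU, SIGMA, N, repetitions=5000)

print(f"theoretical SD of the sample mean: {se}")
print(f"simulated mean of the sample means: {statistics.fmean(means):.2f}")
print(f"simulated SD of the sample means: {statistics.stdev(means):.2f}")
```

The simulation is only a sanity check on the deduction; the distribution of X̄ follows from the Normal model itself, with no sampling needed.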

Categories: 2025 leisurely cruise, severe tests, severity function, water plant accident | Leave a comment

Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: (3.2)

Neyman & Pearson

November Cruise: 3.2

This third of November’s stops in the leisurely cruise of SIST aligns well with my recent BJPS paper, “Severe Testing: Error Statistics vs Bayes Factor Tests”. In tomorrow’s Zoom, 11 am New York time, we’ll have an overview of the topics in SIST so far, as well as a discussion of this paper. (If you don’t have a link, and want one, write to me at error@vt.edu.)

3.2 N-P Tests: An Episode in Anglo-Polish Collaboration*

We proceed by setting up a specific hypothesis to test, H₀ in Neyman’s and my terminology, the null hypothesis in R. A. Fisher’s . . . in choosing the test, we take into account alternatives to H₀ which we believe possible or at any rate consider it most important to be on the look out for . . . Three steps in constructing the test may be defined: Continue reading

Categories: 2024 Leisurely Cruise, E.S. Pearson, Neyman, statistical tests | Leave a comment

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1

November Cruise

This second excerpt for November is really just the preface to 3.1. Remember, our abbreviated cruise this fall is based on my LSE Seminars in 2020, and since there are only 5, I had to cut. So those seminars skipped 3.1 on the eclipse tests of GTR. But I want to share snippets from 3.1 with current readers, along with reflections in the comments.

Excursion 3 Statistical Tests and Scientific Inference

Tour I Ingenious and Severe Tests

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation – in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] . . . it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories. (Popper 1962, p. 36)

Continue reading

Categories: 2025 leisurely cruise, SIST, Statistical Inference as Severe Testing | 2 Comments

November: The leisurely tour of SIST continues

2025 Cruise

We continue our leisurely tour of Statistical Inference as Severe Testing [SIST] (Mayo 2018, CUP) with Excursion 3. This is based on my 5 seminars at the London School of Economics in 2020; I include slides and video for those who are interested. (Use the comments for questions.) Continue reading

Categories: 2025 leisurely cruise, significance tests, Statistical Inference as Severe Testing | 1 Comment

Severity and Adversarial Collaborations (i)


In the November/December 2025 issue of American Scientist, a group of authors (Ceci, Clark, Jussim and Williams 2025) argue in “Teams of rivals” that “adversarial collaborations offer a rigorous way to resolve opposing scientific findings, inform key sociopolitical issues, and help repair trust in science”. With adversarial collaborations, a term coined by Daniel Kahneman (2003), teams of divergent scholars, interested in uncovering what is the case (rather than endlessly making their case), design appropriately stringent tests to understand–and perhaps even resolve–their disagreements. I am pleased to see that in describing such tests the authors allude to my notion of severe testing (Mayo 2018)*:

Severe testing is the related idea that the scientific community ought to accept a claim only after it surmounts rigorous tests designed to find its flaws, rather than tests optimally designed for confirmation. The strong motivation each side’s members will feel to severely test the other side’s predictions should inspire greater confidence in the collaboration’s eventual conclusions. (Ceci et al., 2025)

1. Why open science isn’t enough Continue reading

Categories: severity and adversarial collaborations | 5 Comments
