Monthly Archives: June 2026

Announcement: CFP Synthese Topical Collection: Severity and Learning from Error

Posted on June 23, 2026 by Mayo

I hope that many readers of this blog will consider contributing to this!

ANNOUNCEMENT SEV26

Synthese Topical Collection CFP: Severity and Learning from Error

This Topical Collection examines how inquiry learns from error by focusing on a basic principle of evidence in science, statistics, medicine, law, epistemology, and day-to-day learning: a claim is not well-tested, known or epistemically warranted, if it is based on a method that makes it easy to accept, conclude or infer the claim, even if it is false. Such a claim may accord well with the data, but it has not passed a stringent or severe test. While this overarching intuition is widely shared, the problem of how to understand or satisfy it remains unsolved. C. S. Peirce emphasizes randomization and (what is now called) pre-designation to achieve self-correcting methods. Popper viewed severity in terms of satisfying novel predictive success and surviving stringent attempts at falsification. Deborah Mayo (1996, 2018) combines elements from Popper and Peirce with the use of error probabilities from statistical methods: proposed solutions to problems earn warrant by surviving probes that were capable of showing them wrong or inadequate. This Topical Collection takes “severity” to be a broad meta-level concept according to which a claim – whether a report of a perception, a prediction, a hypothesis, or part of a model – is assessed according to whether, and how readily, its errors and inadequacies would have been found, if present.

Several questions arise: What errors matter for a given aim? What would it take for a method to be capable of detecting them? How in actual practice can inquirers show they have engaged in responsible error probing when there are no formal probability models? Addressing questions like these is of urgent importance today as we face high-powered methods that make it easy to find impressive looking effects that are spurious and non-replicating, or to arrive at well-fitting models that do not predict well, do not replicate, or do not provide substantive scientific understanding. These questions arise in debates about methodological shifts in AI/ML, randomized clinical trials, legal evidence, climate modeling, statistical inference, and error-prone inference in general. We seek to bring these metascience debates into direct contact and to ask what is often left hidden: What errors are now being controlled, and which have quietly dropped out of view? By bringing together philosophers, statisticians, and scientists, we aim to develop a shared set of problems and tools with a forward-looking goal: to shape emerging practices, rather than merely react to them with retrospective commentary.

We welcome submissions on any topic that broadly relates to severity or learning from error. We invite contributions that develop, apply or challenge severity-based reasoning, or that develop alternative approaches, Bayesian, frequentist, machine-learning and other, which engage the same underlying concern: how inquiry learns from error, and how claims earn warrant by surviving probes that were capable of showing them wrong or inadequate. We encourage contributions that explore connections between concepts of severity in different fields. Notably, the concepts of sensitivity and safety in contemporary epistemology can be understood through the lens of severity, and both are redolent of stability in AI. We also welcome discussions of how contemporary manifestations of severity interrelate with the traditional notions of severity from Popper and Peirce, and how concepts of severity may help in tackling fundamental problems of induction, falsification, underdetermination, and realism in philosophy of science.

The collection is partly motivated by the thirtieth anniversary of Deborah Mayo’s (1996, Chicago) Error and the Growth of Experimental Knowledge (Lakatos Prize 1998) and the development of its account of severe testing.

Appropriate Topics for Submission include, among others:

Severity and philosophy of statistics

Do recent controversies about the uses of error probabilities in statistics (and metastatistics) present a challenge to severity-based reasoning?
Do the new fields of post-selection inference (in AI and other disciplines) allow for error control despite data-driven constructions? Or do they shift attention to different errors?
How does severity link to such notions as calibration, security, and stability, and to statistical techniques that promote them, such as robustness analyses and multiverse analyses?

Severity and philosophy of science

What does it mean for a method, or for science itself, to be self-correcting or error-correcting? Does it fit best with a pragmatist philosophy?
How does severe probing take place in the historical sciences, e.g., climate science, geology? Can claims be well probed without being replicable?
Rather than probing for falsity, how can we severely probe if a model is adequate for a purpose or problem of interest?

Severity and contemporary epistemology

Can a useful cross-cutting epistemology that links science, statistics, and applied epistemology be built around the concept of severity?
Do features of severity (e.g., auditing of assumptions) point to ways to avoid problems of sensitivity and safety in epistemology?
Does requiring severity explain why legal epistemology resists mere base-rates and “naked statistics”? Does it solve proof paradoxes in legal epistemology?

Tracking shifts in error control

How does AI/ML shift the focus from modeling data-generating mechanisms in statistics to optimizing predictive performance in machine learning?
How do changing guidelines for RCTs shift trials from probing biological mechanisms to predicting average treatment effects over a population?
What are the social, epistemic, ethical, and political consequences of shifting regimens of error control?

The value of probing error

How can adversarial collaborations and stress-testing advance science?
How can error repertoires be built and effectively employed to facilitate severity in measurement and experiment?
How does learning from error enter outside science (e.g., in art, architecture and life drawing)?

Submissions via: https://www.editorialmanager.com/synt/default.aspx

Under the drop-down menu, select Severity and Learning from Error.

Submitted papers will undergo the usual Synthese review process.

For further information, please contact the guest editors:

mayod@vt.edu, wendyparker@vt.edu, D.Lakens@tue.nl, staleykw@gmail.com.

The deadline for submissions is the 15th of December, 2026 (with possible short extensions). Use the comments or write to me with your ideas and questions with the subject: SEV26. The website announcement is here: https://link.springer.com/collections/ebjdhfadcd

Categories: Error Statistics, SEV 26, severity

‘Low power’ and an all too standard error (continuation of “don’t turn power on its head”)

Posted on June 7, 2026 by Mayo

“In my opinion, a great deal of confusion about statistics can be traced to the fact that the point estimate is seen as being the be all and end all, the expression of uncertainty being forgotten….to provide a point estimate without also providing a standard error is, indeed, an all too standard error.”

Stephen Senn: “Error point: the importance of knowing how much you don’t know”

In my previous blogpost (“How not to turn power on its head”), I argued, in relation to a one-sided test of mean μ (e.g., H₀: μ ≤ 0 vs H₁: μ > 0 with known SE):

If POW(μ′) is high (e.g., over .5), then a just significant result is poor evidence that μ > μ′; while if POW(μ′) is low (e.g., less than .2), it is good evidence that μ > μ′ where μ′ is a value greater than 0 (provided assumptions for these claims hold approximately).

By a “just statistically significant result” I mean one that just makes it to the threshold for statistical significance; write it as M* (my last post used D*). The reasoning is essentially this: because it’s very improbable to obtain as low a P-value as we did were μ as small as μ′ (that is, because POW(μ′) is low), the result indicates we are in a world where μ is greater than μ′. This is exactly the reasoning that allows us to infer μ > 0 with a statistically significant result. Indeed, the power of the test against μ₀ is α. It is supposed throughout that the statistical assumptions needed for the error probabilities to apply hold adequately.
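To make the power claims concrete, here is a minimal sketch of the power function for the one-sided Normal test with known SE. The numbers (μ₀ = 0, SE = 1, α = .025) and the helper names `cutoff` and `power` are illustrative assumptions, not values from the post.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def cutoff(mu0, se, alpha):
    """Sample mean M* that just reaches significance for H0: mu <= mu0."""
    return mu0 + Z.inv_cdf(1 - alpha) * se

def power(mu_prime, m_star, se):
    """POW(mu') = P(M >= M*; mu = mu') for a Normal sample mean with known SE."""
    return 1 - Z.cdf((m_star - mu_prime) / se)

# Illustrative values (assumptions, not taken from the post).
mu0, se, alpha = 0.0, 1.0, 0.025
m_star = cutoff(mu0, se, alpha)

for mu_prime in (mu0, 0.5, 1.0, m_star, 3.0):
    print(f"mu' = {mu_prime:.2f}   POW(mu') = {power(mu_prime, m_star, se):.3f}")
```

With these assumed values the power against μ₀ comes out as α and the power against M* as .5; values of μ′ well below M* have low power, and values above M* have power over .5.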

Why then do we often hear that low power is associated with “exaggerated” or “inflated” effects? As we reasoned in the previous post, low power against μ′ strengthens the inference that μ exceeds μ′. Can the same feature—low power—also be associated with overestimation? The answer is, yes it can, but only one of the claims corresponds to a correct application of statistical significance tests.

More specifically, the overestimation charge stems from supposing the observed result M* is taken as a (point) estimate of the population mean (i.e., estimating μ = M*, without providing the SE), an unkosher (but not so uncommon) move, and then considering a value μ′ against which the test has low power. Since M* is the just-significant cutoff, clearly M* will exceed μ′ (at least in a good test). So if the true population mean takes a value against which the test has low power, and M* is taken as a point estimate of μ, the result will be to “overestimate” the population mean. While the true value is unknown, this if-then claim is correct. Likewise, if the power to detect the true μ is high (over .5), the observed M* will underestimate μ, if M* is used as a point estimate.

To clarify these points, it helps to contrast two different questions that are often run together:

1. Does the observed (just) statistically significant result M* warrant inferring μ > μ′ (when POW(μ′) is low)?
2. Does the observed (just) statistically significant result M* exceed μ′ (when POW(μ′) is low)?

The answer to both questions is yes. The very fact invoked to show that M* exceeds μ′ (yielding a “yes” answer to #2), namely, that a result at least as large as M* would be improbable were μ = μ′, is precisely what warrants inferring that μ > μ′ (yielding a “yes” answer to #1).

However common it may be to equate the observed statistically significant result M* with the population mean, that is not a warranted inference from a significance test. For one thing, significance test inferences are inequalities, not point claims or point estimates. A statistically significant result warrants inferring μ > μ₀ and, more generally, warrants inferring μ > μ′ for values μ′ against which the test has sufficiently low power, although it is not typically put that way. It would more typically be put in terms of the p-value reached in relation to a discrepancy from H₀. (We would get a p-value function over different discrepancies.) What is the p-value were we testing H₀: μ ≤ M*, and observed our just significant result M*? Answer: .5. Thus, to take M* as warranting μ > M* would be to follow a method that is wrong 50% of the time. (See the mountains out of molehills fallacy in SIST.)
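One way to see the 50% figure is a small simulation, offered here as a sketch under one reading of the claim: whatever the true μ, the rule “infer μ > the observed mean” fails whenever the observed mean lands at or above μ. The function name, the seed, and the values μ = 2, SE = 1 are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def error_rate_of_point_rule(mu_true, se, n_trials, seed=0):
    """Fraction of trials in which the rule 'infer mu > observed M' is wrong."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_trials):
        m_obs = rng.gauss(mu_true, se)   # one observed sample mean
        if not (mu_true > m_obs):        # the inferred claim mu > m_obs is false
            wrong += 1
    return wrong / n_trials

# p-value for H0: mu <= M* when the observed mean is exactly M*:
# the standardized distance (M* - M*) / SE is 0.
p_at_m_star = 1 - NormalDist().cdf(0.0)

rate = error_rate_of_point_rule(mu_true=2.0, se=1.0, n_trials=100_000)
print(f"p-value at mu' = M*: {p_at_m_star}")
print(f"simulated error rate of the rule: {rate:.3f}")
```

The p-value is exactly .5 by the symmetry of the Normal distribution, and the simulated error rate should land close to one half.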

There is, of course, a relation between tests and estimation. Rejecting H₀ (at level α) is equivalent to inferring that μ exceeds the corresponding lower confidence bound (at level 1 − α, for the one-sided case). Obtaining this lower bound requires subtracting a number of SEs (e.g., 1.5, 1.65, 1.96, 2) from M*.
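The duality between the test and the lower confidence bound can be checked numerically. The following is a minimal sketch assuming a Normal sample mean with known SE; the helper names and the values μ₀ = 0, SE = 1, α = .025 are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist()

def lower_bound(m_obs, se, alpha):
    """One-sided lower confidence bound for mu at level 1 - alpha."""
    return m_obs - Z.inv_cdf(1 - alpha) * se

def rejects(m_obs, mu0, se, alpha):
    """One-sided test of H0: mu <= mu0: reject when M reaches the cutoff."""
    return m_obs >= mu0 + Z.inv_cdf(1 - alpha) * se

mu0, se, alpha = 0.0, 1.0, 0.025   # illustrative values (assumptions)

for m_obs in (1.0, 1.9, 2.0, 3.0):
    lb = lower_bound(m_obs, se, alpha)
    print(f"M = {m_obs:.1f}  reject H0: {rejects(m_obs, mu0, se, alpha)}  "
          f"lower bound = {lb:.2f}  bound at or above mu0: {lb >= mu0}")
```

In each row the test rejects exactly when the lower bound reaches μ₀, and the bound is always the observed mean minus a multiple of the SE (here about 1.96).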

Observe that POW(μ₀) = α and POW(M*) = .5. We can relate the consequence of μ′ moving farther below M*:

As μ′ moves farther below M*	Consequence
M* − μ′ increases	Greater overestimation if the observed M* is used to estimate μ
POW(μ′) decreases	The probability of obtaining M ≥ M* under μ = μ′ decreases
P-value for μ = μ′ decreases	Stronger evidence that μ > μ′

Thus, as power against μ′ decreases, the amount by which M* exceeds μ′ increases, but so too does the evidence that μ exceeds μ′. The very circumstance that yields greater overestimation when M* is used to estimate μ yields stronger evidence that μ exceeds μ′.
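The three rows of the table can be tabulated together. This is a sketch under assumed values (μ₀ = 0, SE = 1, α = .025), with μ′ placed a chosen number of SEs below M*; for a just-significant result, POW(μ′) and the p-value for μ = μ′ are the same number.

```python
from statistics import NormalDist

Z = NormalDist()
se, alpha = 1.0, 0.025                      # illustrative values (assumptions)
m_star = 0.0 + Z.inv_cdf(1 - alpha) * se    # just-significant cutoff for mu0 = 0

rows = []
for k in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):    # mu' sits k SEs below M*
    mu_prime = m_star - k * se
    gap = m_star - mu_prime                 # overestimate if M* is read as mu and mu = mu'
    pow_mu_prime = 1 - Z.cdf(gap / se)      # P(M >= M*; mu = mu'), also the p-value for mu = mu'
    rows.append((mu_prime, gap, pow_mu_prime))
    print(f"mu' = {mu_prime:6.2f}   M* - mu' = {gap:.2f}   POW(mu') = {pow_mu_prime:.4f}")
```

Reading down the rows, the gap M* − μ′ grows while POW(μ′) shrinks, which is the pairing the table describes.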

One final point. If a testing procedure is selectively reporting only statistically significant results, then the original error probabilities no longer apply–whether to the test or equivalent CI estimation.
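A small simulation can illustrate how selective reporting alters the error probabilities. This is only a sketch under assumed values: a true mean of 0.5 SE (against which the test has low power), μ₀ = 0, α = .025, and a fixed seed. It tracks how often the claim “μ exceeds the lower confidence bound” is false, over all results and over the statistically significant ones alone.

```python
import random
from statistics import NormalDist

def simulate(mu_true, mu0, se, alpha, n_trials, seed=0):
    """Return (error rate over all results, error rate among significant results)."""
    z = NormalDist().inv_cdf(1 - alpha)
    rng = random.Random(seed)
    all_errors = reported = reported_errors = 0
    for _ in range(n_trials):
        m_obs = rng.gauss(mu_true, se)
        bound_wrong = (m_obs - z * se) >= mu_true   # claim "mu > lower bound" is false
        all_errors += bound_wrong
        if m_obs >= mu0 + z * se:                   # only significant results get reported
            reported += 1
            reported_errors += bound_wrong
    return all_errors / n_trials, reported_errors / reported

overall, among_reported = simulate(mu_true=0.5, mu0=0.0, se=1.0,
                                   alpha=0.025, n_trials=200_000)
print(f"error rate over all results:        {overall:.3f}")
print(f"error rate among reported results:  {among_reported:.3f}")
```

Over all results the error rate should sit near the nominal .025. Among the reported (significant) results it is far higher under these assumptions, roughly a third, since the ratio P(Z > 1.96) / P(Z > 1.46) is about .025 / .072.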

Share your queries and thoughts in the comments to this post.

For a related post see “Do underpowered tests exaggerate population effects?”

See also the discussion on pp. 359-361 of Mayo (2018, CUP): Statistical Inference as Severe Testing: How to get beyond the statistics wars? (SIST). The relevant excerpt can be found here.

Categories: power, reforming the reformers


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
