
SIST: All Excerpts and Mementos: May 2018-May 2019

view from a hot-air balloon

Introduction & Overview

The Meaning of My Title: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* 05/19/18

Blurbs of 16 Tours: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST) 03/05/19


Excursion 1


Tour I

Excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1) 09/08/18

Excursion 1 Tour I (2nd stop): Probabilism, Performance, and Probativeness (1.2) 09/11/18

Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3) 09/15/18

Tour II

Excursion 1 Tour II: Error Probing Tools versus Logics of Evidence-Excerpt 04/04/19

Souvenir C: A Severe Tester’s Translation Guide (Excursion 1 Tour II) 11/08/18


Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars) 10/29/18


Excursion 2


Tour I

Excursion 2: Taboos of Induction and Falsification: Tour I (first stop) 09/29/18

“It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based” (Keepsake by Fisher, 2.1) 10/05/18

Tour II

Excursion 2 Tour II (3rd stop): Falsification, Pseudoscience, Induction (2.3) 10/10/18


Tour Guide Mementos and Quiz 2.1 (Excursion 2 Tour I Induction and Confirmation) 11/14/18

Mementos for Excursion 2 Tour II Falsification, Pseudoscience, Induction 11/17/18


Excursion 3


Tour I

Where are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3 11/30/18

Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: Excerpt from Excursion 3 (3.2) 12/01/18

First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3] 12/04/18

Tour II

It’s the Methods, Stupid: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP) 12/11/18

60 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II 12/29/18

Tour III

Capability and Severity: Deeper Concepts: Excerpts From Excursion 3 Tour III 12/20/18


Memento & Quiz (on SEV): Excursion 3, Tour I 12/08/18

Mementos for “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6) 12/13/18

Tour Guide Mementos From Excursion 3 Tour III: Capability and Severity: Deeper Concepts 12/26/18


Excursion 4


Tour I

Excerpt from Excursion 4 Tour I: The Myth of “The Myth of Objectivity” (Mayo 2018, CUP) 12/26/18

Tour II

Excerpt from Excursion 4 Tour II: 4.4 “Do P-Values Exaggerate the Evidence?” 01/10/19

Tour IV

Excerpt from Excursion 4 Tour IV: More Auditing: Objectivity and Model Checking 01/27/19


Mementos from Excursion 4: Blurbs of Tours I-IV 01/13/19


Excursion 5

Tour I

(full) Excerpt: Excursion 5 Tour I — Power: Pre-data and Post-data (from “SIST: How to Get Beyond the Stat Wars”) 04/27/19

Tour III

Deconstructing the Fisher-Neyman conflict wearing Fiducial glasses + Excerpt 5.8 from SIST 02/23/19


Excursion 6

Tour II

Excerpts: Souvenir Z: Understanding Tribal Warfare + 6.7 Farewell Keepsake from SIST + List of Souvenirs 05/04/19

*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo, CUP 2018).

Categories: SIST, Statistical Inference as Severe Testing

Excerpts: Final Souvenir Z, Farewell Keepsake & List of Souvenirs


We’ve reached our last Tour (of SIST)*: Pragmatic and Error Statistical Bayesians (Excursion 6), marking the end of our reading with Souvenir Z, the final Souvenir, as well as the Farewell Keepsake in 6.7. Our cruise ship StatInfasST, currently here at Thebes, will be back at dock for maintenance before our next launch at the Summer Seminar in Phil Stat (July 28-Aug 11). Although it’s not my preference that new readers begin with the Farewell Keepsake (it contains a few spoilers), I’m excerpting it together with Souvenir Z (and a list of all souvenirs A–Z) here, and invite all interested readers to peer in. There’s a checklist on p. 437: if you’re in the market for a new statistical account, you’ll want to test whether it satisfies the items on the list. Have fun!

Souvenir Z: Understanding Tribal Warfare

We began this tour asking: Is there an overarching philosophy that “matches contemporary attitudes”? More important is changing attitudes. Not to encourage a switch of tribes, or even a tribal truce, but something more modest and actually achievable: to understand and get beyond the tribal warfare. To understand the warring tribes, at minimum, requires grasping how the goals of probabilism differ from those of probativeness. This leads to a way of changing contemporary attitudes that is bolder and more challenging. Snapshots from the error statistical lens let you see how frequentist methods supply tools for controlling and assessing how well or poorly warranted claims are. All of the links, from data generation to modeling, to statistical inference and from there to substantive research claims, fall into place within this statistical philosophy. If this is close to being a useful way to interpret a cluster of methods, then the change in contemporary attitudes is radical: it has never been explicitly unveiled. Our journey was restricted to simple examples because those are the ones fought over in decades of statistical battles. Much more work is needed. Those grappling with applied problems are best suited to develop these ideas, and see where they may lead. I never promised, when you bought your ticket for this passage, to go beyond showing that viewing statistics as severe testing will let you get beyond the statistics wars.

6.7 Farewell Keepsake

Despite the eclecticism of statistical practice, conflicting views about the roles of probability and the nature of statistical inference – holdovers from long-standing frequentist–Bayesian battles – still simmer below the surface of today’s debates. Reluctance to reopen wounds from old battles has allowed them to fester. To assume all we need is an agreement on numbers – even if they’re measuring different things – leads to statistical schizophrenia. Rival conceptions of the nature of statistical inference show up unannounced in the problems of scientific integrity, irreproducibility, and questionable research practices, and in proposed methodological reforms. If you don’t understand the assumptions behind proposed reforms, their ramifications for statistical practice remain hidden from you.

Rival standards reflect a tension between using probability (a) to constrain the probability that a method avoids erroneously interpreting data in a series of applications (performance), and (b) to assign degrees of support, confirmation, or plausibility to hypotheses (probabilism). We set sail on our journey with an informal tool for telling what’s true about statistical inference: If little if anything has been done to rule out flaws in taking data as evidence for a claim, then that claim has not passed a severe test. From this minimal severe-testing requirement, we develop a statistical philosophy that goes beyond probabilism and performance. The goals of the severe tester (probativism) arise in contexts sufficiently different from those of probabilism that you are free to hold both, for distinct aims (Section 1.2). For statistical inference in science, it is severity we seek. A claim passes with severity only to the extent that it is subjected to, and passes, a test that it probably would have failed, if false. Viewing statistical inference as severe testing alters long-held conceptions of what’s required for an adequate account of statistical inference in science. In this view, a normative statistical epistemology – an account of what’s warranted to infer – must be:

  directly altered by biasing selection effects
  able to falsify claims statistically
  able to test statistical model assumptions
  able to block inferences that violate minimal severity
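In symbols, the severity idea behind this list can be put as follows (a gloss of mine, not the book’s own notation; d(X) is the test statistic of a test T with outcome x0): a claim C passes severely to the extent that

```latex
% A gloss on severity (editor's notation, not a quotation from SIST):
\mathrm{SEV}(T, x_0, C) \;=\;
  \Pr\big(\, d(X) \text{ accords less well with } C \text{ than } d(x_0) \text{ does}
  \;;\; C \text{ false} \,\big) \text{ is high.}
```

The minimal requirement is the denial: if the test would probably have “passed” C even were C false, then x0 is poor evidence for C.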

These overlapping and interrelated requirements are disinterred over the course of our travels. This final keepsake collects a cluster of familiar criticisms of error statistical methods. They are not intended to replace the detailed arguments, pro and con, within; here we cut to the chase, generally keeping to the language of critics. Given our conception of evidence, we retain testing language even when the statistical inference is an estimation, prediction, or proposed answer to a question. The concept of severe testing is sufficiently general to apply to any of the methods now in use. It follows that a variety of statistical methods can serve to advance the severity goal, and that they can, in principle, find their foundations in an error statistical philosophy. However, each requires supplements and reformulations to be relevant to real-world learning. Good science does not turn on adopting any formal tool, and yet the statistics wars often focus on whether to use one type of test (or estimation, or model selection) or another. Meta-researchers charged with instigating reforms do not agree, but the foundational basis for the disagreement is left unattended. It is no wonder some see the statistics wars as proxy wars between competing tribe leaders, each keen to advance one or another tool, rather than about how to do better science. Leading minds are drawn into inconsequential battles, e.g., whether to use a prespecified cut-off of 0.025 or 0.0025 – when in fact good inference is not about cut-offs at all but about a series of small-scale steps in collecting, modeling and analyzing data that work together to find things out. Still, we need to get beyond the statistics wars in their present form. By viewing a contentious battle in terms of a difference in goals – finding highly probable versus highly well-probed hypotheses – readers can see why leaders of rival tribes often talk past each other. To be clear, the standpoints underlying the following criticisms are open to debate; we’re far from claiming to do away with them. What should be done away with is rehearsing the same criticisms ad nauseam. Only then can we hear the voices of those calling for an honest standpoint about responsible science.

1. NHST Licenses Abuses. First, there’s the cluster of criticisms directed at an abusive NHST animal: NHSTs infer from a single P-value below an arbitrary cut-off to evidence for a research claim, and they encourage P-hacking, fishing, and other selection effects. The reply: this ignores crucial requirements set by Fisher and other founders: isolated significant results are poor evidence of a genuine effect, and statistical significance doesn’t warrant substantive (e.g., causal) inferences. Moreover, selective reporting invalidates error probabilities. Some argue significance tests are un-Popperian because the larger the sample size, the easier it is to infer one’s research hypothesis. It’s true that with a sufficiently large sample size any discrepancy from a null hypothesis has a high probability of being detected, but statistical significance does not license inferring a research claim H. If H’s errors have not been well probed, then merely finding a small P-value means H has passed an extremely insevere test. No mountains out of molehills (Sections 4.3 and 5.1). Enlightened users of statistical tests have rejected the cookbook, dichotomous NHST, long lampooned: such criticisms are behind the times. When well-intentioned aims of replication research are linked to these retreads, it only hurts the cause. One doesn’t need a sharp dichotomy to identify rather lousy tests – a main goal for a severe tester. Granted, policy-making contexts may require cut-offs, as do behavioristic setups. But in those contexts, a test’s error probabilities measure overall error control, and are not generally used to assess well-testedness. Even there, users need not fall into the NHST traps (Section 2.5). While attention to banning terms is the least productive aspect of the statistics wars, since NHST is not used by Fisher or N-P, let’s give the caricature its due and drop the NHST acronym; “statistical tests” or “error statistical tests” will do. Simple significance tests are a small part of a conglomeration of error statistical methods.
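To see why selective reporting invalidates error probabilities, a quick simulation helps. This is my sketch, not the book’s: it assumes a hypothetical “hunt” over 20 independent tests of true nulls, reporting only the smallest P-value, and uses numpy/scipy.

```python
# Sketch (editor's illustration): selective reporting invalidates error probabilities.
# Run 20 independent one-sided tests of true null hypotheses, report the smallest P-value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
reps, n_tests, n, alpha = 5000, 20, 30, 0.05

false_alarms = 0
for _ in range(reps):
    # z-statistics for n_tests studies, each with n observations, all nulls true
    z = rng.normal(0, 1, (n_tests, n)).mean(axis=1) * np.sqrt(n)
    p_min = norm.sf(z).min()          # report only the smallest one-sided P-value
    false_alarms += p_min <= alpha

print(f"nominal level: {alpha}, actual rate after hunting: {false_alarms / reps:.2f}")
# ~0.64, close to 1 - 0.95**20: the reported "P < 0.05" no longer means what it says
```

The nominal 5% error probability describes only the procedure actually used; here the procedure is “report the best of 20,” and its error rate is far higher.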

To continue reading: Excerpt Souvenir Z, Farewell Keepsake & List of Souvenirs can be found here.

*We are reading Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP).



Where YOU are in the journey.


Categories: SIST, Statistical Inference as Severe Testing

(full) Excerpt: Excursion 5 Tour I — Power: Pre-data and Post-data (from “SIST: How to Get Beyond the Stat Wars”)

S.S. StatInfasST

It’s a balmy day today on Ship StatInfasST: an invigorating wind has a salutary effect on our journey. So, for the first time, I’m excerpting all of Excursion 5 Tour I (proofs) of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP).

A salutary effect of power analysis is that it draws one forcibly to consider the magnitude of effects. In psychology, and especially in soft psychology, under the sway of the Fisherian scheme, there has been little consciousness of how big things are. (Cohen 1990, p. 1309)

So how would you use power to consider the magnitude of effects were you drawn forcibly to do so? In with your breakfast is an exercise to get us started on today’s shore excursion.

Suppose you are reading about a statistically significant result x (just at level α) from a one-sided test T+ of the mean of a Normal distribution with IID samples and known σ: H0: μ ≤ 0 against H1: μ > 0. Underline the correct word, from the perspective of the (error statistical) philosophy, within which power is defined.

  • If the test’s power to detect μ′ is very low (i.e., POW(μ′) is low), then the statistically significant x is poor/good evidence that μ > μ′.
  • Were POW(μ′) reasonably high, the inference to μ > μ′ is reasonably/poorly warranted.
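If you want to check your underlining numerically, here is a minimal sketch (mine, not from the Tour). It assumes the setup above, test T+ with known σ and a result landing just at the α cutoff, and uses scipy; the sample size and μ′ values are illustrative only.

```python
# Sketch (editor's illustration): power vs. severity for test T+ of
# H0: mu <= 0 vs H1: mu > 0, Normal IID data, known sigma.
from scipy.stats import norm

def power(mu_prime, n, sigma=1.0, alpha=0.025):
    """POW(mu') = P(X-bar >= cutoff; mu'), with cutoff = z_alpha * sigma / sqrt(n)."""
    se = sigma / n ** 0.5
    cutoff = norm.ppf(1 - alpha) * se            # reject H0 when x-bar >= cutoff
    return 1 - norm.cdf((cutoff - mu_prime) / se)

def severity_mu_gt(mu_prime, xbar, n, sigma=1.0):
    """SEV(mu > mu'): P(X-bar <= observed x-bar; mu = mu'), the probability of a worse fit were mu <= mu'."""
    se = sigma / n ** 0.5
    return norm.cdf((xbar - mu_prime) / se)

n, sigma, alpha = 25, 1.0, 0.025
xbar = norm.ppf(1 - alpha) * sigma / n ** 0.5    # outcome just significant at level alpha
for mu_prime in (0.1, 0.2, 0.5, 1.0):
    print(f"mu'={mu_prime}: POW={power(mu_prime, n):.3f}  "
          f"SEV(mu > mu')={severity_mu_gt(mu_prime, xbar, n):.3f}")
```

For a just-significant result the two are complements, SEV(μ > μ′) = 1 − POW(μ′): low power to detect μ′ goes with a well-warranted inference to μ > μ′, and high power with a poorly warranted one.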

Continue reading

Categories: Statistical Inference as Severe Testing, Statistical power

If you like Neyman’s confidence intervals then you like N-P tests


Neyman, confronted with unfortunate news, would always say “too bad!” At the end of Jerzy Neyman’s birthday week, I cannot help imagining him saying “too bad!” as regards some twists and turns in the statistics wars. First, too bad Neyman-Pearson (N-P) tests aren’t in the ASA Statement (2016) on P-values: “To keep the statement reasonably simple, we did not address alternative hypotheses, error types, or power”. An especially aggrieved “too bad!” would be earned by the fact that those in love with confidence interval estimators don’t appreciate that Neyman developed them (in 1930) as a method with a precise interrelationship with N-P tests. So if you love CI estimators, then you love N-P tests! Continue reading
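The duality is easy to exhibit numerically. Below is a minimal sketch (mine, assuming the one-sided Normal-mean test with known σ and illustrative numbers): the (1 − α) lower confidence bound collects exactly those μ0 that the level-α test fails to reject.

```python
# Sketch (editor's illustration): Neyman's CI/test duality for
# H0: mu = mu0 vs H1: mu > mu0, Normal IID data, known sigma.
from scipy.stats import norm

def rejects(mu0, xbar, n, sigma=1.0, alpha=0.05):
    """Level-alpha one-sided test: reject when the z-statistic >= z_alpha."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return z >= norm.ppf(1 - alpha)

def lower_bound(xbar, n, sigma=1.0, alpha=0.05):
    """(1 - alpha) lower confidence bound: the CI is (lower_bound, infinity)."""
    return xbar - norm.ppf(1 - alpha) * sigma / n ** 0.5

xbar, n, sigma = 0.4, 25, 1.0
lb = lower_bound(xbar, n, sigma)
for mu0 in (0.0, 0.1, 0.2, 0.3):
    assert rejects(mu0, xbar, n, sigma) == (mu0 <= lb)   # duality, point by point
    print(f"mu0={mu0}: rejected={rejects(mu0, xbar, n, sigma)}, in CI={mu0 > lb}")
```

Whatever confidence level you demand of the estimator, you have thereby fixed the size of the corresponding test; the two are one method viewed from two ends.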

Categories: ASA Guide to P-values, CIs and tests, Neyman

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

Neyman April 16, 1894 – August 5, 1981

I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and a significance test. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and is justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, is, on the other hand, an epistemological goal. What do you think?

Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena
by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets: Continue reading

Categories: Error Statistics, Neyman, Statistics

Neyman vs the ‘Inferential’ Probabilists


We celebrated Jerzy Neyman’s birthday (April 16, 1894) last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper,” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation…either a substitute a priori distribution [exemplified by the so-called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts”, you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).

drawn by his wife, Olga

Note: In this article, “attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

Categories: Bayesian/frequentist, Error Statistics, Neyman

Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday yesterday)


Neyman April 16, 1894 – August 5, 1981

My second Jerzy Neyman item, in honor of his birthday, is a little play that I wrote for Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018):

A local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite from Neyman and Pearson’s early joint 1933 paper:

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

Continue reading

Categories: E.S. Pearson, Neyman, Statistics

A. Spanos: Jerzy Neyman and his Enduring Legacy

Today is Jerzy Neyman’s birthday. I’ll post various Neyman items this week in recognition of it, starting with a guest post by Aris Spanos. Happy Birthday Neyman!

A. Spanos

A Statistical Model as a Chance Mechanism
Aris Spanos 

Jerzy Neyman (April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)


Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’: Continue reading

Categories: Neyman, Spanos

Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science

Source: Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science

Categories: Error Statistics

Excursion 1 Tour II: Error Probing Tools versus Logics of Evidence-Excerpt


For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).

1.4 The Law of Likelihood and Error Statistics

If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail – to use probability to arrive at a type of logic of evidential support – and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965) – the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading
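To fix ideas before boarding: the Law of Likelihood says data x favor H1 over H0 just when Pr(x; H1) > Pr(x; H0). A minimal sketch (mine, not from the Tour; the binomial numbers are illustrative) shows the feature the error statistician worries about: an alternative chosen to fit the data always wins the comparison, however the data came about.

```python
# Sketch (editor's illustration): the Law of Likelihood in a binomial model.
# x "favors" H1 over H0 iff LR = P(x; H1) / P(x; H0) > 1.
from scipy.stats import binom

n, x = 100, 60                      # 60 successes in 100 Bernoulli trials
theta0 = 0.5                        # H0: fair process

def likelihood(theta):
    return binom.pmf(x, n, theta)

theta_hat = x / n                   # maximally likely alternative, picked after seeing x
lr = likelihood(theta_hat) / likelihood(theta0)
print(f"LR(theta={theta_hat} vs theta={theta0}) = {lr:.1f}")   # > 1 by construction
# Whatever x turns out to be, theta_hat = x/n is favored over theta0, so comparative
# support alone puts no control on how often such "favoring" is erroneous.
```

That lack of error control, rather than the intuitive appeal of the comparison, is where the severe tester gets off the boat.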

Categories: Error Statistics, law of likelihood, SIST

there’s a man at the wheel in your brain & he’s telling you what you’re allowed to say (not probability, not likelihood)

It seems like every week something exciting in statistics comes down the pike. Last week I was contacted by Richard Harris (and two others) about the recommendation to stop saying the data reach “significance level p” but rather simply say

“the p-value is p”.

(For links, see my previous post.) Friday, he wrote to ask if I would comment on a proposed restriction (?) on saying a test had high power! I agreed that we shouldn’t say a test has high power, but only that it has high power to detect a specific alternative; I wasn’t aware, though, of any rulings from those in power on power. He explained it was an upshot of a reexamination, by a joint group of the boards of statistical associations in the U.S. and UK, of the full panoply of statistical terms. Something like that. I agreed to speak with him yesterday. He emailed me the proposed ruling on power: Continue reading

Categories: Bayesian/frequentist

Diary For Statistical War Correspondents on the Latest Ban on Speech

When science writers, especially “statistical war correspondents”, contact you to weigh in on some article, they may talk to you until they get something spicy, and then they may or may not include the background context. So a few writers contacted me this past week regarding this article (“Retire Statistical Significance”) – a teaser, I now suppose, to advertise the ASA collection growing out of that conference “A world beyond P ≤ .05” way back in Oct 2017, where I gave a paper*. I jotted down some points, since Richard Harris from NPR needed them immediately, and I had just gotten off a plane when he emailed. He let me follow up with him, which is rare and greatly appreciated. So I streamlined the first set of points, and dropped any points he deemed technical. I sketched the third set for a couple of other journals who contacted me, who may or may not use them. Here’s Harris’ article, which includes a couple of my remarks. Continue reading

Categories: ASA Guide to P-values, P-values

1 Day to Apply for the Summer Seminar in Phil Stat

Go to the website for instructions:

Categories: Summer Seminar in PhilStat

S. Senn: To infinity and beyond: how big are your data, really? (guest post)



Stephen Senn
Consultant Statistician

What is this you boast about?

Failure to understand components of variation is the source of much mischief. It can lead researchers to overlook that they can be rich in data-points but poor in information. The important thing is always to understand what varies in the data you have, and to what extent your design, and the purpose you have in mind, master it. The result of failing to understand this can be that you mistakenly calculate standard errors of your estimates that are too small because you divide the variance by an n that is too big. In fact, the problems can go further than this, since you may even pick up the wrong covariance and hence use inappropriate regression coefficients to adjust your estimates.

I shall illustrate this point using clinical trials in asthma. Continue reading
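A minimal simulation (mine, not Senn’s; it assumes a toy setting of repeated measurements nested within patients, with made-up variance components) makes the warning concrete: dividing by the total number of data points, rather than attending to the components of variation, understates the standard error badly.

```python
# Sketch (editor's illustration): rich in data points, poor in information.
# m patients, k repeated measurements each; between-patient variation dominates.
import numpy as np

rng = np.random.default_rng(1)
m, k = 20, 50                                 # 20 patients, 50 measurements each
sd_between, sd_within = 1.0, 0.5

reps = 2000
naive_ses, sample_means = [], []
for _ in range(reps):
    patient_effects = rng.normal(0, sd_between, m)
    data = patient_effects[:, None] + rng.normal(0, sd_within, (m, k))
    naive_ses.append(data.std(ddof=1) / np.sqrt(m * k))   # divides by n = 1000
    sample_means.append(data.mean())

print(f"naive SE (divide by m*k): {np.mean(naive_ses):.3f}")
print(f"actual SD of the mean:    {np.std(sample_means):.3f}")
# The naive SE is several times too small: the 1000 points carry only about
# 20 patients' worth of information on the between-patient component.
```

For the purpose of estimating the mean, the 1,000 data points behave like roughly 20: which n you divide by is exactly the question of which component of variation your design masters.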

Categories: Lord's paradox, S. Senn | 5 Comments

Blurbs of 16 Tours: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST)

Statistical Inference as Severe Testing:
How to Get Beyond the Statistics Wars (2018, CUP)

Deborah G. Mayo

Abstract for Book

By disinterring the underlying statistical philosophies, this book sets the stage for understanding and finally getting beyond today’s most pressing controversies revolving around statistical methods and irreproducible findings. Statistical Inference as Severe Testing takes the reader on a journey that provides a non-technical “how to” guide for zeroing in on the most influential arguments surrounding commonly used – and abused – statistical methods. The book sets sail with a tool for telling what’s true about statistical controversies: if little if anything has been done to rule out flaws in taking data as evidence for a claim, then that claim has not passed a stringent or severe test. In the severe testing account, probability arises in inference, not to measure degrees of plausibility or belief in hypotheses, but to assess and control how severely tested claims are. Viewing statistical inference as severe testing supplies novel solutions to problems of induction, falsification and demarcating science from pseudoscience, and serves as the linchpin for understanding and getting beyond the statistics wars. The book links philosophical questions about the roles of probability in inference to the concerns of practitioners in psychology, medicine, biology, economics, physics and across the landscape of the natural and social sciences.

Keywords for book:

severe testing, Bayesian and frequentist debates, philosophy of statistics, significance testing controversy, statistics wars, replication crisis, statistical inference, error statistics, philosophy and history of Neyman, Pearson and Fisherian statistics, Popperian falsification

Continue reading

Categories: Statistical Inference as Severe Testing

Deconstructing the Fisher-Neyman conflict wearing fiducial glasses + Excerpt 5.8 from SIST


Fisher/ Neyman

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. These 2 posts reflect my working out of these ideas in writing Section 5.8 of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, CUP 2018). Here’s all of Section 5.8 (“Neyman’s Performance and Fisher’s Fiducial Probability”) for your Saturday night reading.* 

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74). Continue reading

Categories: fiducial probability, Fisher, Neyman, Statistics

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I post one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem, to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D. R. Cox 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability – never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehmann (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics. Continue reading

Categories: fiducial probability, Fisher, Phil6334/ Econ 6614, Statistics

Guest Blog: R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)



In recognition of R.A. Fisher’s birthday on February 17…a week of Fisher posts!

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998),

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however, did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Phil6334/ Econ 6614, Spanos, Statistics

R.A. Fisher: “Statistical Methods and Scientific Induction”

I continue a week of Fisherian posts begun on his birthday (Feb 17). This is his contribution to the “Triad” – an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and worth your rereading.

17 February 1890 — 29 July 1962

“Statistical Methods and Scientific Induction”

by Sir Ronald Fisher (1955)


The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

  1. “Repeated sampling from the same population”,
  2. Errors of the “second kind”,
  3. “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.


Note on an Article by Sir Ronald Fisher

by Jerzy Neyman (1956)




(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible. Continue reading

Categories: E.S. Pearson, fiducial probability, Fisher, Neyman, phil/history of stat, Phil6334/ Econ 6614

Guest Post: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

“You May Believe You Are a Bayesian But You Are Probably Wrong”


As part of the week of posts on R.A. Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012 and 2017. See especially the comments from Feb 2017.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of R.A. Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions, Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976, p. 473)

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976, and the lecture on which it was based was given in Detroit on 29 December 1971 (p. 441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre, but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published, and this throws light on many aspects of Fisher’s thought, including significance tests. Continue reading

Categories: Fisher, S. Senn, Statistics
