Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

By Aris Spanos

One of R. A. Fisher’s (17 February 1890 – 29 July 1962) most remarkable, but least recognized, achievements was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

Mθ(x) = {f(x;θ), θ∈Θ}, x∈ℝⁿ, Θ⊂ℝᵐ, m < n, (1)

where the distribution of the sample f(x;θ) ‘encapsulates’ the probabilistic information in the statistical model.
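For a concrete instance of (1), one can take the simple Normal model. This particular choice, and the symbols μ and σ², are illustrative assumptions rather than part of Fisher's general formulation:

```latex
\mathcal{M}_{\theta}(\mathbf{x}):\quad X_k \sim \mathsf{NIID}(\mu,\sigma^2),\quad k=1,2,\ldots,n,
\qquad \theta:=(\mu,\sigma^2)\in\Theta:=\mathbb{R}\times\mathbb{R}_{+},
\qquad
f(\mathbf{x};\theta)=\prod_{k=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}
\exp\!\left(-\frac{(x_k-\mu)^2}{2\sigma^2}\right).
```

Here m = 2, so the condition m < n holds for any sample with more than two observations.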

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily confined to describing the distributional features of the data in hand using the histogram and the first few sample moments, implicitly assuming random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand x0:=(x1,x2,…,xn). As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pearson’s approach on its head. Pearson proceeded from data x0 in search of a frequency curve f(x;ϑ) to describe its histogram. Fisher proposed instead to begin with a prespecified Mθ(x) (a ‘hypothetical infinite population’), and to view x0 as a ‘typical’ realization thereof; see Spanos (1999).

In my mind, Fisher’s most enduring contribution is his devising a general way to ‘operationalize’ errors by embedding the material experiment into Mθ(x), and taming errors via probabilification, i.e. defining frequentist error probabilities in the context of a statistical model. These error probabilities (a) are deductively derived from the statistical model, and (b) provide a measure of the ‘effectiveness’ of the inference procedure: how often a certain method will give rise to correct inferences concerning the underlying ‘true’ Data Generating Mechanism (DGM). This cast aside the need for a prior. Both of these key elements, the statistical model and the error probabilities, have been refined and extended by Mayo’s error statistical approach (e.g., Mayo 1996). Learning from data is achieved when an inference is reached by an inductive procedure which, with high probability, will yield true conclusions from valid inductive premises (a statistical model); see Mayo and Spanos (2011).
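To see what it means for an error probability to be derived from the statistical model, here is a minimal sketch that checks one by simulation. It assumes a simple Normal model with known σ and a one-sided z-test; the function name `rejection_rate` and every numerical default are hypothetical choices made for illustration, not anything specified in the discussion above.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(mu_true, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   reps=20_000, seed=0):
    """Estimate how often a one-sided z-test of H0: mu <= mu0 rejects.

    Assumes X_k ~ NIID(mu_true, sigma^2) with sigma known; all defaults
    are illustrative assumptions.
    """
    rng = random.Random(seed)                  # fixed seed for repeatability
    c_alpha = NormalDist().inv_cdf(1 - alpha)  # rejection threshold
    rejections = 0
    for _ in range(reps):
        # Draw one sample of size n from the assumed model.
        xbar = sum(rng.gauss(mu_true, sigma) for _ in range(n)) / n
        z = math.sqrt(n) * (xbar - mu0) / sigma
        if z > c_alpha:
            rejections += 1
    return rejections / reps


# When mu_true equals mu0, the model implies a rejection probability of alpha.
type_i_rate = rejection_rate(mu_true=0.0)
print(f"simulated type I error rate: {type_i_rate:.3f} (nominal 0.05)")
```

The simulated frequency should sit close to the nominal α only when the model's assumptions hold for the data being generated, which is the sense in which the error probability is a property of the procedure under the model.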

Frequentist statistical inference was largely in place by the late 1930s. Fisher, almost single-handedly, created the current theory of ‘optimal’ point estimation and formalized significance testing based on p-value reasoning. In the early 1930s Neyman and Pearson (N-P) proposed an ‘optimal’ theory for hypothesis testing by modifying/extending Fisher’s significance testing. By the late 1930s Neyman had proposed an ‘optimal’ theory for interval estimation analogous to N-P testing. Despite these developments in frequentist statistics, its philosophical foundations, concerned with the proper form of the underlying inductive reasoning, were in a confused state. Fisher was arguing for ‘inductive inference’, spearheaded by his significance testing in conjunction with p-values and his fiducial probability for interval estimation. Neyman was arguing for ‘inductive behavior’ based on N-P testing and confidence interval estimation firmly grounded on pre-data error probabilities.

The last exchange between these pioneers took place in the mid-1950s (see Fisher, 1955; Neyman, 1956; Pearson, 1955) and left the philosophical foundations of the field in a state of confusion, with many more questions than answers.

One of the key issues of disagreement was the relevance of alternative hypotheses and the role of pre-data error probabilities in frequentist testing, i.e. the irrelevance of Errors of the “second kind”, as Fisher (1955, p. 69) framed the issue. My take on this issue is that Fisher did understand the importance of alternative hypotheses and the power of the test, as shown by his talking about its ‘sensitivity’:

“By increasing the size of the experiment, we can render it more sensitive, meaning by this that it will allow of the detection of a lower degree of sensory discrimination, or, in other words, of a quantitatively smaller departure from the null hypothesis.” (Fisher, 1935, p. 22)

If this is not the same as increasing the power of the test by increasing the sample size, I do not know what it is! What Fisher and many subsequent commentators did not appreciate enough was that Neyman and Pearson defined the relevant alternative hypotheses in a very specific way: to be the complement to the null relative to the prespecified statistical model Mθ(x):

H0: µ∈Θ0 vs. H1: µ∈Θ1 (2)

where Θ0 and Θ1 constitute a partition of the parameter space Θ. That rendered the evaluation of power possible and Fisher’s comment about type II errors:

“Such errors are therefore incalculable both in frequency and in magnitude merely from the specification of the null hypothesis.” simply misplaced.
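Once the alternative is the complement of the null within the model, the type II error probability can be evaluated at each point of Θ1, and Fisher's 'sensitivity' remark becomes a statement about power growing with n. A minimal sketch, assuming a simple Normal model with known σ and the one-sided hypotheses H0: µ ≤ µ0 vs. H1: µ > µ0; the functions `power` and `type_ii_error`, the discrepancy `delta` = µ1 − µ0, and the numbers used are hypothetical, chosen only for illustration:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(n, delta, sigma=1.0, alpha=0.05):
    """Power of the one-sided z-test at the alternative mu1 = mu0 + delta.

    Assumes X_k ~ NIID(mu, sigma^2) with sigma known (illustrative model).
    """
    c_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # rejection threshold
    return 1 - STD_NORMAL.cdf(c_alpha - math.sqrt(n) * delta / sigma)


def type_ii_error(n, delta, sigma=1.0, alpha=0.05):
    """Probability of not rejecting H0 when mu = mu0 + delta."""
    return 1 - power(n, delta, sigma, alpha)


# A fixed discrepancy from the null becomes easier to detect as n grows.
for n in (10, 50, 200):
    print(n, round(power(n, delta=0.2), 3), round(type_ii_error(n, delta=0.2), 3))
```

At `delta = 0` the power reduces to α, and for any fixed `delta > 0` it rises with n, which is the calculation that the null hypothesis alone cannot supply but the partition of Θ can.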

Let me finish with a quotation from Fisher (1935) that I find very insightful and as relevant today as it was then:

“In the field of pure research no assessment of the cost of wrong conclusions, or of delay in arriving at more correct conclusions can conceivably be more than a pretence, and in any case such an assessment would be inadmissible and irrelevant in judging the state of the scientific evidence.” (pp. 25-26)

This post was first blogged in 2012.


[1] Fisher, R. A. (1922), “On the mathematical foundations of theoretical statistics,” Philosophical Transactions of the Royal Society A, 222: 309-368.

[2] Fisher, R. A. (1935), The Design of Experiments, Oliver and Boyd, Edinburgh.

[3] Fisher, R. A. (1955), “Statistical methods and scientific induction,” Journal of the Royal Statistical Society, B, 17: 69-78.

[4] Mayo, D. G. and A. Spanos (2011), “Error Statistics,” pp. 151-196 in the Handbook of Philosophy of Science, vol. 7: Philosophy of Statistics, D. Gabbay, P. Thagard, and J. Woods (editors), Elsevier.

[5] Neyman, J. (1956), “Note on an Article by Sir Ronald Fisher,” Journal of the Royal Statistical Society, B, 18: 288-294.

[6] Pearson, E. S. (1955), “Statistical Concepts in Their Relation to Reality,” Journal of the Royal Statistical Society, B, 17: 204-207.

[7] Spanos, A. (1999), Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, Cambridge.
