Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

# Fisher

## “Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”

## Slides from the Boston Colloquium for Philosophy of Science: “Severe Testing: The Key to Error Correction”

Slides from my March 17 presentation on “Severe Testing: The Key to Error Correction” given at the Boston Colloquium for Philosophy of Science Alfred I.Taub forum on “Understanding Reproducibility and Error Correction in Science.”

## R.A Fisher: “It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based”

A final entry in a week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962). Fisher is among the very few thinkers I have come across to recognize this crucial difference between induction and deduction:

In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigorous is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. Statistical data are always erroneous, in greater or less degree. The study of inductive reasoning is the study of the embryology of knowledge, of the processes by means of which truth is extracted from its native ore in which it is infused with much error. (Fisher, “The Logic of Inductive Inference,” 1935, p 54).

Reading/rereading this paper is very worthwhile for interested readers. Some of the fascinating historical/statistical background may be found in a guest post by Aris Spanos: “R.A.Fisher: How an Outsider Revolutionized Statistics”

## Guest Blog: STEPHEN SENN: ‘Fisher’s alternative to the alternative’

**As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog a guest post by Stephen Senn from 2012. (I will comment in the comments.)**

*‘Fisher’s alternative to the alternative’*

*By: Stephen Senn*

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are *more sensitive* than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading

## R.A. Fisher: “Statistical methods and Scientific Induction”

I continue a week of Fisherian posts in honor of his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. They are each very short.

*“Statistical Methods and Scientific Induction”*

*by Sir Ronald Fisher (1955)
*

**SUMMARY**

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating they fallacies they embody, are:

- “Repeated sampling from the same population”,
- Errors of the “second kind”,
- “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

The most noteworthy feature is Fisher’s position on Fiducial inference, typically downplayed. I’m placing a summary and link to Neyman’s response below–it’s that interesting. Continue reading

## Guest Blog: ARIS SPANOS: The Enduring Legacy of R. A. Fisher

**By Aris Spanos**

One of R. A. Fisher’s (17 February 1890 — 29 July 1962) most remarkable, but least recognized, achievement was to initiate the recasting of statistical induction. Fisher (1922) pioneered modern frequentist statistics as a model-based approach to statistical induction anchored on the notion of a statistical model, formalized by:

M_{θ}(**x**)={f(**x**;θ); θ∈Θ**}**; **x**∈R^{n };Θ⊂R^{m}; m < n; (1)

where the distribution of the sample f(**x**;θ) ‘encapsulates’ the probabilistic information in the statistical model.

Before Fisher, the notion of a statistical model was vague and often implicit, and its role was primarily conﬁned to the description of the distributional features of the data in hand using the histogram and the ﬁrst few sample moments; implicitly imposing random (IID) samples. The problem was that statisticians at the time would use descriptive summaries of the data to claim generality beyond the data in hand **x**_{0}:=(x_{1},x_{2},…,x_{n}) As late as the 1920s, the problem of statistical induction was understood by Karl Pearson in terms of invoking (i) the ‘stability’ of empirical results for subsequent samples and (ii) a prior distribution for θ.

Fisher was able to recast statistical inference by turning Karl Pearson’s approach, proceeding from data **x**_{0 }in search of a frequency curve f(x;ϑ) to describe its histogram, on its head. He proposed to begin with a prespeciﬁed M_{θ}(**x**) (a ‘hypothetical inﬁnite population’), and view x_{0 }as a ‘typical’ realization thereof; see Spanos (1999). Continue reading

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

*Today is R.A. Fisher’s birthday. I’ll post some different Fisherian items this week in honor of it. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!*

“Two New Properties of Mathematical Likelihood“

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H_{0} is more powerful than any other equivalent test, with regard to an alternative hypothesis H_{1}, when it rejects H_{0} in a set of samples having an assigned aggregate frequency ε when H_{0} is true, and the greatest possible aggregate frequency when H_{1} is true. Continue reading

## Gigerenzer at the PSA: “How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual”: Comments and Queries (ii)

Gerd Gigerenzer, Andrew Gelman, Clark Glymour and I took part in a very interesting symposium on Philosophy of Statistics at the Philosophy of Science Association last Friday. I jotted down lots of notes, but I’ll limit myself to brief reflections and queries on a small portion of each presentation in turn, starting with Gigerenzer’s “Surrogate Science: How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual.” His complete slides are below my comments. I may write this in stages, this being (i).

SLIDE #19

- Good scientific practice–bold theories, double-blind experiments, minimizing measurement error, replication, etc.–became reduced in the social science to a surrogate: statistical significance.

I agree that “good scientific practice” isn’t some great big mystery, and that “bold theories, double-blind experiments, minimizing measurement error, replication, etc.” are central and interconnected keys to finding things out in error prone inquiry. *Do the social sciences really teach that inquiry can be reduced to cookbook statistics? Or is it simply that, in some fields, carrying out surrogate science suffices to be a “success”?* Continue reading

## Deconstructing the Fisher-Neyman conflict wearing fiducial glasses (continued)

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74).

Fisher was right that Neyman’s calling the outputs of statistical inferences “actions” merely expressed Neyman’s preferred way of talking. Nothing earth-shaking turns on the choice to dub every inference “an act of making an inference”.[i] The “rationality” or “merit” goes into the rule. Neyman, much like Popper, had a good reason for drawing a bright red line between his use of probability (for corroboration or probativeness) and its use by ‘probabilists’ (who assign probability to hypotheses). Fisher’s Fiducial probability was in danger of blurring this very distinction. Popper said, and Neyman would have agreed, that he had no problem with our using the word induction so long it was kept clear it meant testing hypotheses severely. Continue reading

## Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]

In recognition of R.A. Fisher’s birthday today, I’ve decided to share some thoughts on a topic that has so far has been absent from this blog: Fisher’s* fiducial probability*. **Happy Birthday Fisher.**

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals (D.R.Cox, 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehman (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.

So what’s *fiducial inference*? I follow Cox (2006), adapting for the case of the lower limit: Continue reading

## Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

This headliner appeared two years ago, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance…

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno [1] who is standing up there at the mike ….

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejectingH_{0}because of outcomesH_{0}didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

## NEYMAN: “Note on an Article by Sir Ronald Fisher” (3 uses for power, Fisher’s fiducial argument)

**Note on an Article by Sir Ronald Fisher **

*By Jerzy Neyman (1956)
*

**Summary**

**(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible. (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values. The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight. (4) The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.**

*1. Introduction*

In a recent article (Fisher, 1955), Sir Ronald Fisher delivered an attack on a a substantial part of the research workers in mathematical statistics. My name is mentioned more frequently than any other and is accompanied by the more expressive invectives. Of the scientific questions raised by Fisher many were sufficiently discussed before (Neyman and Pearson, 1933; Neyman, 1937; Neyman, 1952). In the present note only the following points will be considered: (i) Fisher’s attack on the concept of errors of the second kind; (ii) Fisher’s reference to my objections to fiducial probability; (iii) Fisher’s reference to the origin of the concept of loss function and, before all, (iv) Fisher’s attack on Abraham Wald.

THIS SHORT (5 page) NOTE IS NEYMAN’S PORTION OF WHAT I CALL THE “TRIAD”. LET ME POINT YOU TO THE TOP HALF OF p. 291, AND THE DISCUSSION OF FIDUCIAL INFERENCE ON p. 292 HERE.

## Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)

This headliner appeared before, but to a sparse audience, so Management’s giving him another chance… His joke relates to both Senn’s post (about alternatives), and to my recent post about using (1 – β)/α as a likelihood ratio--but for very different reasons. (I’ve explained at the bottom of this “(b) draft”.)

** ….If you look closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike, (especially as he’s no longer doing the Tonight Show) …**.

**It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler joke* in criticizing the use of p-values.**

“Did you hear the one about significance testers rejectingH_{0}because of outcomesH_{0}didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

## Stephen Senn: Fisher’s Alternative to the Alternative

**As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog Senn from 3 years ago. **

*‘Fisher’s alternative to the alternative’*

*By: Stephen Senn*

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are *more sensitive* than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

## R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

In recognition of R.A. Fisher’s birthday….

**‘R. A. Fisher: How an Outsider Revolutionized Statistics’**

by **Aris Spanos**

Few statisticians will dispute that R. A. Fisher **(February 17, 1890 – July 29, 1962)** is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of *optimal estimation* based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of *optimal testing* in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the *ultimate outsider* when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

## R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’: Just before breaking up (with N-P)

*In recognition of R.A. Fisher’s birthday tomorrow, I will post several entries on him. I find this (1934) paper to be intriguing –immediately before the conflicts with Neyman and Pearson erupted. It represents essentially the last time he could take their work at face value, without the professional animosities that almost entirely caused, rather than being caused by, the apparent philosophical disagreements and name-calling everyone focuses on. Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power. It’s as if we may see them as ending up in a very similar place (no pun intended) while starting from different origins. I quote just the most relevant portions…the full article is linked below. I’d blogged it earlier here. You may find some gems in it.*

**‘Two new Properties of Mathematical Likelihood’**

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H_{0} is more powerful than any other equivalent test, with regard to an alternative hypothesis H_{1}, when it rejects H_{0} in a set of samples having an assigned aggregate frequency ε when H_{0} is true, and the greatest possible aggregate frequency when H_{1} is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H_{1} is less than that of any other group of samples outside the region, but is not less on the hypothesis H_{0}, then the test can evidently be made more powerful by substituting the one group for the other. Continue reading

## A. Spanos: “Recurring controversies about P values and conﬁdence intervals revisited”

**Aris Spanos**

Wilson E. Schmidt Professor of Economics

*Department of Economics, Virginia Tech*

**Recurring controversies about P values and conﬁdence intervals revisited*
**

*Ecological Society of America (ESA) ECOLOGY*

Forum—P Values and Model Selection (pp. 609-654)

Volume 95, Issue 3 (March 2014): pp. 645-651

*INTRODUCTION*

The use, abuse, interpretations and reinterpretations of the notion of a *P* value has been a hot topic of controversy since the 1950s in statistics and several applied ﬁelds, including psychology, sociology, ecology, medicine, and economics.

The initial controversy between Fisher’s signiﬁcance testing and the Neyman and Pearson (N-P; 1933) hypothesis testing concerned the extent to which the pre-data Type I error probability α can address the arbitrariness and potential abuse of Fisher’s *post-data threshold *for the *P *value. Continue reading

## Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)

*If there’s somethin’ strange in your neighborhood. Who ya gonna call?(Fisherian Fraudbusters!)**

*[adapted from R. Parker’s “Ghostbusters”]

When you need to warrant serious accusations of bad statistics, if not fraud, where do scientists turn? Answer: To the frequentist error statistical reasoning and to p-value scrutiny, first articulated by R.A. Fisher[i].The latest accusations of big time fraud in social psychology concern the case of Jens Förster. As Richard Gill notes:

## Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

This headliner appeared last month, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance…

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike ….

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejectingH_{0}because of outcomesH_{0}didn’t predict?

‘What’s unusual about that?’ you ask?

What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

## STEPHEN SENN: Fisher’s alternative to the alternative

*By: Stephen Senn*

This year [2012] marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are *more sensitive* than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading