Phil6334/ Econ 6614

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I post one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem, to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D. R. Cox, 2006, p. 195).

The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehmann (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.

So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit:

We take the simplest example,…the normal mean when the variance is known, but the considerations are fairly general. The lower limit, [with Z the standard Normal variate, and M the sample mean]:

M0 – zc σ/√n

derived from the probability statement

Pr(μ > M – zc σ/√n ) = 1 – c

is a particular instance of a hypothetical long run of statements a proportion 1 – c of which will be true, assuming the model is sound. We can, at least in principle, make such a statement for each c and thereby generate a collection of statements, sometimes called a confidence distribution. (Cox 2006, p. 66).

For Fisher it was a fiducial distribution. Once M0 is observed, M0 – zc σ/√n is what Fisher calls the fiducial c per cent limit for μ. Making such statements for different c’s yields his fiducial distribution.
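To make the collection of statements concrete, here is a minimal sketch (not from the original post; the values of M0, σ and n are hypothetical) that computes Fisher’s fiducial c per cent limits for a grid of c values. Taken together, the limits trace out a Normal(M0, σ²/n) distribution for μ: Fisher’s fiducial distribution, or Cox’s confidence distribution.

```python
# Minimal sketch: fiducial c per cent limits M0 - z_c*sigma/sqrt(n) for several c.
# M0, sigma and n are made-up illustrative values.
import numpy as np
from scipy.stats import norm

M0, sigma, n = 10.0, 2.0, 25          # hypothetical observed mean, known sigma, sample size

for c in [0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]:
    z_c = norm.ppf(1 - c)             # upper-c quantile of the standard Normal
    limit = M0 - z_c * sigma / np.sqrt(n)
    print(f"fiducial {100*c:.0f} per cent value of mu: {limit:.3f}")
# The limits are exactly the c-quantiles of a Normal(M0, sigma^2/n) distribution for mu.
```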

In Fisher’s earliest paper on fiducial inference in 1930, he sets 1 – c at .95 (95 per cent). Start from the significance test of μ (e.g., μ < μ0 vs. μ > μ0) with significance level .05. He defines the 95 per cent value of the sample mean M, M.95, such that in 95% of samples M < M.95. In the Normal testing case, M.95 = μ0 + 1.65σ/√n. Notice M.95 is the cut-off for rejection in a .05 one-sided test T+ (of μ ≤ μ0 vs. μ > μ0).

We have a relationship between the statistic [M] and the parameter μ such that M.95 is the 95 per cent value corresponding to a given μ. This relationship implies the perfectly objective fact that in 5 per cent of samples M > M.95. (Fisher 1930, p. 533; I use μ for his θ, M in place of T).
That is, Pr(M < μ + 1.65σ/√n) = .95.

The event M > M.95 occurs just in case μ0 < M − 1.65σ/√n.[i]
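A quick numerical check of this duality (with made-up values μ0 = 0, σ = 1, n = 100) verifies that, for any simulated sample mean, rejecting in the one-sided .05 test T+ and μ0 falling below the fiducial 5 per cent value are one and the same event.

```python
# Sketch: the .05 one-sided test rejects (M exceeds mu0 + 1.65*sigma/sqrt(n)) exactly
# when mu0 falls below the fiducial 5 per cent value M - 1.65*sigma/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma, n, z = 0.0, 1.0, 100, 1.65     # hypothetical test value, known sigma, n

for _ in range(10):
    M = rng.normal(mu0, sigma / np.sqrt(n))            # a sample mean drawn under mu0
    reject = M > mu0 + z * sigma / np.sqrt(n)          # T+ rejects at the .05 level
    below = mu0 < M - z * sigma / np.sqrt(n)           # mu0 below the fiducial 5% value
    assert bool(reject) == bool(below)                 # the two events always coincide
```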

For a particular observed M0 , M0 − 1.65σ/√n is the fiducial 5 per cent value of μ.

We may know as soon as M is calculated what is the fiducial 5 per cent value of μ, and that the true value of μ will be less than this value in just 5 per cent of trials. This then is a definite probability statement about the unknown parameter μ which is true irrespective of any assumption as to its a priori distribution. (Fisher 1930, p. 533; emphasis is mine).

This seductively suggests that μ < μ.05 gets the probability .05! But we know we cannot say that Pr(μ < μ.05) = .05.[ii]

However, Fisher’s claim that we obtain “a definite probability statement about the unknown parameter μ” can be interpreted in another way. There’s a kosher probabilistic statement about the pivot Z; it’s just not a probabilistic assignment to a parameter. Instead, a particular substitution is, to paraphrase Cox, “a particular instance of a hypothetical long run of statements 95% of which will be true.” After all, Fisher was abundantly clear that the fiducial bound should not be regarded as an inverse inference to a posterior probability. We could only obtain an inverse inference, Fisher explains, by considering μ to have been selected from a superpopulation of μ‘s with known distribution. But then the inverse inference (posterior probability) would be a deductive inference and not properly inductive. Here, Fisher is quite clear, the move is inductive.
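The legitimacy of the statement about the pivot is easy to check by simulation; in this sketch (arbitrary choices of σ, n and the true means) Z = √n(M – μ)/σ has the same standard Normal distribution whatever μ is, so Pr(Z > zc) = c holds without assigning any probability to μ itself.

```python
# Sketch: the pivot Z = sqrt(n)*(M - mu)/sigma is N(0,1) for every true mu,
# so Pr(Z > 1.65) is about .05 in each case.  All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
sigma, n, z05 = 1.0, 50, 1.65

for mu in [-3.0, 0.0, 7.5]:                              # very different true means
    M = rng.normal(mu, sigma / np.sqrt(n), size=100_000) # repeated sample means
    Z = np.sqrt(n) * (M - mu) / sigma
    print(mu, (Z > z05).mean())                          # roughly 0.05 every time
```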

People are mistaken, Fisher says, when they try to find priors so that they would match the fiducial probability:

In reality the statements with which we are concerned differ materially in logical content from inverse probability statements, and it is to distinguish them from these that we speak of the distribution derived as a fiducial frequency distribution, and of the working limits, at any required level of significance, ….as the fiducial limits at this level. (Fisher 1936, p. 253).

So, what is being assigned the fiducial probability? It is, Fisher tells us, the “aggregate of all such statements…” Or, to put it another way, it’s the method of reaching claims to which the probability attaches. Because M and S (using the Student’s T pivot) or M alone (where σ is assumed known) are sufficient statistics, “we may infer, without any use of probabilities a priori, a frequency distribution for μ which shall correspond with the aggregate of all such statements … to the effect that the probability that μ is less than M – 1.65σ/√n is .05.” (Fisher 1936, p. 253)[iii]

Suppose you’re Neyman and Pearson aiming to clarify and justify Fisher’s methods.

“I see what’s going on,” we can imagine Neyman declaring. There’s a method for outputting statements of the general form

μ > M – zc σ/√n

Some would be in error, others not. The method outputs statements with a probability of 1 – c of being correct. The outputs are instances of the general form of statement, and the probability alludes to the relative frequency with which they would be correct, as given by the chosen significance or fiducial level c. Voila! “We may look at the purpose of tests from another viewpoint,” as Neyman and Pearson (1933) put it. Probability qualifies (and controls) the performance of a method.

There is leeway here for different interpretations and justifications of that probability, from actual to hypothetical performance, and from behavioristic to more evidential–I’m keen to develop the latter. But my main point here is that in struggling to extricate Fisher’s fiducial limits, without slipping into fallacy, they are led to the N-P performance construal. Is there an efficient way to test hypotheses based on probabilities? ask Neyman and Pearson in the opening of the 1933 paper.

Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong (Neyman and Pearson 1933, pp. 141-2/290-1).
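In code, the performance reading amounts to something like the following sketch (parameter values hypothetical): generate a long run of samples, output the statement “μ > M – zcσ/√n” each time, and record how often the statements are true; the long-run proportion comes out close to 1 – c.

```python
# Sketch of the N-P performance construal: the method's outputs are statements
# 'mu > M - z_c*sigma/sqrt(n)', and about 1 - c of them are true in the long run.
# mu, sigma, n and c are made-up illustrative values.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, c, z_c = 5.0, 2.0, 25, 0.05, 1.65

M = rng.normal(mu, sigma / np.sqrt(n), size=200_000)    # sample means from repeated trials
statements_true = mu > M - z_c * sigma / np.sqrt(n)     # truth value of each output statement
print(statements_true.mean())                           # close to 1 - c = 0.95
```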

At the time, Neyman thought his development of confidence intervals (in 1930) was essentially the same as Fisher’s fiducial intervals. Fisher’s talk of assigning fiducial probability to a parameter, Neyman thought at first, was merely the result of accidental slips of language, altogether expected in explaining a new concept. There was evidence that Fisher accepted Neyman’s reading. When Neyman gave a paper in 1934 discussing confidence intervals, seeking to generalize fiducial limits, but making it clear that the term “confidence coefficient” is not synonymous with the term probability, Fisher didn’t object. In fact he bestowed high praise, saying Neyman “had every reason to be proud of the line of argument he had developed for its perfect clarity. The generalization was a wide and very handsome one,” the only problem being that there wasn’t a single unique confidence interval, as Fisher had wanted (for fiducial intervals).[iv] There are even hints of the two forming a mutual admiration society, with Fisher demurring that “Dr Neyman did him too much honor” in crediting him for the revolutionary insight of Student’s T pivot. Neyman responds that of course in calling it Student’s T he is crediting Student, but “this does not prevent me from recognizing and appreciating the work of Professor Fisher concerning the same distribution” (Fisher comments on Neyman 1934, p. 137). For more on Neyman and Pearson being on Fisher’s side in these early years, see Spanos’s post.

So how does this relate to the current consensus view of Neyman-Pearson vs Fisher? Stay tuned.[v] In the meantime, share your views.

The next installment is here.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[i] (μ < M – zc σ/√n) iff (M > M(1–c)), where M(1–c) = μ + zc σ/√n.

[ii] In terms of the pivot Z, the inequality Z > zc is equivalent to the inequality

μ < M – zc σ/√n

“so that this last inequality must be satisfied with the same probability as the first.” But the fiducial argument replaces M with the observed M0, and then Fisher’s assertion

Pr(μ > M0 – zc σ/√n) = 1 – c

no longer holds. (Fallacy of probabilistic instantiation.) In this connection, see my previous post on confidence intervals in polling.
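A small simulation (all numbers invented) may make the fallacy vivid: across all samples the statement “μ > M – zcσ/√n” is true about 95% of the time, but among samples yielding (approximately) one particular value M0 the statement is simply true or false, so its relative frequency there is essentially 0 or 1, not .95.

```python
# Sketch of the fallacy of probabilistic instantiation: the 95% relative frequency
# attaches to the method over all samples, not to the instantiated statement at a
# fixed M0.  mu, sigma, n and M0 are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, z_c = 0.0, 1.0, 25, 1.65
M = rng.normal(mu, sigma / np.sqrt(n), size=500_000)     # repeated sample means

covered = mu > M - z_c * sigma / np.sqrt(n)
print("over all samples:", covered.mean())               # about 0.95

M0 = 0.40                                                # one particular observed mean
near_M0 = np.abs(M - M0) < 0.005                         # samples with M close to M0
print("among samples near M0:", covered[near_M0].mean()) # essentially 0 here, not 0.95
```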

[iii] If we take a number of samples of size n from the same or from different populations, and for each calculate the fiducial 5 per cent value for μ, then in 5 per cent of cases the true value of μ will be less than the value we have found. There is no contradiction in the fact that this may differ from a posterior probability. “The fiducial probability is more general and, I think, more useful in practice, for in practice our samples will all give different values, and therefore both different fiducial distributions and different inverse probability distributions. Whereas, however, the fiducial values are expected to be different in every case, and our probability statements are relative to such variability, the inverse probability statement is absolute in form and really means something different for each different sample, unless the observed statistic actually happens to be exactly the same.” (Fisher 1930, p. 535)

[iv] Fisher restricts fiducial distributions to special cases where the statistics exhaust the information. He recognizes “The political principle that ‘Anything can be proved with statistics’ if you don’t make use of all the information. This is essential for fiducial inference” (1936, p. 255). There are other restrictions to the approach as he developed it; many have extended it. There are a number of contemporary movements to revive fiducial and confidence distributions. For references, see the discussants on my likelihood principle paper.

[v] For background, search Fisher on this blog. Some of the material here is from my forthcoming book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP).

REFERENCES

Cox, D. R. (2006), Principles of Statistical Inference. Cambridge.

Fisher, R.A. (1930), “Inverse Probability,” Mathematical Proceedings of the Cambridge Philosophical Society, 26(4): 528-535.

Fisher, R.A. (1936), “Uncertain Inference,” Proceedings of the American Academy of Arts and Sciences 71: 248-258.

Lehmann, E. (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association 88(424): 1242-1249.

Neyman, J. (1934), “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection,” Early Statistical Papers of J. Neyman: 98-141. [Originally published (1934) in The Journal of the Royal Statistical Society 97(4): 558-625.]

This material is now part of Section 5.8 in Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo 2018, CUP).

Categories: fiducial probability, Fisher, Phil6334/ Econ 6614, Statistics

Guest Blog: R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)


In recognition of R.A. Fisher’s birthday on February 17…a week of Fisher posts!

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998):

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however, did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper. Continue reading

Categories: Fisher, phil/history of stat, Phil6334/ Econ 6614, Spanos, Statistics

R.A. Fisher: “Statistical methods and Scientific Induction”

I continue a week of Fisherian posts begun on his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and are worth your rereading.

17 February 1890 — 29 July 1962

“Statistical Methods and Scientific Induction”

by Sir Ronald Fisher (1955)

SUMMARY

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

  1. “Repeated sampling from the same population”,
  2. Errors of the “second kind”,
  3. “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

 

Note on an Article by Sir Ronald Fisher

by Jerzy Neyman (1956)


Summary

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation.  (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.  (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values.  The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight.  (4)  The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

 

E.S. Pearson

“Statistical Concepts in Their Relation to Reality”.

by E.S. Pearson (1955)

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955, “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE

Categories: E.S. Pearson, fiducial probability, Fisher, Neyman, phil/history of stat, Phil6334/ Econ 6614

Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I will post some Fisherian items this week in recognition of it*. This paper comes just before the conflicts with Neyman and Pearson erupted. Fisher links his tests and sufficiency to the Neyman and Pearson lemma in terms of power. We may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below. Happy Birthday Fisher!

Two New Properties of Mathematical Likelihood

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principal result obtained by Neyman and Pearson in discussing the efficacy of tests of significance. Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true. If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other.

Consequently, for the most powerful test possible the ratio of the probabilities of occurrence on the hypothesis H0 to that on the hypothesis H1 is less in all samples in the region of rejection than in any sample outside it. For samples involving continuous variation the region of rejection will be bounded by contours for which this ratio is constant. The regions of rejection will then be required in which the likelihood of H0 bears to the likelihood of H1, a ratio less than some fixed value defining the contour. (295)…

It is evident, at once, that such a system is only possible when the class of hypotheses considered involves only a single parameter θ, or, what comes to the same thing, when all the parameters entering into the specification of the population are definite functions of one of their number. In this case, the regions defined by the uniformly most powerful test of significance are those defined by the estimate of maximum likelihood, T. For the test to be uniformly most powerful, moreover, these regions must be independent of θ showing that the statistic must be of the special type distinguished as sufficient. Such sufficient statistics have been shown to contain all the information which the sample provides relevant to the value of the appropriate parameter θ. It is inevitable therefore that if such a statistic exists it should uniquely define the contours best suited to discriminate among hypotheses differing only in respect of this parameter; and it is surprising that Neyman and Pearson should lay it down as a preliminary consideration that ‘the testing of statistical hypotheses cannot be treated as a problem in estimation.’ When tests are considered only in relation to sets of hypotheses specified by one or more variable parameters, the efficacy of the tests can be treated directly as the problem of estimation of these parameters. Regard for what has been established in that theory, apart from the light it throws on the results already obtained by their own interesting line of approach, should also aid in treating the difficulties inherent in cases in which no sufficient statistic exists. (296)
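To connect Fisher’s remark to the simplest case, here is an illustrative sketch (not from Fisher’s paper; the numbers are invented) for a Normal mean with σ known: the likelihood ratio of H0: μ = μ0 to H1: μ = μ1 > μ0 depends on the sample only through the sufficient statistic M (also the maximum likelihood estimate) and is decreasing in M, so the Neyman-Pearson most powerful region “likelihood ratio small” is just “M large”, the contour defined by the estimate itself.

```python
# Sketch: for Normal data with known sigma, the log likelihood ratio of H0 to H1
# reduces to a decreasing function of the sufficient statistic M alone.
# mu0, mu1, sigma, n and the simulated data are all hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu0, mu1, sigma, n = 0.0, 0.5, 1.0, 20

x = rng.normal(0.2, sigma, size=n)                  # one made-up sample
M = x.mean()                                        # sufficient statistic (and MLE)

def loglik(mu):
    return norm.logpdf(x, mu, sigma).sum()

log_ratio = loglik(mu0) - loglik(mu1)               # log of L(H0)/L(H1)

# Algebraically log_ratio = n*(mu1 - mu0)*((mu0 + mu1)/2 - M)/sigma**2,
# which depends on the data only through M and decreases as M grows.
print(np.isclose(log_ratio, n * (mu1 - mu0) * ((mu0 + mu1) / 2 - M) / sigma**2))
```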

*I’ve posted several of these items, in different forms, during the years of writing Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP): this is the first year I can point to the discussions of Fisher therein. The current post emerges in Excursion 5 Tour III. However, I still think it’s crucial to read and reread the original articles!

Categories: Fisher, phil/history of stat, Phil6334/ Econ 6614, Statistics

Mayo Slides Meeting #1 (Phil 6334/Econ 6614, Mayo & Spanos)

Slides: Meeting #1 (Phil 6334/Econ 6614: Current Debates on Statistical Inference and Modeling), D. Mayo and A. Spanos.

 

Categories: Phil6334/ Econ 6614
