In marking Egon Pearson’s birthday (Aug. 11), I’ll post some Pearson items this week. They will contain some new reflections on older Pearson posts on this blog. Today, I’m posting “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve linked to it several times over the years, but always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:
“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.
(Nowadays, it might be said that some people concentrate to an absurd extent on “science-wise error rates” in their view of statistical tests as dichotomous screening devices.)
One of the best sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It’s his response to Fisher (1955), the first part of what I call the “triad”. It begins like this:
Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955 “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.
In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect. There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. It was really much simpler–or worse. The original heresy, as we shall see, was a Pearson one!…
You can read “Statistical Concepts in Their Relation to Reality” HERE.
What was the heresy, really? Pearson doesn’t mean it was he who endorsed the behavioristic model that Fisher is here attacking.[i] The “original heresy” refers to the break from Fisher in the explicit introduction of alternative hypotheses (even if only directional). Without considering alternatives, Pearson and Neyman argued, statistical tests of significance are insufficiently constrained–for evidential purposes! Note: this does not mean N-P tests give us merely a comparativist appraisal (as in a report of relative likelihoods!).
But it’s a mistake to suppose that’s all that an inferential or evidential formulation of statistical tests requires. What more is required comes out in my deconstruction of those famous (“miserable”) passages found in the key Neyman and Pearson 1933 paper. We acted out the play I wrote for SIST (2018) in our recent Summer Seminar in Phil Stat. The participants were surprisingly good actors!
Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. Doing so is my goal in viewing “Statistical Inference as Severe Testing”.
Notice, by the way, Pearson’s discussion and extension of Fisher’s construal of differences that are not statistically significant on p. 207:
These points might have been helpful to those especially concerned with mistaking differences that are not statistically significant for supposed “proofs of the null”.
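To make the general worry about “proofs of the null” concrete, here is a minimal sketch. It is not Pearson’s own example, and everything in it is an assumption chosen for illustration: a one-sided z-test of H0: μ ≤ 0 with known σ, and a hypothetical helper named `prob_nonsignificant`. It computes how probable a non-significant result is when the null is in fact false by a given amount.

```python
from statistics import NormalDist


def prob_nonsignificant(true_shift, sigma, n, alpha=0.05):
    """Probability that a one-sided z-test of H0: mu <= 0 fails to reject
    when the true mean equals true_shift (illustrative assumption:
    normal data with known standard deviation sigma, sample size n)."""
    std_normal = NormalDist()
    # Cut-off for the z statistic at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the true mean, the z statistic is Normal(shift * sqrt(n) / sigma, 1).
    noncentrality = true_shift * n ** 0.5 / sigma
    # P(Z < z_crit) under that distribution, i.e. 1 minus the power.
    return std_normal.cdf(z_crit - noncentrality)


# Hypothetical numbers: a true discrepancy of 0.2 (in units of sigma).
small_sample = prob_nonsignificant(true_shift=0.2, sigma=1.0, n=10)
large_sample = prob_nonsignificant(true_shift=0.2, sigma=1.0, n=400)
print(f"n=10:  P(non-significant | mu=0.2) = {small_sample:.3f}")
print(f"n=400: P(non-significant | mu=0.2) = {large_sample:.3f}")
```

Under these assumed numbers, a non-significant result with n = 10 would occur more than 80% of the time even though the null is false, so it is poor grounds for inferring the null; with n = 400 the same result would occur less than 2% of the time, so it gives much better grounds for ruling out a discrepancy as large as 0.2.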
Share your comments.
“The triad”:
- Fisher, R. A. (1955), “Statistical Methods and Scientific Induction,” Journal of the Royal Statistical Society, Series B (Methodological), 17: 69-78.
- Neyman, J. (1956), “Note on an Article by Sir Ronald Fisher,” Journal of the Royal Statistical Society. Series B (Methodological), 18: 288-294.
- Pearson, E. S. (1955), “Statistical Concepts in Their Relation to Reality,” Journal of the Royal Statistical Society, B, 17: 204-207.
I’ll post some other Pearson items over the week.
[i] Fisher’s tirades against behavioral interpretations of “his” tests are almost entirely a reflection of his break with Neyman (after 1935) rather than any radical disagreement either in philosophy or method. Fisher could be even more behavioristic in practice (if not in theory) than Neyman, and Neyman could be even more evidential in practice (if not in theory) than Fisher. Moreover, it was really when others discovered Fisher’s fiducial methods could fail to correspond to intervals with valid error probabilities that Fisher began claiming he never really was too wild about them! (Check fiducial on this blog and in Excursion 5 of SIST.) Contemporary writers tend to harp on the so-called “inconsistent hybrid” combining Fisherian and N-P tests. I argue in SIST that it’s time to dismiss these popular distractions: they are serious obstacles to progress in statistical understanding. Most notably, Fisherians are kept from adopting features of N-P statistics, and vice versa (or they adopt them improperly). What matters is what the methods are capable of doing! For more on this, see the post “it’s the methods, stupid!” and excerpts from Excursion 3 of SIST.
References
Lehmann, E. (1997). Review of Error and the Growth of Experimental Knowledge by Deborah G. Mayo, Journal of the American Statistical Association, Vol. 92.
Also of relevance:
Lehmann, E. (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association, Vol. 88, No. 424: 1242-1249.
Mayo, D. (1996), “Why Pearson Rejected the Neyman-Pearson (Behavioristic) Philosophy and a Note on Objectivity in Statistics” (Chapter 11) in Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press. [This is a somewhat older view of mine.]
Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. CUP. (Sept. 1) [A much newer view of mine.] SIST 2018.