Today is Egon Pearson’s birthday. In his honor, I am posting “Statistical Concepts in Their Relation to Reality” (Pearson 1955). I’ve posted it several times over the years, but I always find a new gem or two, despite its being so short. E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations, what he termed the “behavioral” rationale of tests. In an unpublished letter to Birnbaum (1974), E. Pearson describes N-P theory as admitting of two interpretations, behavioral and evidential:
“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.
(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)
When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. I suspect that “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics. One of the best sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this:
Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955, “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.
In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect. There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. It was really much simpler–or worse. The original heresy, as we shall see, was a Pearson one!…
To continue reading “Statistical Concepts in Their Relation to Reality,” click HERE.
Pearson doesn’t mean it was he who endorsed the behavioristic model that Fisher is here attacking.[i] The “original heresy” refers to the break from Fisher in the explicit introduction of alternative hypotheses (even if only directional). Without considering alternatives, Pearson and Neyman argued, significance tests are insufficiently constrained–for evidential purposes! However, this does not mean N-P tests give us merely a comparativist appraisal (as in a report of relative likelihoods!).
This is a good weekend to read or reread “the triad”:
- Fisher, R. A. (1955), “Statistical Methods and Scientific Induction,” Journal of the Royal Statistical Society, Series B, 17: 69-78.
- Neyman, J. (1956), “Note on an Article by Sir Ronald Fisher,” Journal of the Royal Statistical Society, Series B, 18: 288-294.
- Pearson, E. S. (1955), “Statistical Concepts in Their Relation to Reality,” Journal of the Royal Statistical Society, Series B, 17: 204-207.
I’ll post some other Pearson items over the week.
HAPPY BIRTHDAY E. PEARSON
[i] Fisher’s tirades against behavioral interpretations of “his” tests are almost entirely a reflection of his break with Neyman (after 1935) rather than any radical disagreement either in philosophy or method. Fisher could be even more behavioristic in practice (if not in theory) than Neyman, and Neyman could be even more evidential in practice (if not in theory) than Fisher. Moreover, it was really when others discovered Fisher’s fiducial methods could fail to correspond to intervals with valid error probabilities that Fisher began claiming he never really was too wild about them! (Check fiducial on this blog.) Contemporary writers love to harp on the so-called “inconsistent hybrid” combining Fisherian and N-P tests, but it’s largely a lot of hoopla growing out of either their taking Fisher-Neyman personality feuds at face value or (more likely) imposing their own philosophies of statistics on the historical exchanges. It’s time to dismiss these popular distractions: they are serious obstacles to progress in statistical understanding. Most notably, Fisherians are kept from adopting features of N-P statistics, and vice versa (or they adopt them improperly). What matters is what the methods are capable of doing! For more on this, see “it’s the methods, stupid!”
Lehmann, E. (1997). Review of Error and the Growth of Experimental Knowledge by Deborah G. Mayo, Journal of the American Statistical Association, Vol. 92.
Also of relevance:
Lehmann, E. (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association, Vol. 88, No. 424: 1242-1249.
Mayo, D. (1996), “Why Pearson Rejected the Neyman-Pearson (Behavioristic) Philosophy and a Note on Objectivity in Statistics” (Chapter 11) in Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press. [This is a somewhat older view of mine.]
Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge: Cambridge University Press. (Sept. 1) [A much newer view of mine.]