Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Today is Egon Pearson’s birthday: 11 August 1895 – 12 June 1980.
E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in the long-run control of rates of erroneous interpretations: what he termed the “behavioral” rationale of tests. In an unpublished letter to Birnbaum (1974), E. Pearson describes N-P theory as admitting of two interpretations, behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. If they had been, I would not be on about providing an inferential philosophy all these years.[i] Nevertheless, “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this:

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955, “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”.  There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…

To continue reading “Statistical Concepts in Their Relation to Reality,” click HERE.[iii]

Pearson doesn’t mean that it was he who endorsed the behavioristic model that Fisher is attacking here.[ii] The “original heresy” refers to the break from Fisher in the explicit introduction of alternative hypotheses (even if only directional). Without considering alternatives, Pearson and Neyman argued, significance tests are insufficiently constrained, precisely for evidential purposes! However, this does not mean N-P tests give us merely a comparativist appraisal (as in a report of relative likelihoods).
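To see what the explicit alternative buys, here is a minimal numerical sketch (the numbers are illustrative, not drawn from Pearson): in a one-sided z test, fixing the Type I error rate alone says nothing about the test’s capability of detecting discrepancies; only once a directional alternative is specified can a Type II error rate, and hence power, be computed at all.

```python
from statistics import NormalDist

# Hypothetical setup (all values are assumptions for illustration):
# H0: mu = 0 vs the directional alternative H1: mu = 0.5,
# known sigma = 1, sample size n = 25, one-sided z test at alpha = 0.05.
Z = NormalDist()
alpha, mu1, sigma, n = 0.05, 0.5, 1.0, 25

# Fixing alpha pins down the rejection region (the "significance test" part)...
z_crit = Z.inv_cdf(1 - alpha)          # reject H0 when z >= z_crit

# ...but only the explicit alternative lets us compute the Type II
# error rate, and hence the power, of the test (the N-P part).
shift = mu1 * (n ** 0.5) / sigma       # standardized shift under H1
power = 1 - Z.cdf(z_crit - shift)      # P(reject H0 | H1 true)

print(f"critical z: {z_crit:.3f}")     # ~1.645
print(f"power vs mu=0.5: {power:.3f}") # ~0.804
```

Without the alternative, the second computation is simply unavailable: two tests with the same alpha can differ wildly in what discrepancies they are capable of detecting, which is the sense in which the test is “insufficiently constrained.”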

Happy Birthday E.S. Pearson!


[i] Noteworthy leaders in this “evidential interpretation” are David Cox and Allan Birnbaum.

[ii] Fisher’s tirades against behavioral interpretations of “his” tests are almost entirely a reflection of his break with Neyman (after 1935) rather than any radical disagreement either in philosophy or method. Fisher could be even more behavioristic in practice (if not in theory) than Neyman, and Neyman could be even more evidential in practice (if not in theory) than Fisher. Contemporary writers love to harp on the so-called “inconsistent hybrid” combining Fisherian and N-P tests, but it’s largely a lot of hoopla growing out of their taking Fisher-Neyman personality feuds at face value. It’s time to dismiss these popular distractions: they are an obstacle to progress in statistical understanding. The only thing that matters is what the methods are capable of doing!  For more on this, see “it’s the methods, stupid!”

[iii] See also Aris Spanos’s “Egon Pearson’s Neglected Contributions to Statistics,” and my “E.S. Pearson’s statistical philosophy” from 2 years ago.


The “Triad” (3 short, key papers):

Fisher (1955), “Statistical Methods and Scientific Induction,” Journal of the Royal Statistical Society (B) 17: 69-78.

Pearson (1955), “Statistical Concepts in Their Relation to Reality,” Journal of the Royal Statistical Society (B) 17: 204-207.

Neyman (1956), “Note on an Article by Sir Ronald Fisher,” Journal of the Royal Statistical Society (B) 18: 288-294.

Also of relevance:

Erich Lehmann’s (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?”. Journal of the American Statistical Association, Vol. 88, No. 424: 1242-1249.

Mayo, D. (1996), “Why Pearson Rejected the Neyman-Pearson (Behavioristic) Philosophy and a Note on Objectivity in Statistics” (Chapter 11) in Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.



Categories: phil/history of stat, Philosophy of Statistics, Statistics


2 thoughts on “Egon Pearson’s Heresy”

  1. I agree that the animosity that Fisher showed to Neyman was both unfortunate and an overreaction. The origin, in my opinion, had nothing to do with hypothesis testing versus significance testing but with Neyman’s remarks on Latin squares. Here I think that Neyman was technically in the wrong and could also be faulted for the wording of his criticism.

    In ‘Added Values’ I described the incident thus:

    “Fisher replied at great length in remarks that were extremely critical of Neyman himself. He maintained that the analysis of variance applied to Latin squares was valid. He also made a reference, that at the time must have meant nothing to most who were present, stating:

    ‘…it was only about a year since another academic mathematician from abroad had been as much excited about having proved that the Latin Square was mathematically exact, as Dr Neyman seemed to be at having proved it inaccurate.’

    It now seems likely that the ‘academic mathematician’ concerned was Samuel Wilks (1906–1964), who in 1933 submitted a paper to the Royal Society, in which, using characteristic functions, he proved the validity of Fisher’s z test.”

    I think that Fisher suffered from the fact that although he was an excellent mathematician (as Savage later admitted he had failed to realise) he did not have much use for formal mathematics. Those who loved formalism thus tended to underestimate him. I think Neyman was guilty of that here.

    However, I don’t agree entirely that there is little of substance between the two philosophies of testing. I think that the Fisher-Behrens controversy shows that there were, indeed, differences of theory that were important even if the practical differences were less so.

    Anyway, poor old Egon Pearson, a man admitted by all to be a gentleman, suffered by double association. First as the son of Karl and second as the collaborator of Jerzy.

  2. Stephen: I agree with all this. I might have noted the event that started it all, discussed several times on this blog.
    I concur as well about there being some degree of substantive philosophical disagreement, but it grew (and grew!) right alongside the professional hostilities. The main thing is that these issues, arising at a time when the methods were in early development, scarcely make for an inconsistent hybrid. Viewing it as such has now become a common prelude to misunderstanding the methods altogether: e.g., the accusation that p-values aren’t error probabilities, followed by “p-values overstate evidence,” followed by a Bayesian recommendation, be it J. Berger’s “conditional error probabilities” (which are posteriors) or another spiked-prior variation.
    I say, “it’s the methods, stupid”. Never mind what these men fought about.
