I was pointing to the math error – it does not necessarily mean Fisher was right about everything but it does mean Neyman was in error about something. (I thought error statisticians wanted to know about errors rather than not know.)

It does, however, suggest that Fisher had some justification for being cranky – he pointed to where the error was, and that should have been enough for a good mathematician to go back and verify the derivation.

Keith O’Rourke

Similarly, Senn’s comment, linked from Gelman’s discussion, says how “weird” Neyman’s null is in one setting, but it’s hardly a conclusive argument that one should never consider it.

‘In applications, there is usually available a nested family of rejection regions corresponding to different significance levels. It is then good practice to determine not only whether the hypothesis is accepted or rejected at the given significance level, but also to determine the smallest significance level α̂ = α̂(x), the significance probability or p-value, at which the hypothesis would be rejected for the given observation’
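Lehmann’s definition can be made concrete. The sketch below (my own illustration, not from the thread) uses a one-sided normal test, whose rejection regions {x : x > z₁₋α} are nested in α; scanning a fine grid of levels, the smallest α at which the observation is rejected converges to the usual closed-form p-value:

```python
from statistics import NormalDist

def p_value_as_smallest_alpha(x_obs, alphas):
    """For the one-sided test H0: mu = 0 vs mu > 0 with X ~ N(0, 1),
    the rejection regions {x : x > z_(1-alpha)} are nested in alpha.
    Return the smallest alpha in `alphas` at which x_obs is rejected."""
    nd = NormalDist()
    rejected = [a for a in alphas if x_obs > nd.inv_cdf(1 - a)]
    return min(rejected) if rejected else None

x_obs = 1.96
alphas = [k / 10000 for k in range(1, 10000)]  # fine grid of levels
smallest = p_value_as_smallest_alpha(x_obs, alphas)
p = 1 - NormalDist().cdf(x_obs)  # closed-form p-value, ~0.025
```

On this grid `smallest` agrees with `p` to within the grid spacing, which is exactly Lehmann’s point: the p-value is the infimum of the levels at which the data land in the rejection region.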

It seems to me that the differences between Fisher and Neyman are more to do with the following:

A) Is the alternative more primitive than the test statistic?

B) Should the property of the tests be judged by relative frequencies over all repetitions of the experiment as performed or should one condition on relevant subsets?

C) Can it be the case that in certain complex cases the program of controlling the type I error rate at a fixed level while maximising power is inappropriate?

I think that it is certainly true that in the Latin square case Neyman’s and Fisher’s hypotheses are different. I was unaware that it was also the case that Neyman’s algebra was wrong.

(However, it is interesting to note that in the discussion Fisher refers to a mathematician, who must have been Wilks, having proved that his method was correct.)

1. Senn S. A comment on replication, p-values and evidence, S. N. Goodman, Statistics in Medicine 1992; 11:875-879. Statistics in Medicine 2002; 21:2437-2444; author reply 2445-2437.

2. Lehmann EL. Testing Statistical Hypotheses. Chapman and Hall: New York, 1994.

“Neyman in fact made a crucial algebraic mistake in his appendix, and his expressions for the expected mean residual sum of squares for both designs are generally incorrect. We present the correct expressions in Sections…”

Keith O’Rourke

http://errorstatistics.com/2013/05/24/gelman-sides-w-neyman-over-fisher-in-relation-to-a-famous-blow-up/

I recently found out that Neyman made a math error in that particular argument that Stephen refers to.

Fisher did claim there was an error, but apparently did not think he needed to walk people (who thought they were better mathematicians than he was) through it.

https://projecteuclid.org/euclid.ss/1408368581

Keith O’Rourke

http://errorstatistics.com/2014/08/17/are-p-values-error-probabilities-installment-1/

and other posts like it, you’ll remember that it was Fisher who was using tests as automatic rules to decide whether to pay any more attention to an effect or not. The justification for such an automatic rule was a 5% error rate in the long run. N-P formulated tests to try to justify what Fisher was doing, but without some of the latitude for bad tests. The remark you cite comes 20 years after the break-up (see the anger management post), when Fisher took off his behavioristic hat, etc. He was livid that people everywhere were using N-P statistics more than his tests. Barnard gave Fisher this basis for beating up on Neyman (Barnard told me this personally, but it’s also in Fisher’s writing in the “triad”).

As for that passage, it’s incorrectly understood if viewed as an assertion about long runs AS OPPOSED to an assertion, made quite early in the 1933 paper, about the NEED FOR AN ALTERNATIVE. I analyze that passage over 10 pages in my book.

I agree there’s a whiff more of behaviorism in Neyman than Fisher but just barely and only later on. They both used tests inferentially and behavioristically. And in practice Neyman is more “evidential” than Fisher. Pearson was somewhere in the middle, but clearly hinted at an evidential twist to the error probabilities. Birnbaum gave another hint. I propose a full-fledged evidential construal.

Fisher wrote: “The appreciation of such more complicated cases will be much aided by a clear view of the nature of a test of significance applied to a single hypothesis by a unique body of observations.” (1956, p. 46). In marked contrast, Neyman & Pearson wrote that their framework was the result of having decided that “no test based on the theory of probability can by itself provide any valuable evidence of the truth or falsehood of [a particular] hypothesis” and that “Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour…” (1933, pp. 290-291). Neyman’s focus was on global probabilities and Fisher’s was on local probabilities, and where the local probabilities were unavailable via his fiducial argument, he suggested that we look at the likelihoods.

To a first approximation, a hypothesis test yields a decision regarding the null hypothesis that preserves global error probabilities, and a significance test yields a P-value that has a claim to being related to local evidence.
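That contrast can be sketched with a toy one-sided normal test (a minimal illustration of my own, not anyone’s canonical formulation): the Neyman-Pearson route returns only an accept/reject decision at a pre-specified level, while the Fisherian route reports the observed significance level itself:

```python
from statistics import NormalDist

def np_decision(z_obs, alpha=0.05):
    """Neyman-Pearson style: a binary decision at a pre-specified level,
    so the long-run type I error rate is controlled at alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return "reject H0" if z_obs > z_crit else "accept H0"

def fisher_p_value(z_obs):
    """Fisher style: report the observed significance level (P-value)
    and leave its evidential weight to the reader."""
    return 1 - NormalDist().cdf(z_obs)

# The same observation yields two different kinds of output:
z = 1.5
decision = np_decision(z)   # a decision: "accept H0" at the 5% level
p = fisher_p_value(z)       # a graded quantity, roughly 0.067
```

The point of the toy is only that the first function’s output is justified by its long-run error rate, while the second’s invites a local, evidential reading of the particular data.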

I agree with your position that it doesn’t much matter what either Neyman or Fisher wrote and did, we should worry about the properties of the methods. However, it is important that we do not lose sight of the distinction that should be drawn between long-run error rates and evidence.

Anyway, I’ve been through all this. My point is that we, living today, should want to understand the methods themselves. (At least some of the criticisms could actually even be on target rather than non sequiturs.) There are all sorts of reasons that the notions were chosen. Pearson and Neyman actually have different recollections of where the appeal to the alternative hypothesis came from, and whether the notion of power came from one of Pearson’s early jobs (as Egon said), or something Lola said, or Student, or Borel, or when they were eating shrimp on a summer holiday (see E. Pearson’s “the N-P Story”). By placing weight on these matters of idiosyncrasy and personality (“discovery” vs. “justification” considerations) all sorts of criticisms have been launched of the form: “I know a P-value isn’t an error probability because Fisher lambasted Neyman and called him a Russian.” These are fallacious arguments, and yet they free people from the much harder job of erecting appropriate, perhaps different, rationales for the various methods in use. It also lets them declare falsely that the methods are inconsistent with each other and so should be dumped entirely.

Pearson always said that N-P statistics were not to be considered a static system but were to be upgraded and modified as problems and technologies changed. Read Lehmann’s recent, slim book on Neyman, Fisher and the creation of classical statistics. (It can be won for free in my palindrome contest.)

There is no doubt that each recommended different methods at times, so what? This in no way hampers communication or critical appraisal, as if scientists don’t generally tackle a problem in completely different ways quite deliberately. The idea that there is to be one and only one way to skin a cat is very distant from real science. Different methods are good at unearthing errors that others miss. If one remembers that statistical inference is distinct from scientific inference, the various moves at the statistical level don’t preclude shared inferences at the substantive level.

As for fiducial intervals, everyone knows it’s an intriguing, deep/dark mystery just what Fisher meant, and trillions of pages have been written. It’s the house of the rising sun of statistics, and many have spent hours in misery trying to figure out how to make sense of Fisher. I find it very interesting that it’s enjoying a resurgence and several of the people who commented on my Likelihood Principle article in Statistical Science are advancing modern variants of fiducialism (Fraser is one). Sometimes Fisher even seemed to have severity in mind!

ADDITION: I should note that the infelicities of the fiducial argument, which appears to countenance an illicit probabilistic instantiation, are definitely one of the reasons Neyman emphasized the behavioristic interpretation of CIs. He wanted to be clear that he wasn’t agreeing with Fisher on fiducial intervals. See the Fisher/Neyman exchange in the “triad” of 1955/56 on this blog.
