Andrew Gelman had said he would go back to explain why he sided with Neyman over Fisher in relation to a big, famous argument discussed on my Feb. 16, 2013 post: “Fisher and Neyman after anger management?”, and I just received an e-mail from Andrew saying that he has done so: “In which I side with Neyman over Fisher”. (I’m not sure what Senn’s reply might be.) Here it is:
“In which I side with Neyman over Fisher” Posted by Andrew on 24 May 2013, 9:28 am
As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally.
Here’s an example that recently came up.
Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero.
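To make the distinction concrete, the two hypotheses can be written in potential-outcomes notation. The symbols here are an assumption for illustration and do not appear in Senn's comment: $Y_i(1)$ and $Y_i(0)$ stand for the outcomes unit $i$ would show under treatment and under control, and $\tau_i$ for the unit-level effect.

```latex
\[
\tau_i = Y_i(1) - Y_i(0), \qquad i = 1, \dots, n
\]
\[
H_0^{\text{Fisher}}:\ \tau_i = 0 \ \text{ for every } i = 1, \dots, n
\qquad\qquad
H_0^{\text{Neyman}}:\ \frac{1}{n} \sum_{i=1}^{n} \tau_i = 0
\]
```

Written this way, Fisher's null implies Neyman's but not the reverse: effects of $+1$ on half the units and $-1$ on the other half satisfy the second and violate the first.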
Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible that a bunch of nonzero effects would exactly cancel). And I remember a similar discussion as a student, many years ago, when Rubin talked about that silly Neyman null hypothesis.
Thinking about it more, though, I side with Neyman over Fisher, because the interesting problem for me is not testing the null hypothesis, which in nontrivial problems can never be true anyway, but estimation. And in estimation I am interested in an average effect, not an effect that is identical across all people. I could imagine a model in which the variance of the treatment effect is proportional to its mean (this would bridge the Neyman and Fisher ideas), but this is not a model that anyone ever fits.
So, just to say it again: if it’s a pure null hypothesis, sure, go with Fisher. But if you’re inverting a family of hypothesis tests to get a confidence interval (something which I’d almost never want to do, but let’s go with this, since that’s the common application of these ideas), I’d go with Neyman, as it omits the implausible requirement that the treatment effect be exactly identical on all items.
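As a rough illustration of the estimation view, here is a minimal Python sketch of an interval for the average effect that does not assume identical effects across units. Everything in it is hypothetical: the simulated data, the function name `neyman_interval`, and the normal-approximation multiplier `z=1.96`. It uses the difference in means with the usual conservative variance estimate (the sum of the two per-group sample variances, each divided by its group size), which amounts to inverting z-tests of "average effect = delta" rather than tests of a constant effect.

```python
import math
import random
import statistics


def neyman_interval(treated, control, z=1.96):
    """Difference in means with a conservative variance estimate.

    Returns (estimate, lower, upper). The multiplier z=1.96 assumes a
    normal approximation; it is an illustrative choice, not from the post.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se


# Hypothetical potential outcomes: unit effects are +1 or -1, so the
# average effect is exactly zero although no single effect is zero.
rng = random.Random(0)
n = 200
y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
effects = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
y1 = [base + tau for base, tau in zip(y0, effects)]

# Completely randomized assignment: half the units receive treatment.
treated_ids = set(rng.sample(range(n), n // 2))
treated = [y1[i] for i in range(n) if i in treated_ids]
control = [y0[i] for i in range(n) if i not in treated_ids]

diff, lo, hi = neyman_interval(treated, control)
print(f"estimated average effect: {diff:.3f}, interval: ({lo:.3f}, {hi:.3f})")
```

In this simulated setup the Fisher null is false for every unit while the average effect is zero, so the interval is targeting the quantity of interest in estimation, the average, without requiring the effect to be the same on every unit.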
If you look at the original post, you can read the comments, and even see what some people said about the anger management example.