Andrew Gelman had said he would go back to explain why he sided with Neyman over Fisher in relation to a big, famous argument discussed on my Feb. 16, 2013 post: “Fisher and Neyman after anger management?”, and I just received an e-mail from Andrew saying that he has done so: “In which I side with Neyman over Fisher”. (I’m not sure what Senn’s reply might be.) Here it is:
“In which I side with Neyman over Fisher” Posted by Andrew on 24 May 2013, 9:28 am
As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally.
Here’s an example that recently came up.
Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero.
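To make the distinction between the two nulls concrete, here is a minimal sketch. The unit-level effects are made-up numbers and the function names are hypothetical; nothing here comes from Senn's comment or Gelman's post. It only shows that a set of effects can satisfy the Neyman null (average of zero) without satisfying the Fisher null (every effect exactly zero).

```python
# Hypothetical unit-level treatment effects, invented for illustration only.
all_zero_effects = [0.0, 0.0, 0.0, 0.0]      # every unit has zero effect
cancelling_effects = [-2.0, 1.0, -1.0, 2.0]  # nonzero effects averaging to zero


def fisher_null_holds(effects):
    """Fisher null: the treatment effect is exactly 0 for every unit."""
    return all(e == 0 for e in effects)


def neyman_null_holds(effects):
    """Neyman null: individual effects may differ but average to zero."""
    return sum(effects) / len(effects) == 0


for name, effects in [("all zero", all_zero_effects),
                      ("cancelling", cancelling_effects)]:
    print(name, fisher_null_holds(effects), neyman_null_holds(effects))
```

The Fisher null implies the Neyman null, but not the other way around; the second list is the "exactly cancelling" case.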
Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible that a bunch of nonzero effects would exactly cancel). And I remember a similar discussion as a student, many years ago, when Rubin talked about that silly Neyman null hypothesis.
Thinking about it more, though, I side with Neyman over Fisher, because the interesting problem for me is not testing the null hypothesis, which in nontrivial problems can never be true anyway, but estimation. And in estimation I am interested in an average effect, not an effect that is identical across all people. I could imagine a model in which the variance of the treatment effect is proportional to its mean (this would bridge between the Neyman and Fisher ideas), but this is not a model that anyone ever fits.
So, just to say it again: if it’s a pure null hypothesis, sure, go with Fisher. But if you’re inverting a family of hypothesis tests to get a confidence interval (something which I’d almost never want to do, but let’s go with this, since that’s the common application of these ideas), I’d go with Neyman, as it omits the implausible requirement that the treatment effect be exactly identical on all items.
If you look at the original post, you can read the comments, and even see what some people said about the anger management example.
As an aside, I’m surprised Gelman says he’d “almost never want to” invert a family of hypothesis tests to get a confidence interval, since, where possible, that is essentially what is done to use data to learn about magnitudes of discrepancy (that are and are not indicated), which I take him to be interested in.
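For readers unfamiliar with the phrase, "inverting a family of hypothesis tests" can be sketched as follows. This is a toy illustration under assumptions of my own choosing (a two-sided z-test for a normal mean with known standard deviation, made-up data, and a simple grid search); it is not the setup in Gelman's post. The confidence interval is just the set of hypothesized values that the test does not reject.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_rejects(data, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with sigma assumed known."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def interval_by_inversion(data, sigma, alpha=0.05,
                          lo=-10.0, hi=10.0, steps=20001):
    """Scan a grid of candidate means and keep those the test does not reject."""
    width = (hi - lo) / (steps - 1)
    candidates = (lo + i * width for i in range(steps))
    kept = [mu0 for mu0 in candidates
            if not z_test_rejects(data, mu0, sigma, alpha)]
    return min(kept), max(kept)


data = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5]  # made-up observations
sigma = 1.0                            # assumed known, for illustration
low, high = interval_by_inversion(data, sigma)
print(low, high)
```

Up to the grid spacing, the endpoints agree with the familiar closed form: the sample mean plus or minus the critical value times sigma over the square root of n. Values outside the interval are the discrepancies the data reject at the chosen level.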
I first coined the term blogolog here:
Just a curious side question: I went back to read the original anger management post, and I’m wondering: why do you say that you “find it hard to believe, however, that Fisher would have thrown some of Neyman’s wooden models onto the floor?”
Eileen: It just makes no sense, unless maybe Neyman had left his wooden models in the seminar room and Fisher was returning them. The idea of Fisher ransacking Neyman’s office is kooky enough–that he’d toss these fragile-looking models around, more so. If I’d known this issue would arise, I would have asked Lehmann when I read Reid. Maybe we’d need to ask a statistician’s shrink.