Here’s how the Prionvac appraisal should have ended:
Prionvac: Our experiments yield a statistically significant increase in survival among scrapie-infected mice that are given our new vaccine compared to infected mice that are treated with a placebo (p = .01). The data indicate H: an increased survival rate of 9 months, compared to untreated mice.
Reformer: You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference even if the actual increased rate of survival were only 1 month! (That is, the power to reject the null and infer H: increase of 1 month, is more than .5.)
Prionvac: Well, then, the data indicate an increased survival of 2 years.
The reformer found this altogether reasonable! Here’s a correct ending:
Reformer: What? That would be an even more egregiously wrong interpretation of the data. I’ve just shown why your data do not warrant the inference to H that the increased survival in vaccinated mice is at least 9 months! If your result fails to sustain that much, it surely doesn’t warrant an even stronger inference to an increased survival of 2 years!!
It would be very convenient for you to argue that your evidence, which fails even to show an improvement of 9 months, is nevertheless excellent evidence for an improvement of 2 years!!! But it is wholly illogical.
The invalid argument can be put as follows:
If the Prionvac treatment really were tremendously successful (extending survival in mice by 2 years), then, with high probability, I would observe statistical significance, say, at the .01 level; i.e., the power to detect so big an improvement is high.
I observed a result that is statistically significant at the .01 level.
Therefore, the result is good evidence of improved survival.
(Note I allowed the conclusion to be the weaker claim of merely some improved survival; but the inference is still radically flawed.)
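The reformer’s numerical point can be sketched in a few lines of Python. Everything here is hypothetical: I assume a one-sided z-test of the null of no improvement at the .01 level, and I pick a standard error (0.4 months) so that, as in the reformer’s rebuke, the power against a mere 1-month increase exceeds .5. The point is that a p = .01 result is highly probable under small improvements too, so it cannot by itself indicate a 9-month (let alone 2-year) effect.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

Z_ALPHA = 2.326  # one-sided critical value for a .01-level z-test

def power(true_effect_months, se_months):
    """P(reject the null at the .01 level | true mean increase = true_effect)."""
    return 1 - norm_cdf(Z_ALPHA - true_effect_months / se_months)

# Hypothetical standard error, chosen so that power against a
# 1-month increase exceeds .5, matching the reformer's claim.
se = 0.4
for effect in (1, 9, 24):
    print(f"{effect:>2} months: power = {power(effect, se):.3f}")
```

Running this shows power above .5 at a 1-month increase and power near 1 at 9 months or 2 years: the statistically significant result is expected under all of these alternatives, which is exactly why it fails to discriminate among them.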
Ziliak and McCloskey, by misdefining power, inadvertently support the invalid argument. They have a responsibility to get it right, so a correction is in order. If you’re going to put out your shingle as an expert on misuses of statistics, then you should not misinterpret the most basic notions, such as power.
Nor is this an isolated case; it gets worse (and it belies their own goal of blocking self-serving, but erroneous, uses of tests by drug companies). They recommend the “true type 1 error” be computed by dividing the significance level by the power to detect worthy effects. It is interesting to try to figure out why they would have misled themselves in this way. I have an idea, but I leave it open to readers to try their hand.
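For readers who want to try their hand, it helps to see their recommended computation concretely. The values below are hypothetical; the snippet merely carries out the division they describe, it does not endorse the resulting quantity as an error probability.

```python
# Ziliak and McCloskey's recommended "true type 1 error":
# divide the significance level by the power to detect a worthy effect.
# Both inputs are hypothetical illustrative values.
alpha = 0.01          # significance level of the test
power_worthy = 0.8    # power against a "worthy" effect
true_type_1 = alpha / power_worthy
print(true_type_1)
```

Note that nothing in the Neyman-Pearson framework assigns this ratio the role of a type 1 error probability; the type 1 error of the test is simply alpha.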