Here’s how the Prionvac appraisal should have ended:

*Prionvac:* Our experiments yield a statistically significant increase in survival among scrapie-infected mice who are given our new vaccine compared to infected mice who are treated with a placebo (p = .01). The data indicate *H*: an increased survival rate of 9 months, compared to untreated mice.

*Reformer:* You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference, even if the actual increased rate of survival were only 1 month! (That is, the power to reject the null and infer *H*: increase of 1 month, is more than .5.)

*Prionvac:* Well, then, the data indicate an increased survival of 2 years.

The reformer would have found this altogether reasonable! Here’s a correct ending:

*Reformer:*

What? That would be an even more egregiously wrong interpretation of the data. I’ve just shown why your data do not warrant the inference to *H* that the increased survival in vaccinated mice is at least 9 months! If your result fails to sustain that much, it surely doesn’t warrant an even stronger inference to an increased survival of 2 years!!

It would be very convenient for you to argue that your evidence, which fails even to show an improvement of 9 months, is nevertheless excellent evidence for an improvement of 2 years!!! But it is wholly illogical.

The invalid argument can be put as follows:

If Prionvac treatment really were tremendously successful (and extended mouse survival by 2 years), then, with high probability, I would observe statistical significance, say, at the .01 level. That is, the power of detecting so big an improvement is high.

I observed a result that is statistically significant at the .01 level.

Therefore, the result is good evidence of improved survival.

(Note I allowed the conclusion to be the weaker claim of merely some improved survival; but the inference is still radically flawed.)
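To see what the power appealed to here actually is, a minimal sketch may help. It assumes a normal approximation and a hypothetical standard error of 0.4 months (the dialogue reports no standard error; the figure is chosen only so that the power against a 1-month increase comes out above .5, as the Reformer asserts):

```python
from statistics import NormalDist

# Hedged sketch of the power calculation under a normal approximation.
# The standard error below is a hypothetical stand-in, not a figure from the post.
def power(effect, se, alpha=0.01):
    """One-sided power: P(reject the null at level alpha | true increase = effect)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)          # cutoff for significance at alpha
    return 1 - NormalDist().cdf(z_crit - effect / se)  # P(test statistic exceeds cutoff)

se = 0.4  # hypothetical standard error (months)
print(f"power against a 1-month increase:  {power(1.0, se):.2f}")   # just over .5
print(f"power against a 2-year increase:   {power(24.0, se):.2f}")  # essentially 1
```

The point the sketch makes concrete: power is a property of the test against a stipulated alternative, computed before seeing the data. A high power against a 2-year increase is the first premise of the invalid argument; it says nothing about whether an observed p = .01 result warrants inferring that alternative.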

Ziliak and McCloskey, by misdefining power, inadvertently support the invalid argument. They have a responsibility to get it right, so a correction is in order. If you’re going to put out your shingle as an expert on misuses of statistics, then you should not misinterpret the most basic notions, such as power.

Nor is this an isolated case; it gets worse (and it belies their own goal to block self-serving, but erroneous, uses of tests by drug companies). They recommend the “true type 1 error” be computed by dividing the significance level by the power to detect worthy effects. It is interesting to try to figure out why they would have misled themselves in this way. I have an idea, but I leave it open to readers to try their hand.

What does Prionvac mean, exactly, by “indicate H”?

A typical interpretation of what you wrote goes as follows: based on the data, we have an estimate of the increase in mean lifespan* due to treatment. The estimate’s value is 9 months.

Interpreted this way, Prionvac’s second statement, when the estimate is switched from 9 months to 2 years based on no data, is vacuous and shows nothing. There is no reason for Reformer (or anyone else) to be convinced.

Incidentally, based on reverse-engineering the p-value, the 95% confidence interval for the treatment effect is roughly 2–16 months. This is the range of values Prionvac might realistically claim that the data ‘indicate’… a range that includes neither 2 years nor 1 month.
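The commenter’s reverse-engineering can be sketched as follows, assuming a normal approximation, a two-sided p = .01, and a point estimate of 9 months (the post reports only the p-value; the rest is the commenter’s reading):

```python
from statistics import NormalDist

# Hedged sketch: recover the standard error from the reported two-sided
# p-value and point estimate, then form an approximate 95% CI.
estimate = 9.0        # months, Prionvac's claimed increase (assumed point estimate)
p_two_sided = 0.01

z = NormalDist().inv_cdf(1 - p_two_sided / 2)   # |z| implied by p = .01 (about 2.576)
se = estimate / z                                # standard error reverse-engineered from p
z95 = NormalDist().inv_cdf(0.975)                # about 1.96
lo, hi = estimate - z95 * se, estimate + z95 * se

print(f"approx. 95% CI: {lo:.1f} to {hi:.1f} months")  # roughly 2 to 16 months
```

Running this reproduces the commenter’s “roughly 2–16 months,” which is indeed far short of 2 years.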

* Not survival rate; “9 months” is not a rate, as experts in the misuse of statistics will attest…

True; either may be used, and the poster switched from an example of rate increases to survival times (or mean survival times; the text is unclear). But it is obvious that the problem of misdefinition occurs whatever example is put forward, and it plays no role in the conceptual mistake.

Pingback: Fallacy of Rejection and the Fallacy of Nouvelle Cuisine « Error Statistics Philosophy