As a Nietzschean, I am fond of the statistical notion of power; yet it is often misunderstood by critics of testing. Consider leaders of the reform movement in economics, Ziliak and McCloskey (Michigan, 2008).
In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions. But we need to get them right (especially if we are giving expert advice). You can hate them; just define them correctly please. They write:
“The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in fact false, that is, when (say) such and such a positive effect is true.”
So far so good.
And the power of a test to detect that such and such a positive effect d is true is equal to the probability of rejecting the null hypothesis of (say) zero effect when the null is in fact false, and a positive effect as large as d is present.
Let this alternative be abbreviated H’(d):
H’(d): there is a positive effect as large as d.
Suppose the test rejects the null when it reaches a significance level of .01.
(1) The power of the test to detect H’(d) equals
P(test rejects null at .01 level; H’(d) is true).
For example, if the prion vaccine is so effective that it increases survival by as much as 2 years, then, let us allow, the probability of rejecting the null would be high. Say it is .85.
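To make (1) concrete, here is a minimal sketch of how such a power figure could arise. Everything numerical in it is my own assumption for illustration, not anything in Z & M: a one-sided z-test of zero mean increase in survival, a known standard deviation of 5 years, and 71 subjects. The function name `power_one_sided_z` is likewise hypothetical. With these made-up values the power to detect a 2-year increase comes out at roughly .85.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.01):
    """P(test rejects null at level alpha; true mean effect = effect).

    Assumes a one-sided z-test of H0: effect = 0 with known sigma.
    This is an illustrative setup, not one taken from the text.
    """
    std_normal = NormalDist()
    # Cutoff the standardized sample mean must exceed to reject at level alpha.
    cutoff = std_normal.inv_cdf(1 - alpha)
    # How far the sampling distribution shifts when the true effect is `effect`.
    shift = effect * sqrt(n) / sigma
    return 1 - std_normal.cdf(cutoff - shift)


# Hypothetical numbers: 2-year increase, sigma = 5 years, n = 71 subjects.
power = power_one_sided_z(effect=2.0, sigma=5.0, n=71)
print(f"Power to detect a 2-year increase: {power:.2f}")
```

Note that the number is computed entirely under the supposition that H’(d) is true. It is a property of the test, fixed before any data are in hand.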
“If the power of a test is high, say, .85 or higher, then the scientist can be reasonably confident that at minimum the null hypothesis (of, again, zero effect if that is the null chosen) is false and that therefore his rejection of it is highly probably correct” (132-3).
But this is not so. Perhaps they are slipping into the cardinal error of mistaking (1) as
(1’) P(H’(d) is true; test rejects null at .01 level)!
In dealing with these passages, I see why Spanos (2008) declared, “You have it backwards” in his review of their book. I had assumed (before reading it myself) that Z & M, as with many significance test critics, were pointing out that the fallacy is common, not that they were committing it. I am confident they will want to correct this and not give Prionvac grounds to claim evidence of a large increase in survival simply because their test had a high capability (power) to detect that increase, if it were in fact present.
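A small sketch may help show why high power against d gives no license to infer an effect as large as d from a rejection. The setup is entirely hypothetical and chosen by me for illustration: a one-sided z-test at the .01 level, a known standard deviation of 5 years, and 71 subjects, which puts the power against a 2-year increase at roughly .85. The function name `rejection_probability` is my own.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(effect, sigma=5.0, n=71, alpha=0.01):
    """P(test rejects null at level alpha; true mean effect = effect).

    Assumes a one-sided z-test of H0: effect = 0 with known sigma;
    sigma and n are made-up values, not taken from any study.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma
    return 1 - std_normal.cdf(cutoff - shift)


# The same test, evaluated under different suppositions about the true effect.
for effect in (0.0, 0.5, 1.0, 2.0):
    prob = rejection_probability(effect)
    print(f"true increase = {effect:.1f} years: P(reject) = {prob:.2f}")
```

Under these assumptions the test rejects about a quarter of the time when the true increase is only 1 year, half of d. A rejection is therefore something the test produces fairly often without any 2-year effect being present, which is why the high power in (1) cannot be read as the probability in (1’).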
What Z & M say about power analysts in this chapter is fine; but the power analysts are concerned with interpreting non-statistically significant results from tests that have low power to detect any but the grossest effects. Perhaps this is the source of their confusion.
[Aside: They do not say whether a rejection of the null is merely to infer the existence of a (positive) effect or whether it is to infer an effect of size d, so neither do I. They also do not say if it is a one-sided or two-sided test; neither qualification affects the upshot of the flaw I am raising.]
Spanos, A. (2008), Review of S. Ziliak and D. McCloskey’s The Cult of Statistical Significance, Erasmus Journal for Philosophy and Economics, volume 1, issue 1: 154-164.
Ziliak, S. and McCloskey, D. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives, University of Michigan Press.