The most surprising discovery about today’s statistics wars is that some who set out shingles as “statistical reformers” themselves are guilty of misdefining some of the basic concepts of error statistical tests—notably power. (See my recent post on power howlers.) A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions to get beyond what I call “chestnuts” and “howlers” of tests. The only way that disputing tribes can get beyond the statistics wars is by (at least) understanding correctly the central concepts. But these misunderstandings are more common than ever, so I’m asking readers to help. Why are they more common (than before the “new reformers” of the last decade)? I suspect that at least one reason is the popularity of Bayesian variants on tests: if one is looking to find posterior probabilities of hypotheses, then error statistical ingredients may tend to look as if that’s what they supply. Continue reading
Posts Tagged With: misunderstanding power
Join me in reforming the “reformers” of statistical significance tests
To raise the power of a test is to lower (not raise) the “hurdle” for rejecting the null (Ziliac and McCloskey 3 years on)
I said I’d reblog one of the 3-year “memory lane” posts marked in red, with a few new comments (in burgundy), from time to time. So let me comment on one referring to Ziliac and McCloskey on power. (from Oct.2011). I would think they’d want to correct some wrong statements, or explain their shifts in meaning. My hope is that, 3 years on, they’ll be ready to do so. By mixing some correct definitions with erroneous ones, they introduce more confusion into the discussion.
From my post 3 years ago: “The Will to Understand Power”: In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions. But we need to get them right (especially if we are giving expert advice). You can hate the concepts; just define them correctly please. They write:
“The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in face false, that is, then (say) such and such a positive effect is true.”
So far so good (keeping in mind that “positive effect” refers to a parameter discrepancy, say δ, not an observed difference.
And the power of a test to detect that such and such a positive effect δ is true is equal to the probability of rejecting the null hypothesis of (say) zero effect when the null is in fact false, and a positive effect as large as δ is present.
Let this alternative be abbreviated H’(δ):
H’(δ): there is a positive effect as large as δ.
Suppose the test rejects the null when it reaches a significance level of .01.
(1) The power of the test to detect H’(δ) =
P(test rejects null at .01 level; H’(δ) is true).
Say it is 0.85.
“If the power of a test is high, say, 0.85 or higher, then the scientist can be reasonably confident that at minimum the null hypothesis (of, again, zero effect if that is the null chosen) is false and that therefore his rejection of it is highly probably correct”. (Z & M, 132-3).
But this is not so. Perhaps they are slipping into the cardinal error of mistaking (1) as a posterior probability:
(1’) P(H’(δ) is true| test rejects null at .01 level)! Continue reading
Part 2 Prionvac: The Will to Understand Power
As a Nietzschean, I am fond of the statistical notion of power; yet it is often misunderstood by critics of testing. Consider leaders of the reform movement in economics, Ziliac and McCloskey (Michigan, 2009).
In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions. But we need to get them right (especially if we are giving expert advice). You can hate them; just define them correctly please. They write: Continue reading