U-Phil: Is the Use of Power* Open to a Power Paradox?

Posted on June 9, 2012 by Mayo

* to assess Detectable Discrepancy Size (DDS)

In my last post, I argued that DDS-type calculations (also called Neymanian power analysis) provide needful information to avoid fallacies of acceptance in the test T+, whereas the corresponding confidence interval does not (at least not without special testing supplements). But some, e.g., Hoenig and Heisey (2001), have argued that DDS computations are “fundamentally flawed”, leading to what is called the “power approach paradox”.

We are to consider two variations on the one-tailed test T+: H₀: μ ≤ 0 versus H₁: μ > 0 (p. 21). Following their terminology and symbols: the Z value in the first, Z_p1, exceeds the Z value in the second, Z_p2, although the same observed effect size occurs in both[i] and both have the same sample size, implying that σ₁ < σ₂. For example, suppose the standard errors of the sample mean are σ_x1 = 1 and σ_x2 = 2. Let the observed sample mean M be 1.4 in both cases, so Z_p1 = 1.4 and Z_p2 = 0.7. They note that for any chosen power, the computed detectable discrepancy size will be smaller in the first experiment, and for any conjectured effect size, the computed power will always be higher in the first experiment.

“These results lead to the nonsensical conclusion that the first experiment provides the stronger evidence for the null hypothesis (because the apparent power is higher but significant results were not obtained), in direct contradiction to the standard interpretation of the experimental results (p-values).” (p. 21)

But rather than showing the DDS assessment to be “nonsensical”, or revealing any direct contradiction with the interpretation of p-values, this just demonstrates something nonsensical in their interpretation of the two p-value results from tests with different variances. Since it’s Sunday night and I’m nursing[ii] overexposure to rowing in the Queen’s Jubilee boats in the rain and wind, how about you find the howler in their treatment? (Also, please point us to any articles from the last decade that have noted this, if you know of any.)
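The pattern Hoenig and Heisey describe can be reproduced with the numbers above. The sketch below is a minimal illustration, not their computation: it assumes a one-sided level α = 0.05 for T+ (the post fixes no level), and the alternative μ1 = 2 and target power 0.8 are arbitrary illustrative choices. The helper names p_value, power and dds are hypothetical. Power against μ1 is P(Z > z_α; μ = μ1) = 1 − Φ(z_α − μ1/σ), and the DDS for a chosen power is found by inverting that expression.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)
ALPHA = 0.05  # ASSUMED one-sided significance level; the post does not fix one
Z_ALPHA = STD_NORMAL.inv_cdf(1 - ALPHA)  # rejection cut-off for T+, about 1.645


def p_value(m, se):
    """One-sided p-value of T+ (H0: mu <= 0) for observed mean m and standard error se."""
    return 1 - STD_NORMAL.cdf(m / se)


def power(mu1, se, z_alpha=Z_ALPHA):
    """Probability that T+ rejects (Z > z_alpha) when the true mean is mu1."""
    return 1 - STD_NORMAL.cdf(z_alpha - mu1 / se)


def dds(target_power, se, z_alpha=Z_ALPHA):
    """Discrepancy mu1 that T+ detects with probability target_power (inverts power())."""
    return se * (z_alpha + STD_NORMAL.inv_cdf(target_power))


M = 1.4  # observed sample mean, the same in both experiments
EXPERIMENTS = {"experiment 1": 1.0, "experiment 2": 2.0}  # standard errors of the mean

for label, se in EXPERIMENTS.items():
    # mu1 = 2.0 and target power 0.8 are illustrative choices only
    print(f"{label}: Z = {M / se:.2f}, p = {p_value(M, se):.3f}, "
          f"power at mu1 = 2: {power(2.0, se):.3f}, DDS at power 0.8: {dds(0.8, se):.2f}")
```

For every positive μ1 and every target power, experiment 1 shows the higher power and the smaller DDS, while its p-value is the smaller of the two; the sketch reproduces the pattern only and leaves its interpretation to the discussion.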

______________________

Hoenig, J. M. and D. M. Heisey (2001), “The Abuse of Power: The Pervasive Fallacy of Power Calculations in Data Analysis,” The American Statistician, 55: 19-24.

[i] The subscript indicates the p-value of the associated Z value.

[ii] With English tea and a cup of strong “Elbar grease”.


7 thoughts on “U-Phil: Is the Use of Power* Open to a Power Paradox?”

June 10, 2012

Paul

That does seem to be a pretty bone-headed interpretation by Hoenig and Heisey.


June 10, 2012

Mayo

Why? I’m looking for an explicit explanation of where they go wrong. It seems obvious, and yet, did referees miss it?


June 11, 2012

john byrd

What is absurd is to expect two tests involving distinct variances to be directly comparable, such that they should somehow produce complementary results sufficient for post-test inferences about the null hypotheses. Each test must be further evaluated as a singular result to build support for the null.


June 12, 2012

Mayo

John: yes, and so there’s no “contradiction” between p-values and a DDS analysis, and the DDS analysis gives the right answer. That is, what they call “the nonsensical conclusion that the first experiment provides the stronger evidence for the null hypothesis” is actually the correct conclusion. But could journal editors really have thought their argument held up (as an indictment of the power analyst’s treatment of negative results)?


June 21, 2012

Alexandre

I have just started reading about this “Power Analysis Paradox” and I’m not sure I got this right. Let’s see…

Hoenig and Heisey (2001) said that observed power is a one-to-one decreasing function of the p-value: (1) for a p-value close to one, the observed power is close to zero; (2) for a p-value close to zero, the observed power is close to one. As I understand it, observed power cannot be used to accept a null hypothesis when we observe a p-value close to one, since in these cases the observed power will certainly be close to zero (it seems impossible to observe a p-value of 0.9 with high observed power, doesn’t it?).


June 26, 2012

Mayo

Alexandre: checking up on older comments, I came across this (it was not forwarded to me, likely because the post was over 10 days old*). The power analysis paradox to which I am referring concerns ordinary POWER, whereas, to avoid just this kind of confusion, I refer to “observed power” as “shpower” (you can search posts on this). Ordinary power is always relative to a cut-off for rejection and a particular alternative against which one wishes to compute power. It does NOT involve setting the parameter value to what is observed, as in shpower. Hope this helps.
*I will try to make this 14 days, but can’t guarantee a response to all comments. You can always e-mail me directly.
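Alexandre’s observation and the distinction drawn in the reply can be checked numerically. The sketch below assumes a one-sided α = 0.05 for T+ (a value the comments do not fix) and uses hypothetical helper names. It computes “observed power” (shpower) by setting the alternative equal to the observed mean, so that it depends only on the p-value, and contrasts it with ordinary power against a fixed alternative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)
ALPHA = 0.05  # ASSUMED one-sided level; the comments do not fix one
Z_ALPHA = STD_NORMAL.inv_cdf(1 - ALPHA)  # rejection cut-off for T+


def ordinary_power(mu1, se, z_alpha=Z_ALPHA):
    """Power of T+ against a fixed alternative mu1: P(Z > z_alpha; mu = mu1)."""
    return 1 - STD_NORMAL.cdf(z_alpha - mu1 / se)


def shpower_from_p(p, z_alpha=Z_ALPHA):
    """'Observed power' (shpower): power evaluated at mu1 = observed mean.

    With mu1 set to the observed mean, mu1 / se is just the observed z,
    which is recoverable from the p-value, so shpower depends on p alone.
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p)
    return 1 - STD_NORMAL.cdf(z_alpha - z_obs)


for p in (0.01, 0.05, 0.25, 0.5, 0.9):
    print(f"p = {p:.2f}  shpower = {shpower_from_p(p):.3f}")

# Ordinary power uses a chosen alternative, not the observed mean
print(f"power against mu1 = 2 with se = 1: {ordinary_power(2.0, 1.0):.3f}")
```

At p = α the shpower is exactly 0.5, and it falls toward zero as p approaches one, so a large p-value can never come with high shpower. Ordinary power, by contrast, is fixed by the cut-off, the alternative, and the standard error before any data are seen.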


March 19, 2015

Desmond

Late, I know, but in case this hasn’t been resolved (and because I’ve just run into this problem), my two pence is as follows.

Hoenig and Heisey are criticising the use of ordinary power (calculated using pre-existing variability information) for making statements about non-rejected null hypotheses because the actual power of the test will be different, not because they think there is a logical fallacy in the ordinary calculation of power.


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.