(Full) Excerpt. Excursion 5 Tour II: How Not to Corrupt Power (Power Taboos, Retro Power, and Shpower)


returned from London…

The concept of a test’s power is still being corrupted in the myriad ways discussed in 5.5, 5.6. I’m excerpting all of Tour II of Excursion 5, as I did with Tour I (of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars 2018, CUP)*. Originally the two Tours comprised just one, but in finalizing corrections, I decided the two together made for too long a slog, and I split them up. Because this was done at the last minute, some of the terms in Tour II rely on their introductions in Tour I. Here’s how it starts:

5.5 Power Taboos, Retrospective Power, and Shpower

Let’s visit some of the more populous tribes who take issue with power – by which we mean ordinary power – at least its post-data uses. Power Peninsula is often avoided due to various “keep out” warnings and prohibitions, or researchers come during planning, never to return. Why do some people consider it a waste of time, if not totally taboo, to compute power once we know the data? A degree of blame must go to N-P, who emphasized the planning role of power, and only occasionally mentioned its use in determining what gets “confirmed” post-data. After all, it’s good to plan how large a boat we need for a philosophical excursion to the Lands of Overlapping Statistical Tribes, but once we’ve made it, it doesn’t matter that the boat was rather small. Or so the critic of post-data power avers. A crucial disanalogy is that with statistics, we don’t know that we’ve “made it there,” when we arrive at a statistically significant result. The statistical significance alarm goes off, but you are not able to see the underlying discrepancy that generated the alarm you hear. The problem is to make the leap from the perceived alarm to an aspect of a process, deep below the visible ocean, responsible for its having been triggered. Then it is of considerable relevance to exploit information on the capability of your test procedure to result in alarms going off (perhaps with different decibels of loudness), due to varying values of the parameter of interest. There are also objections to power analysis with insignificant results.

Exhibit (vi): Non-significance + High Power Does Not Imply Support for the Null over the Alternative. Sander Greenland (2012) has a paper with this title. The first step is to understand the assertion, giving the most generous interpretation. It deals with non-significance, so our ears are perked for a fallacy of non-rejection. Second, we know that “high power” is an incomplete concept, so he clearly means high power against “the alternative.” We have a handy example: alternative μ.84 in T+ (POW(T+, μ.84) = 0.84).

Note to blog reader: μ.84 abbreviates “the alternative against which the test has 0.84 power.” This general abbreviation was introduced in Tour I. 

Use the water plant case, T+: H0: μ ≤ 150 vs. H1: μ > 150, σ = 10, n = 100. With α = 0.025, z0.025 = 1.96, and the corresponding cut-off in terms of x̄0.025 is [150 + 1.96(10)/√100] = 151.96; μ.84 = 152.96.
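Note to blog reader: the arithmetic above can be checked with a few lines of Python. This is only a sketch under the stated setup (known σ, normal sampling distribution for the mean); the names `cutoff`, `power`, and `mu_84` are my own labels for illustration, not notation from the book.

```python
from math import sqrt
from statistics import NormalDist

# Water plant test T+: H0: mu <= 150 vs. H1: mu > 150 (values from the text)
mu0, sigma, n, alpha = 150.0, 10.0, 100, 0.025
se = sigma / sqrt(n)                       # standard error of the sample mean (= 1)
std_normal = NormalDist()                  # standard normal distribution

z_alpha = std_normal.inv_cdf(1 - alpha)    # upper 0.025 point, about 1.96
cutoff = mu0 + z_alpha * se                # cut-off for the sample mean, about 151.96


def power(mu):
    """POW(T+, mu): probability the sample mean exceeds the cut-off when the mean is mu."""
    return 1 - std_normal.cdf((cutoff - mu) / se)


mu_84 = cutoff + 1 * se                    # one standard error above the cut-off
beta_84 = 1 - power(mu_84)                 # Type II error probability at mu_84

print(round(cutoff, 2), round(mu_84, 2), round(power(mu_84), 2), round(beta_84, 2))
```

The alternative μ.84 sits one standard error above the cut-off, which is why the power there is Pr(Z > −1), roughly 0.84, and the corresponding β is roughly 0.16.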

Now a title like this is supposed to signal a problem, a reason for those “keep out” signs. His point, in relation to this example, boils down to noting that an observed difference may not be statistically significant – x̄ may fail to make it to the cut-off x̄0.025 – and yet be closer to μ.84 than to the null value 150. This happens because the Type II error probability β (here, 0.16)[1] is greater than the Type I error probability (0.025).

For a quick computation let x̄0.025 = 152 and μ.84 = 153. Halfway between alternative 153 and the 150 null is 151.5. Any observed mean greater than 151.5 but less than the x̄0.025 cut-off, 152, will be an example of Greenland’s phenomenon. Such values are closer to 153, the alternative against which the test has 0.84 power, than to 150 and thus, by a likelihood measure, support 153 more than 150, even though POW(μ = 153) is high (0.84). Having established the phenomenon, your next question is: so what?
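Note to blog reader: here is a small sketch of the likelihood comparison just described, using the rounded values (null 150, alternative 153, cut-off 152, standard error 1). The function name `likelihood_ratio` and the three sample means tried are my own choices for illustration.

```python
from math import exp

se = 1.0                                   # sigma / sqrt(n) = 10 / sqrt(100)
null, alt, cutoff = 150.0, 153.0, 152.0    # rounded values from the text


def likelihood_ratio(xbar):
    """Likelihood of mu = alt relative to mu = null, given observed mean xbar.

    Under the normal model the normalizing constants cancel, leaving
    exp(((xbar - null)^2 - (xbar - alt)^2) / (2 * se^2)).
    """
    return exp(((xbar - null) ** 2 - (xbar - alt) ** 2) / (2 * se ** 2))


# Hypothetical observed means: below, at, and above the halfway point 151.5
for xbar in (151.0, 151.5, 151.8):
    significant = xbar >= cutoff
    print(xbar, significant, round(likelihood_ratio(xbar), 2))
```

An observed mean such as 151.8 falls short of the cut-off, so the result is not statistically significant, yet the ratio exceeds 1: by the likelihood measure the data favor 153 over 150. At exactly 151.5 the ratio is 1, and below it the ratio favors 150.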

It would be problematic if power analysis took the insignificant result as evidence for μ = 150 – maintaining compliance with the ecological stipulation – and I don’t doubt some try to construe it as such, nor that Greenland has been put in the position of needing to correct them. Power analysis merely licenses μ ≤ μ.84, where 0.84 was chosen for “high power.” Glance back at Souvenir X. So at least one of the “keep out” signs can be removed.

All of Excursion 5 Tour II (in proofs) is here.


1 That is, β(μ.84) = Pr(d < 0.4; μ = 0.6) = Pr(Z < −1) = 0.16.



*This excerpt comes from Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo, CUP 2018).

It is still valuable to look at the discussions and comments under “power” and “shpower” on this blog.

Earlier excerpts and mementos from SIST (May 2018-May 2019) are here.


Categories: fallacy of non-significance, power, Statistical Inference as Severe Testing
