If a test’s power to detect µ’ is low then a statistically significant result is good/lousy evidence of discrepancy µ’? Which is it?
If your smoke alarm has little capability of triggering unless your house is fully ablaze, then if it has triggered, is that a strong or weak indication of a fire? Compare this insensitive smoke alarm to one that is so sensitive that burning toast sets it off. The answer is that the insensitive detector’s alarm going off is a good indication of the presence of (some) fire, while hearing the ultra-sensitive alarm go off is not.[i]
Yet I often hear people say things to the effect that:
if you get a result significant at a low p-value, say ~.03,
but the power of the test to detect alternative µ’ is also low, say .04 (i.e., POW(µ’) = .04), then “the result hasn’t done much to distinguish” the data from that obtained by chance alone.
–but wherever that reasoning is coming from it’s not from statistical hypothesis testing, properly understood. It’s easy to see.
We can use a variation on the one-sided test T+ from our illustration of power: We’re testing the mean of a Normal distribution with n iid samples, and (for simplicity) known σ:
H0: µ ≤ 0 against H1: µ > 0
Let σ = 1, n = 25, so (σ/ √n) = .2.
To avoid those annoying X-bars, I will use M instead. The Excel example has µ ≤ 12, but it’s even easier to have 0, and easy to switch over. Test T+ rejects H0 at the .025 level if M > 1.96(.2). Let’s make it the 2-standard deviation cut-off:
Test T+ rejects H0 at ~ .025 level if M > 2(.2) = .4. So the cut-off M*= .4.
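As a quick check on the “~ .025” (a sketch only, using Python’s standard-library `statistics.NormalDist`; the helper name `tail_prob` is my own, not part of the original illustration), here is the probability that M exceeds each cut-off when µ = 0:

```python
from statistics import NormalDist

sigma, n = 1.0, 25
se = sigma / n ** 0.5   # sigma / sqrt(n) = .2

def tail_prob(cutoff, mu=0.0):
    """P(M > cutoff; mu), where M ~ Normal(mu, se). Hypothetical helper."""
    return 1 - NormalDist(mu, se).cdf(cutoff)

alpha_196 = tail_prob(1.96 * se)   # the 1.96(.2) cut-off
alpha_2sd = tail_prob(2 * se)      # the 2-standard deviation cut-off, M* = .4

print(f"cut-off 1.96(.2): {alpha_196:.4f}")
print(f"cut-off 2(.2):    {alpha_2sd:.4f}")
```

The 2-standard deviation cut-off has a type I error probability a little under .025, which is why the level is written with a ~.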
Now we need a µ’ such that POW(µ’) = low.
Power is always defined in terms of the cut-off for rejection, M*.
- I know the power against alternatives between 0 and cut-off M* will be less than .5.
- I’ll get really low power (.16) if µ’ were to exceed 0 by only 1 (σ/ √n) unit –which in this case is 1(.2) = .2. (That is, POW(.2) =.16).
- I’ll get even lower power if µ’ were to exceed 0 by only .25 (σ/ √n) unit–which in this case is .25(.2) = .05.
I’m cutting corners with symbols wherever possible.
So what’s the power of T+ against .05? POW(.05) = ?
P(M > .4; µ = .05) = P(Z > (.4 - .05)(1/.2)) = P(Z > .35(5)) = P(Z > 1.75) = .04
So POW(.05) = .04 –quite low.
[Whether this low chance of triggering when µ = .05 is just what we want is a separate issue.]
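The power figures so far can be checked numerically. This is a minimal sketch, assuming the same set-up (σ = 1, n = 25, cut-off M* = .4); the function name `power` and the constants are my own labels for illustration:

```python
from statistics import NormalDist

SE = 0.2        # sigma / sqrt(n) with sigma = 1, n = 25
CUTOFF = 0.4    # M*, the 2-standard deviation cut-off

def power(mu_alt, cutoff=CUTOFF, se=SE):
    """POW(mu_alt) = P(M > cutoff; mu = mu_alt) = P(Z > (cutoff - mu_alt) / se)."""
    z = (cutoff - mu_alt) / se
    return 1 - NormalDist().cdf(z)

# Alternatives discussed in the text: .05, .2 (one standard error), and M* itself.
for mu_alt in (0.05, 0.2, 0.4):
    print(f"POW({mu_alt}) = {power(mu_alt):.2f}")
```

Rounded to two places, this reproduces POW(.05) = .04 and POW(.2) = .16, and shows the power reaching .5 exactly when the alternative equals the cut-off.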
My claim is, if it has triggered, say just at the cut-off M* (.4), then there’s a good indication µ >.05.
You can see this using lower confidence limits (LL) corresponding to test T+.
Find the .96 lower confidence limit (LL) corresponding to test T+, supposing the observed sample mean M = .4. (Never mind that we’d typically estimate σ).
µ > M – (1.75)(1/√25)
µ > M – (1.75)(.2)
µ > M – .35.
Since we’re imagining M reaches the cut-off M*, we have the following one-sided lower .96 confidence limit.
µ > .4 – .35 = .05.
So µ > .05 is certainly warranted.
(This is also given by severity reasoning.)
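The lower limit can be computed the same way. A sketch, again assuming known σ so that the standard error is .2; `lower_limit` is a hypothetical helper name:

```python
from statistics import NormalDist

SE = 0.2   # sigma / sqrt(n) with sigma = 1, n = 25

def lower_limit(m_obs, conf_level, se=SE):
    """One-sided lower confidence limit: mu > m_obs - z * se."""
    z = NormalDist().inv_cdf(conf_level)   # about 1.75 for conf_level = .96
    return m_obs - z * se

ll = lower_limit(0.4, 0.96)   # observed mean right at the cut-off M*
print(f"mu > {ll:.3f}")
```

The code uses the unrounded z-value rather than 1.75, so the limit agrees with .05 only up to rounding.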
Here’s another example: What’s the power of T+ against .1? POW(.1) = ?
P(M > .4; µ = .1) = P(Z > (.4 - .1)(1/.2)) = P(Z > (.3)(5)) = P(Z > 1.5) = .07
So POW(.1) = .07.
Correspondingly, µ = .1 is the lower limit of a one-sided confidence interval with confidence level of ______?
So the statistically significant result is a better indication that µ > .05 than µ > .1.
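One way to check this comparison, assuming the observed mean sits exactly at the cut-off (M = M* = .4): the confidence level at which µ’ is the one-sided lower limit is then 1 – POW(µ’). The sketch below computes both quantities side by side; the function names are my own and purely illustrative.

```python
from statistics import NormalDist

SE = 0.2      # sigma / sqrt(n) with sigma = 1, n = 25
M_OBS = 0.4   # observed mean, assumed to equal the cut-off M*

def power(mu_alt, cutoff=M_OBS, se=SE):
    """POW(mu_alt) = P(M > cutoff; mu = mu_alt)."""
    return 1 - NormalDist().cdf((cutoff - mu_alt) / se)

def conf_level_for_lower_limit(mu_alt, m_obs=M_OBS, se=SE):
    """Confidence level at which mu_alt is the one-sided lower limit, given m_obs."""
    return NormalDist().cdf((m_obs - mu_alt) / se)

for mu_alt in (0.05, 0.1):
    pw = power(mu_alt)
    cl = conf_level_for_lower_limit(mu_alt)
    print(f"mu' = {mu_alt}: POW = {pw:.2f}, confidence level for mu > mu' = {cl:.2f}")
```

The lower the power against µ’, the higher the confidence level attached to µ > µ’ when the result just reaches the cut-off.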
You can see the duality between CIs and tests, but I’ll come back to this. The main lesson is:
If a test’s power to detect µ’ is low, then a statistically significant result (i.e., a rejection of the null with low p-value) is a good indication of discrepancy µ’ (i.e., that µ > µ’).
[i] I assume the alarm system shares the obvious properties of good tests for detecting discrepancies; that’s the point of an analogy. In any event, I have delineated those points elsewhere.
Thanks for posting this. I don’t really buy your smoke detector argument. A closer analogy to what I wrote on low-power m-s tests is:
You have a kitchen smoke detector that can detect smoke, but is also so crappy it randomly goes off, for reasons totally unconnected to smoke or fire, one day in 20. Also, the *only* potential source of any smoke or fire in your house on any day is someone striking one match inside a metal box in the basement. To trigger the alarm, most of this match’s little puff of smoke has to get to the detector, and while that could happen it is highly unlikely. So, you have low power to detect the fire/smoke that the match provides, not much above 0.05.
One day the alarm goes off. This is a “good indication” of a match being struck? How?
NB If it helps any, you seem to be entertaining that it’s possible to have much greater power, e.g. that someone could be setting off munitions in the basement. I am not, and wasn’t in my earlier comments.