I want to complete the Neyman’s Nursery (NN) meanderings while we have some numbers before us, and while there is a particular example, test T+, on the table. Despite my warm and affectionate welcoming of the “power analytic” reasoning I unearthed in those “hidden Neyman” papers (see post from Oct. 22) – admittedly, largely lost in the standard decision-behavior model of tests – it still retains an unacceptable coarseness: power is always calculated relative to the cut-off point cα for rejecting H0. But rather than throw out the baby with the bathwater, we should keep the logic and take account of the actual value of the statistically insignificant result.
(For those just tuning in, power analytic reasoning aims to avoid the age-old fallacy of taking a statistically insignificant result as evidence of 0 discrepancy from the null hypothesis, by identifying discrepancies that can and cannot be ruled out. For our test T+, we reason from insignificant results to inferences of the form: μ < μ0 + γ.
We are illustrating (as does Neyman) with a one-sided test T+, with μ0 = 0 and α = .025. Spoze σ = 1, n = 25, so X̄ is statistically significant only if it exceeds .392.
Power-analytic reasoning says (in relation to our test T+):
If X̄ is statistically insignificant and POW(T+, μ = μ1) is high, then X̄ indicates, or warrants inferring (or whatever phrase you like), that μ < μ1.)
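In case it helps to see where the .392 cut-off comes from, here is a quick sketch in Python, using only the standard library’s `NormalDist` (the variable names are mine, just for illustration):

```python
from statistics import NormalDist

# Test T+ as in the post: H0: mu = 0 vs H1: mu > 0, sigma = 1, n = 25, alpha = .025
sigma, n, alpha = 1.0, 25, 0.025
se = sigma / n ** 0.5                  # standard error of the sample mean = 0.2
cutoff = NormalDist().inv_cdf(1 - alpha) * se   # reject H0 when the sample mean exceeds this
print(round(cutoff, 3))                # 0.392
```

That is, the cut-off is just 1.96 standard errors (1.96 × .2) above the null value 0.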
Suppose one had an insignificant result from test T+ and wanted to evaluate the inference: μ < .4
(it doesn’t matter why just now, this is an illustration).
Since POW(T+, μ = .4) is hardly more than .5, Neyman would say “it was a little rash” to regard the observed mean as indicating μ < .4. He would say this regardless of the actual value of the statistically insignificant result. There’s no place in the power calculation, as defined, to take into account the actual observed value.[1]
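The power figure Neyman would be looking at is easy to compute directly; a minimal sketch (the function name `power` is mine):

```python
from statistics import NormalDist

# Power of T+ against mu = mu1: P(sample mean > cutoff | mu = mu1),
# with cutoff .392 and standard error .2 as in our test T+
def power(mu1, cutoff=0.392, se=0.2):
    return 1 - NormalDist(mu1, se).cdf(cutoff)

print(round(power(0.4), 2))   # 0.52 -- "hardly more than .5"
```

Note that nothing about the actual observed mean appears anywhere in this calculation; only the cut-off does.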
That is why, although high power to detect μ as large as μ1 is sufficient for regarding the data as good evidence that μ < μ1, it is too coarse to be a necessary condition. Spoze, for example, that X̄ = 0.
Were μ as large as .4, we would, with high probability (~.98), have observed a larger difference from the null than we did. Therefore, our data provide evidence that μ < .4.[2]
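The ~.98 figure is the same kind of tail-area calculation as power, only evaluated at the actual observed mean rather than at the cut-off; a sketch (the function name `sev` is mine):

```python
from statistics import NormalDist

# Severity for the inference mu < mu1, given insignificant observed mean xbar:
# the probability of a result larger than xbar, were mu as large as mu1
def sev(mu1, xbar, se=0.2):
    return 1 - NormalDist(mu1, se).cdf(xbar)

print(round(sev(0.4, 0.0), 2))   # 0.98
```

Replace `xbar = 0` with the cut-off .392 and you recover the coarse power calculation (~.52); using the observed 0 instead is exactly what lets the data speak.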
We might say that the severity associated with μ < .4 is high. There are many ways to articulate the associated justification—I have done so at length earlier; and of course it is no different from “power analytic reasoning”. Why consider a miss as good as a mile?
When I first introduced this idea in my Ph.D. dissertation, I assumed researchers already did this in real life, since it introduces no new logic. But I’ve been surprised not to find it.
I was (and am) puzzled to discover under “observed power” the Shpower computation, which we have already considered and (hopefully) gotten past—at least for present purposes, namely, reasoning from insignificant results to inferences of the form: μ < μ0 + γ.
Granted, there are some computations which you might say lead to virtually the same results as SEV, e.g., certain confidence limits, but even so there are differences of interpretation.[3] Let me know if you think I am wrong; there may well be something out there I haven’t seen….
[1] This does not mean the place to enter it is in the hypothesized value of μ under which the power is computed (as with Shpower). This is NOT power, and as we have seen in two posts, it is fallacious to equate it to power or to power analytic reasoning. Note that the Shpower associated with X̄ = 0 is .025 – that we are interested in μ < .4 does not enter.
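To make the contrast concrete, here is the Shpower computation as I understand it, sketched in the same style (the function name `shpower` is mine):

```python
from statistics import NormalDist

# "Shpower": power computed at the observed mean itself, i.e.,
# P(sample mean > cutoff | mu = xbar). The inference of interest
# (e.g., mu < .4) plays no role in this number at all.
def shpower(xbar, cutoff=0.392, se=0.2):
    return 1 - NormalDist(xbar, se).cdf(cutoff)

print(round(shpower(0.0), 3))   # 0.025
```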
[2] It doesn’t matter here if we use ≤ or <.
[3] For differences between SEV and confidence intervals, see Mayo (1996), Mayo and Spanos (2006, 2011).