
*This is a modified reblog of an earlier post, since I keep seeing papers that confuse this.*

Suppose you are reading about a result **x** that is *just statistically significant* at level α (i.e., P-value = α) in a one-sided test T+ of the mean of a Normal distribution with *n* iid samples, and (for simplicity) known σ: *H*_{0}: µ ≤ 0 against *H*_{1}: µ > 0.

I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant **x** is *poor* evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s poor evidence that µ > µ’). *See point on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant **x** is *good* evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is *un*warranted.
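To make POW(µ’) concrete before weighing A against B, here is a minimal sketch of how the power of T+ could be computed. The function names and the numbers (α = 0.025, σ = 1, n = 100) are my own illustrative assumptions, not values from any particular paper. It only computes the quantities both camps are talking about; it does not settle which of them is right.

```python
from math import sqrt
from statistics import NormalDist


def cutoff(alpha, sigma, n):
    """Sample mean that is just significant at level alpha in T+ (H0: mu <= 0)."""
    return NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)


def power(mu_alt, alpha, sigma, n):
    """POW(mu') = P(sample mean exceeds the cutoff; mu = mu')."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    return 1 - NormalDist(mu=mu_alt, sigma=se).cdf(cutoff(alpha, sigma, n))


# Hypothetical illustrative values (assumptions, not from the post).
alpha, sigma, n = 0.025, 1.0, 100
x_bar = cutoff(alpha, sigma, n)  # the just statistically significant result

for mu_alt in (0.0, 0.5 * x_bar, x_bar, 1.5 * x_bar):
    print(f"mu' = {mu_alt:.3f}  POW(mu') = {power(mu_alt, alpha, sigma, n):.3f}")
```

Two things hold by construction in this setup: the power against µ’ = 0 equals α, and the power against the alternative equal to the cutoff itself is exactly .5, which is where the "at least .5" threshold in both positions comes from.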

**Which is correct, from the perspective of the (error statistical) philosophy, within which power and associated tests are defined?**