Debunking the “power paradox” allegation from my previous post. The authors consider a one-tailed Z test of the hypothesis H0: μ ≤ 0 versus H1: μ > 0: our Test T+. The observed sample mean is x̄ = 1.4; in the first case σx = 1, and in the second case σx = 2.

First case: The power against μ = 3.29 is high, .95 (i.e., P(Z > 1.645; μ = 3.29) = 1 − Φ(−1.645) = .95), and thus the DDS assessor would take the result as a good indication that μ < 3.29.

Second case: For σx = 2, the cut-off for rejection would be 0 + 1.65(2) = 3.30.

So, in the second case (σx = 2) the probability of erroneously accepting H0, even if μ were as high as 3.29, is .5! (i.e., P(Z ≤ 1.645; μ = 3.29) = Φ(1.645 − (3.29/2)) ≈ .5.) Although p1 < p2,[i] the justifiable upper bound in the first test is *smaller* (closer to 0) than in the second! Hence, the DDS assessment is entirely in keeping with the appropriate use of error probabilities in interpreting tests. There is no conflict with p-value reasoning.
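These power and p-value figures can be reproduced with a short stdlib-only Python sketch (the function names are mine, not notation from the post; I use the exact cut-off 1.645 rather than the rounded 1.65 above):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

XBAR = 1.4       # observed sample mean
MU_ALT = 3.29    # alternative of interest
Z_ALPHA = 1.645  # one-tailed z cut-off at alpha = .05

def power(sigma_x, mu=MU_ALT, z_alpha=Z_ALPHA):
    # P(reject H0; mu) for Test T+: reject when x-bar > 0 + z_alpha * sigma_x
    cutoff = 0 + z_alpha * sigma_x
    return 1 - Phi((cutoff - mu) / sigma_x)

def p_value(xbar, sigma_x):
    # one-tailed p-value under H0: mu = 0
    return 1 - Phi(xbar / sigma_x)

print(round(power(1), 2), round(power(2), 2))                   # .95 vs .5
print(round(p_value(XBAR, 1), 3), round(p_value(XBAR, 2), 3))   # .081 vs .242
```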

NEW PROBLEM

The DDS power analyst always takes the worst case of just missing the cut-off for rejection. Compare instead:

SEV(μ < 3.29) for the first test, and SEV(μ < 3.29) for the second (using the actual outcomes as SEV requires).

[i] p_{1}= .081 and p_{2} = .242.
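The two SEV values asked for above can be checked with a minimal stdlib Python sketch (the function name is mine; SEV for the claim μ < μ′ at a non-significant result is the probability of a sample mean even larger than the one observed, were μ as large as μ′):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def sev_upper(xbar, sigma_x, mu_prime):
    # SEV(mu < mu_prime) for a non-significant result in Test T+:
    # P(X-bar > xbar; mu = mu_prime)
    return 1 - Phi((xbar - mu_prime) / sigma_x)

print(round(sev_upper(1.4, 1, 3.29), 2))  # first test
print(round(sev_upper(1.4, 2, 3.29), 2))  # second test
```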

I get severity of 0.97 for the first and 0.83 for the second.

Yes, John, this is correct, thanks. I hadn’t done it before throwing it out as an example; the numbers are out of a hat (from H & H’s example). The second result at first seems surprising, but it underscores the importance of taking into account the actual non-significant result. No response as of yet from the authors, which is unusual.

Dear Mayo and others,

Here, p1 = 0.081 and p2 = 0.242 are not significant at alpha = 5% (i.e., there is no statistical evidence to reject the null hypothesis at the 5% significance level). OK, they are not significant at alpha = 5%, but can we claim evidence for this null hypothesis?

Both experiments produced non-significant results; however, since the first has severity of 97% and the second has severity of 83%, can one argue that the first experiment gives more evidence for the null hypothesis (since in the first experiment we failed to reject with higher severity than in the second)? If so, don’t we have here the possible contradiction claimed in the paper?

Thanks for any help,

Alexandre

Hi, Alexandre, the claim here is that the first experiment yields a more “precise” test, so we can warrant the claim that μ < 3.29 with more severity.

So, if we look at the null as an approximation, the first test provides more evidence that the “true” mean is closer to the null than the second one does.

Maybe I’m missing something here. Is there anyone to help me in this issue?

Thanks a lot,

Alexandre.

Alexandre: Studying the post and the follow-up comparing the DDS assessment to SEV should help you. There is no allegation here that the results provide evidence for the precise truth of the null. What I wrote is that in case 1, the DDS assessor would take the result as a good indication that μ < 3.29. Second case: For σx = 2, the cut-off for rejection would be 0 + 1.65(2) = 3.30.

So, in the second case (σx = 2) the probability of erroneously accepting H0, even if μ were as high as 3.29, is .5! So (Neyman’s) DDS assessor would not take the result as a good indication that μ < 3.29. It passes with poor severity. I'm just repeating what I already wrote.

The SEV assessment gives a more custom-tailored assessment. In the follow-up post I compared these DDS assessments to the data-dependent construals recommended by SEV, focusing on case 2: “What is the severity with which (μ < 3.29) passes the test T+ in the case where σx = 2? ….. SEV(μ < 3.29) = P(X̄ > 1.4; μ = 3.29) = P(Z > (1.4 − 3.29)/2) = P(Z > −1.89/2) = P(Z > −.945) ≈ .83.” Once again, it is always a specific upper bound that is assessed for SEV. Alternatively, one may start with a desired SEV value and compute the value μ′ such that the claim μ < μ′ is warranted at that level. You say no one is helping you, but if you avail yourself of all the material provided (published papers go beyond the mini examples worked in the blog), you would help yourself. Perhaps do case 1, to check understanding.
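The inversion described at the end (fix a desired SEV level, solve for the upper bound μ′) can be sketched in stdlib Python. Since SEV(μ < μ′) = P(Z > (x̄ − μ′)/σx) = Φ((μ′ − x̄)/σx), the bound is μ′ = x̄ + σx·Φ⁻¹(SEV); the function name below is mine:

```python
from statistics import NormalDist

def mu_bound(xbar, sigma_x, sev_level):
    # smallest mu' such that SEV(mu < mu') = sev_level, from
    # SEV = P(X-bar > xbar; mu') = Phi((mu' - xbar) / sigma_x)
    return xbar + sigma_x * NormalDist().inv_cdf(sev_level)

print(round(mu_bound(1.4, 1, 0.95), 2))  # case 1: 3.04
print(round(mu_bound(1.4, 2, 0.95), 2))  # case 2: 4.69
```

So at SEV = .95, the first (more precise) test warrants a tighter upper bound than the second, in keeping with the post.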