“(2) If POW(T+,µ’) is high, then an α statistically significant x is a good indication that µ < µ’.”

In the discussed example, the power of the test is high against $\mu'=3$. Let's say we observe x=4, which is significant. No way can this be an indication that $\mu<3$.
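To make the numbers concrete, here is a minimal sketch of that example. The setup is an assumption, since the comment does not restate it: T+ tests H0: µ=0 against µ>0, the sample mean has standard error 1, and α=0.025.

```python
from statistics import NormalDist

# Assumed setup (not stated in the comment): one-sided test T+ of H0: mu = 0,
# standard error sigma/sqrt(n) = 1, significance level alpha = 0.025.
MU0, SE, ALPHA = 0.0, 1.0, 0.025
Z = NormalDist()

# M*: the smallest sample mean that reaches significance.
cutoff = MU0 + Z.inv_cdf(1 - ALPHA) * SE

def power(mu_alt):
    """Probability that the sample mean exceeds the cutoff when mu = mu_alt."""
    return 1 - Z.cdf((cutoff - mu_alt) / SE)

x_obs = 4.0
pow_3 = power(3.0)
is_significant = x_obs > cutoff

print(f"cutoff M* = {cutoff:.2f}")
print(f"POW(T+, mu'=3) = {pow_3:.2f}")
print(f"x = {x_obs} is significant: {is_significant}")
```

Under these assumptions the power against µ'=3 is roughly 0.85, and x=4 is both significant and larger than 3, which is the case the objection is about.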

My post:

http://errorstatistics.com/2015/06/18/can-you-change-your-bayesian-prior-i/

Gelman’s post:

http://andrewgelman.com/2015/08/25/can-you-change-your-bayesian-prior/

There were over 100 comments on my post which illustrate how much disagreement there is about this.

http://errorstatistics.com/2013/10/19/bayesian-confirmation-philosophy-and-the-tacking-paradox-in-i/ (something we currently don't have)

You might have your philosophical reasons to prefer your approach, but I think the subtlety will be lost on the people who believe that the rejection of a test is good evidence for µ≥µ1 if the power of the test against µ1 is high. That would imply that the evidence for µ≥(µ1+1) is better, because the power against (µ1+1) is higher, but for any reasonable definition of evidence this is not possible (µ≥(µ1+1) implies µ≥µ1, so any evidence for the former is also evidence for the latter). If this simple argument doesn’t convince them, I don’t think severity arguments will.
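The premise that power grows with the alternative can be checked with a short sketch. The test parameters are assumptions chosen for illustration (H0: µ=0, standard error 1, α=0.025), and the µ1 values are arbitrary.

```python
from statistics import NormalDist

# Assumed one-sided test of H0: mu = 0 with standard error 1 and alpha = 0.025.
SE, ALPHA = 1.0, 0.025
Z = NormalDist()
cutoff = Z.inv_cdf(1 - ALPHA) * SE

def power(mu1):
    """Probability of rejecting H0 when the true mean is mu1."""
    return 1 - Z.cdf((cutoff - mu1) / SE)

# Arbitrary alternatives, each one unit larger than the previous.
mu1_values = [1.0, 2.0, 3.0, 4.0]
powers = [power(m) for m in mu1_values]

for m, p in zip(mu1_values, powers):
    print(f"power against mu1 = {m}: {p:.3f}")
```

The power rises with each step, while the claim µ≥µ1 becomes logically stronger, so reading high power as good evidence would rank the stronger claim as better supported than the weaker claim it implies.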

In the first case, when a rejection from a test with low power is criticized, what I think actually happens is this: an area is known to have low power against plausibly sized discrepancies, and yet achieves significance. This, together with other evidence, may make people suspect the significant result is due to cherry-picking, multiple testing, p-hacking and various biasing selection effects. That suspicion is often warranted. However, questioning if the error probabilities are actual is different from reasoning based on error probabilities, when they are assumed to be approximately correct.

You introduce severity, in principle a function of T(α) and d(x0), which is in fact a function of just x0 (and µ1=µ0+γ). Your definition for a non-rejected test is

The severity with which the claim µ≤µ1 passes test T(α) with data x0

[1] SEV(µ≤µ1)=P(d(x)≥d(x0);µ≥µ1)={1}=P(x≥x0;µ≥µ1)={2}=P(x≥x0;µ=µ1)

{1} because d(x) is a monotonically increasing function of x. {2} because you say severity is evaluated at µ=µ1.

For a rejected test, the corresponding definition is

The severity with which test T(α) passes µ≥µ1 with data x0

[2] SEV(µ≥µ1)=P(d(x)≤d(x0);µ≤µ1)=P(x≤x0;µ≤µ1)=P(x≤x0;µ=µ1)

Not by coincidence, these results are identical to 1 – ( p-value when the hypothesis µ=µ1 is tested with data x0 and the alternative is [1] µ≤µ1 or [2] µ≥µ1 ).
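A minimal numerical check of this identity, assuming the sample mean x is normal with a known standard error of 1. The values x0=1.5 and µ1=2 are hypothetical, picked only for illustration.

```python
from statistics import NormalDist

SE = 1.0  # assumed standard error of the sample mean x

def sev_mu_le(x0, mu1):
    """[1] SEV(mu <= mu1) = P(x >= x0; mu = mu1)."""
    return 1 - NormalDist(mu1, SE).cdf(x0)

def sev_mu_ge(x0, mu1):
    """[2] SEV(mu >= mu1) = P(x <= x0; mu = mu1)."""
    return NormalDist(mu1, SE).cdf(x0)

def p_value(x0, mu1, alternative):
    """p-value for H0: mu = mu1 against a one-sided alternative."""
    lower_tail = NormalDist(mu1, SE).cdf(x0)
    return lower_tail if alternative == "less" else 1 - lower_tail

x0, mu1 = 1.5, 2.0  # hypothetical data and discrepancy
print(sev_mu_le(x0, mu1), 1 - p_value(x0, mu1, "less"))
print(sev_mu_ge(x0, mu1), 1 - p_value(x0, mu1, "greater"))
```

Each printed pair agrees, which is the equivalence described above: the severity for µ≤µ1 is one minus the p-value against the lower alternative, and likewise for µ≥µ1 against the upper one.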

You “emphasize that you are not advocating changing the original null and alternative hypothesis of the given test T(α); rather you are using the severe testing concept to evaluate which inferences are warranted, in this case of the form µ≤µ1”. As far as I can see, your severe testing concept is equivalent to changing the original null and alternative hypothesis and calculating the p-value given the data x0 for, in this case, H0:µ=µ1 and H1:µ≤µ1. The only dependency on the original test is that when not rejected your alternative is µ≤µ1 and when rejected µ≥µ1.

Maybe you have better examples where the severity interpretation of acceptance or rejection gives results which are different from those obtained from a straightforward µ≤µ1 or µ≥µ1 hypothesis test?

http://www.phil.vt.edu/dmayo/personal_website/2006Mayo_Spanos_severe_testing.pdf

I may come back to this tomorrow.

I don’t understand the second remark. What test assumptions are sufficiently well met? Assuming that H0 is true and µ=0, then anything can be taken as an indication that µ is less than µ’.

Thanks for your replies. I still don’t understand the point of the exercise, but it’s probably because I don’t see why anyone would take the rejection of the null hypothesis as evidence for an arbitrary alternative in the first place. The test depends on the null hypothesis H0:µ=0, on the sampling distribution of the statistic, on the one-sided/two-sided distinction, and on the significance level α. It does not depend at all on the stated alternative hypothesis. The result of the test will be the same whether the alternative is H1: µ greater than 0, H1: µ=42, or H1: ”µ greater than 0 and the Moon is made of cheese”.

Given that POW(T+,µ’) is less than 50% if and only if M* is above µ’ (the lower the power, the higher the difference) and POW(T+,µ’) is more than 50% if and only if M* is below µ’ (the higher the power, the higher the difference), the rules essentially say: x greater than µ’ is an indication that µ is greater than µ’, and x less than µ’ is an indication that µ is less than µ’. I agree.
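The 50% crossover can be seen in a small sketch. The test parameters are assumptions made for illustration: H0: µ=0, standard error 1, α=0.025.

```python
from statistics import NormalDist

# Assumed one-sided test T+ of H0: mu = 0 with standard error 1 and alpha = 0.025.
SE, ALPHA = 1.0, 0.025
Z = NormalDist()
m_star = Z.inv_cdf(1 - ALPHA) * SE  # significance cutoff M*

def power(mu_alt):
    """POW(T+, mu_alt): probability that the sample mean exceeds M*."""
    return 1 - Z.cdf((m_star - mu_alt) / SE)

# Alternatives one standard error below, at, and above the cutoff.
for mu_alt in (m_star - SE, m_star, m_star + SE):
    side = "M* > mu'" if m_star > mu_alt else "M* = mu'" if m_star == mu_alt else "M* < mu'"
    print(f"mu' = {mu_alt:.2f}: power = {power(mu_alt):.3f} ({side})")
```

The power is exactly one half when µ’ equals M*, below one half when M* lies above µ’, and above one half when M* lies below µ’, so at x=M* the two rules reduce to comparing x with µ’.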

(1) If POW(T+,µ’) is low, then x=M* (the significance cutoff for T+) is a good indication that µ>µ’.

(2) If POW(T+,µ’) is high, then x=M* (the significance cutoff for T+) is a good indication that µ<µ’.

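Given that POW(T+,µ’)<50% if and only if M*>µ’ (the lower the power, the higher the difference) and POW(T+,µ’)>50% if and only if M*<µ’ (the higher the power, the higher the difference), then the rules essentially say: x>µ’ is an indication that µ>µ’ and x<µ’ is an indication that µ<µ’. I agree.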
