As you enjoy the weekend discussion & concert in the Captain’s Central Limit Library & Lounge, your Tour Guide has prepared a brief overview of Excursion 3 Tour I, and a short (semi-severe) quiz on severity, based on exhibit (i).*

We move from Popper through a gallery on “Data Analysis in the 1919 Eclipse tests of the General Theory of Relativity (GTR)” (3.1), which leads to the main gallery on the origin of statistical tests (3.2) by way of a look at where the main members of our statistical cast are in 1919: Fisher, Neyman, and Pearson. From the GTR episode, we identify the key elements of a statistical test–the steps in E.S. Pearson’s opening description of tests in 3.2. The classical testing notions–Type I and II errors, power, consistent tests–are shown to grow out of requiring probative tests. The typical (behavioristic) formulation of N-P tests came later. The severe tester breaks out of the behavioristic prison. A first look at the severity construal of N-P tests is in Exhibit (i). Viewing statistical inference as severe testing shows how to do all that N-P tests do (and more) while remaining a member of the Fisherian Tribe (3.3). We consider the frequentist principle of evidence FEV and the divergent interpretations called for by Cox’s taxonomy of null hypotheses. The last member of the taxonomy–substantively based null hypotheses–returns us to the opening episode of GTR.

*Key terms (incomplete; please send me yours)*

GTR, eclipse test, ether effect, corona effect, PPN framework, statistical test ingredients, Anglo-Polish collaboration, Lambda criterion; Type I error, Type II error, power, P-value, unbiased tests, consistent tests, uniformly most powerful (UMP) tests; severity interpretation of tests, severity function, water plant accident; sufficient statistic; frequentist principle of evidence FEV; sensitivity achieved [same as attained power (att power)], Cox’s taxonomy (embedded, nested, dividing, testing assumptions), Nordtvedt effect, equivalence principle (strong and weak)

**Semi-Severe Severity Quiz, based on the example in Exhibit (i) of Excursion 3**

- Keeping to Test T+ with *H*_{0}: μ ≤ 150 vs. *H*_{1}: μ > 150, σ = 10, and *n* = 100, observed *x̄* = 152 (i.e., d = 2), find the severity associated with μ > 150.5.

i.e., SEV_{100}(μ > 150.5) = ________

- Compute 3 or more of the severity assessments for Table 3.2, with *x̄* = 153.

- **Comparing** Test T+ with *n* = 100 and *n* = 10,000: keeping *H*_{0}: μ ≤ 150 vs. *H*_{1}: μ > 150, σ = 10, change the sample size so that *n* = 10,000.

The 2SE rejection rule would now be: reject (i.e., “infer evidence against *H*_{0}”) whenever *X̄* > _____.

Assume *x̄* just reaches this 2SE cut-off. (Added the previous sentence Dec 10; I thought it was clear.) What’s the severity associated with inferring μ > 150.5 now?

i.e., SEV_{10,000}(μ > 150.5) = ____

Compare with SEV_{100}(μ > 150.5).

4. NEW. I realized I needed to include a “negative” result. Assume *x̄* = 151.5. Keeping to the same test with *n* = 100, find SEV_{100}(μ ≤ 152).

5. If you’re following the original schedule, you’ll have read Tour II of Excursion 3, so here’s an easy question: Why does Souvenir M tell you to “relax”?

6. **Extra Credit**: supply some key terms from this Tour that I left out in the above list.

*The reference is to Mayo (2018, CUP): Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.

1. SEV(μ > 150.5) is the worst case (i.e., highest) probability, under μ ≤ 150, of getting a test statistic that accords less with “μ > 150.5” than does the observed test statistic x_bar = 152. So,

SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150) = 0.93

2. nah

3. In Q1 the standard error was 1; now the standard error is 0.1 and the 2SE rejection threshold is 150.2 and SEV(μ > 150.5) = 1 – tiny ε

4. Back to SE = 1. SEV(μ ≤ 152) is the worst case (i.e., highest) probability, under μ > 152, of getting a test statistic that accords less with “μ ≤ 152” than does the observed test statistic x_bar = 151.5. So,

SEV(μ ≤ 152) = Pr(X_bar > 151.5; μ = 152) = 0.69

5. I’m not there yet. Let’s see… probably something to do with how severity clears up confusions and makes equivocations that paper over disagreements between various tribes unnecessary.

—

So I have a question — a genuine question, mind, not a trick question or a rhetorical point-making one — regarding the application of severity reasoning in discrete settings. Suppose I have a measurement device that answers some binary yes/no question. The device is known to give correct answers 7 times in 10 and to give answers selected uniformly at random 3 times in 10.

The severity with which H passes a test with some test statistic, we are told, is the (worst-case, but that’s not relevant here) probability of getting a test statistic that accords less well with H than the one observed, supposing H to be false.

I have yet to see how to measure accordance in the discrete setting (perhaps it’s described in sections of the book I haven’t read yet). The question is: does accordance encompass being correct by happy chance in the 3 times in 10 that the measurement device gives a random answer? If so, severity is the proportion of times the answer is correct, 17 times in 20. But I suspect not, so I guess my real question is: is severity just the proportion of times the measurement device operates correctly?

MAYO GRADES COREY’S QUIZ.

1. SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150) = 0.93

MAYO: Correct numerical answer, but it should be:

SEV(μ > 150.5) = Pr(X_bar ≤ 152; μ = 150.5) = 0.93. Half off.
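The corrected expression is easy to check numerically. A minimal Python sketch (standard library only; the setup σ = 10, n = 100, x̄ = 152 is from the quiz):

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar = 10, 100, 152
se = sigma / sqrt(n)   # standard error of the mean = 1

# SEV_100(mu > 150.5) = Pr(Xbar <= 152 ; mu = 150.5)
sev = NormalDist().cdf((xbar - 150.5) / se)
print(round(sev, 2))   # 0.93
```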

2. nah

3. In Q1 the standard error was 1; now the standard error is 0.1 and the 2SE rejection threshold is 150.2 and SEV(μ > 150.5) = 1 – tiny ε

MAYO: Incorrect. I should have added “show your work”, then you’d spot errors. However, it’s possible you read the question assuming the observed mean M0 = 152, whereas it was to be the 2SE cut-off. I added a sentence.

4. Back to SE = 1. SEV(μ ≤ 152) is the worst case (i.e., highest) probability, under μ > 152, of getting a test statistic that accords less with “μ ≤ 152” than does the observed test statistic x_bar = 151.5. So,

SEV(μ ≤ 152) = Pr(X_bar > 151.5; μ = 152) = 0.69

MAYO: Yes.
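The same one-line computation confirms the Q4 figure (a Python sketch, standard library only):

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar = 10, 100, 151.5
se = sigma / sqrt(n)   # = 1

# SEV_100(mu <= 152) = Pr(Xbar > 151.5 ; mu = 152), the worst case over mu > 152
sev = 1 - NormalDist().cdf((xbar - 152) / se)
print(round(sev, 2))   # 0.69
```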

5. I’m not there yet. Let’s see… probably something to do with how severity clears up confusions and makes equivocations that paper over disagreements between various tribes unnecessary.

MAYO: Not even close.

MAYO: So it appears that you got 1 1/2 out of 5. You may wish to resubmit.

—

I misread Q1 at first; when I noticed I fixed the answer but missed the typo in the SEV expression.

For the normal model, a mean 2SE above the hypothesized value always gives SEV = 0.98; that’s what I had for Q1 before I noticed that I’d used the wrong mean.

Curious that you failed to criticize me for writing that the worst-case severity is the highest probability instead of the lowest. I originally started writing my answers using (S-2)* instead of (S-2), and I missed that correction too.

Corey’s response to my grading his SEV quiz reminds me of some students I’ve had. And when one is grading 30 papers with circuitous stories about why they thought this or that and read it wrong but knew the right answer, and why didn’t I mark off for something else that was confused or backwards, one tries to give the most generous reading.

But I know Corey knows these computations. I spoze he just tried doing them standing up while doing something else because they are so easy. If you send me a revised quiz, I will happily remove your first attempt.

Attempting a quiz or other assignment earns reward cruise pts. 7 pts for a prize.

Let the record stand; I don’t have anything at stake. When you get down to it that’s the real reason I was so sloppy…

Yeah, not enough respect for me I guess.

Not sure how incentives around a quiz translate into respect for you…

Keeping to those I could do on my phone + WolframAlpha I got:

1. Claim is mu > 150.5 in light of xbar = 152

Hence compute the prob of disagreement under the denial of the claim, i.e. 1 − prob agree, as

1- max P(x > 152 ; mu less than or equal 150.5, sigma = 1)

=

0.933

2. Don’t have book on me right now, will do later.

3. As in 1 but now compute

1- max P(x > 150.2 ; mu less than or equal 150.5, sigma = 0.1)

=

0.17

4. Compute 1- max P (x less than or equal 151.5; mu >= 152)

=

0.69

Oops, WolframAlpha takes the variance, not the standard deviation, as the argument…

So for 3 I get prob x less than or equal 150.2, x norm(150.5,0.1^2)

= 0.0013
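That corrected value agrees with a direct computation (a Python sketch; with n = 10,000 the standard error is 0.1 and x̄ is taken at the 2SE cut-off of 150.2):

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar = 10, 10_000, 150.2   # xbar just reaches the 2SE cut-off
se = sigma / sqrt(n)                 # = 0.1

# SEV_10000(mu > 150.5) = Pr(Xbar <= 150.2 ; mu = 150.5)
sev = NormalDist().cdf((xbar - 150.5) / se)
print(round(sev, 5))   # 0.00135
```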

Repeat 1-4 but based on the median and the Laplace(mu,gamma) model with gamma=10/sqrt(2) to give a variance of 100.

Replace 5 by:

Which model would you use, N(mu,100) or Laplace(mu,10/sqrt(2)) ? Give your reasons.

Addendum: The median of the 100 readings is approximately N(mu,1/2)
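The addendum’s approximation can be spot-checked by simulation (a Python sketch, standard library only; the asymptotic variance of the sample median of n Laplace(mu, gamma) readings is gamma^2/n = 0.5 here, matching the stated N(mu, 1/2)):

```python
import math
import random
from statistics import median, fmean, pvariance

random.seed(42)
mu, gamma, n, reps = 150.0, 10 / math.sqrt(2), 100, 5000

def laplace_draw():
    # inverse-CDF sample from Laplace(mu, gamma)
    u = random.random() - 0.5
    s = 1.0 if u >= 0 else -1.0
    return mu - gamma * s * math.log(1 - 2 * abs(u))

# medians of `reps` samples of n readings each
meds = [median(laplace_draw() for _ in range(n)) for _ in range(reps)]

print(round(fmean(meds), 1))      # centred near mu = 150
print(round(pvariance(meds), 2))  # near 0.5, matching N(mu, 1/2)
```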

1. alpha = 0.025, xbar=152 so we reject the null hypothesis that mu <= 150. mu appears larger than 150, but what values larger than 150 are severely supported?

SIR: SEV100(X<152 | mu=150.5) = Pr((X-150.5)/(10/sqrt(100)) < (152-150.5)/(10/sqrt(100))) = Pr(Z < 1.5) = 0.933

2. xbar=153: Pr(Xbar > 153 | mu=150) = Pr(Z > (153-150)/(10/sqrt(100))) = Pr(Z>3) = 0.00135, so we reject H0: mu<=150.

mu appears larger than 150, but what values are severely supported?

SIR: SEV100(X<153 | mu=150) = Pr((X-150)/(10/sqrt(100)) < (153-150)/(10/sqrt(100))) = Pr(Z<3) = 0.999

SIR: SEV100(X<153 | mu=151) = Pr((X-151)/(10/sqrt(100)) < (153-151)/(10/sqrt(100))) = Pr(Z<2) = 0.977

SIR: SEV100(X<153 | mu=152) = Pr((X-152)/(10/sqrt(100)) < (153-152)/(10/sqrt(100))) = Pr(Z < 1) = 0.841

3. alpha = 0.025 = Pr(Z > 2 | mu=150, n=10000, s=10) = Pr( (Xbar – 150)/(10/sqrt(10000)) > 2) = Pr( Xbar > 150 + 2*10/100 ) = Pr(Xbar > 150.2)

xbar=150.2 so we reject Ho: mu<= 150. mu appears larger than 150, but what values larger than 150 are severely supported?

SIR: SEV( X < 150.2 | mu=150.5) = Pr( (X-150.5)/(10/sqrt(10000)) < (150.2 – 150.5)/(10/sqrt(10000))) = Pr(Z < -3) = 0.00135

mu=150.5 is not severely supported for xbar=150.2 with n=10,000.

4. alpha = 0.025, xbar=151.5, so for H0: mu <= 150 the p-value is Pr(Xbar > 151.5 | mu=150) = Pr(Z > (151.5 – 150)/(10/sqrt(100))) = Pr(Z > 1.5) = 0.067. We fail to reject Ho. mu appears to be low, but what values are severely supported?

SIN: SEV100(Xbar > 151.5 | mu = 152) = Pr( (Xbar – 152)/(10/sqrt(100)) > (151.5 – 152)/(10/sqrt(100))) = Pr(Z > -0.5) = 0.691, values above 152 are not severely supported.

5. The first thing I do when I relax is breathe in a lovely lungful of air, which of course would make me more buoyant in the quicksand.

6. Implicationary assumption, actual assumption, conditional test

Oh my – WordPress’s markup language ate half of my text. Apologies that my quiz answers appear so nonsensical. I’ll work with Mayo to clean it up.