As you enjoy the weekend discussion & concert in the Captain’s Central Limit Library & Lounge, your Tour Guide has prepared a brief overview of Excursion 3 Tour I, and a short (semi-severe) quiz on severity, based on exhibit (i).*

We move from Popper through a gallery on “Data Analysis in the 1919 Eclipse tests of the General Theory of Relativity (GTR)” (3.1), which leads to the main gallery on the origin of statistical tests (3.2) by way of a look at where the main members of our statistical cast are in 1919: Fisher, Neyman, and Pearson. From the GTR episode, we identify the key elements of a statistical test–the steps in E.S. Pearson’s opening description of tests in 3.2. The classical testing notions–Type I and II errors, power, consistent tests–are shown to grow out of requiring probative tests. The typical (behavioristic) formulation of N-P tests came later. The severe tester breaks out of the behavioristic prison. A first look at the severity construal of N-P tests is in Exhibit (i). Viewing statistical inference as severe testing shows how to do all that N-P tests do (and more) while a member of the Fisherian Tribe (3.3). We consider the frequentist principle of evidence FEV and the divergent interpretations called for by Cox’s taxonomy of null hypotheses. The last member of the taxonomy–substantively based null hypotheses–returns us to the opening episode of GTR.

*key terms (incomplete; please send me yours)*

GTR, eclipse test, ether effect, corona effect, PPN framework, statistical test ingredients, Anglo-Polish collaboration, Lambda criterion; Type I error, Type II error, power, P-value, unbiased tests, consistent tests, uniformly most powerful (UMP) tests; severity interpretation of tests, severity function, water plant accident; sufficient statistic; frequentist principle of evidence FEV; sensitivity achieved [same as attained power (att power)]; Cox’s taxonomy (embedded, nested, dividing, testing assumptions); Nordtvedt effect, equivalence principle (strong and weak)

**Semi-Severe Severity Quiz, based on the example in Exhibit (i) of Excursion 3**

1. Keeping to Test T+ with *H*_{0}: μ ≤ 150 vs. *H*_{1}: μ > 150, σ = 10, and *n* = 100, observed *x̄* = 152 (i.e., d = 2), find the severity associated with μ > 150.5.

i.e., SEV_{100}(μ > 150.5) = ________
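For checking your answer, the severity after T+ rejects can be sketched numerically. The formula follows Exhibit (i): SEV(μ > μ₁) = Pr(X̄ ≤ observed x̄; μ = μ₁), a standard normal probability. A minimal sketch (the function name `sev_reject` and code organization are mine, not the book’s):

```python
from math import erf, sqrt

def sev_reject(x_bar, mu1, sigma, n):
    """Severity for inferring mu > mu1 after T+ rejects H0:
    Pr(X-bar <= observed x_bar), computed under mu = mu1."""
    se = sigma / sqrt(n)                 # standard error of the sample mean
    z = (x_bar - mu1) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# Question 1 setup: sigma = 10, n = 100, observed x-bar = 152
# sev_reject(152, 150.5, 10, 100) gives SEV_100(mu > 150.5)
```

Note that severity falls toward 0.5 as μ₁ approaches the observed x̄, and toward 1 as μ₁ drops well below it.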

2. Compute 3 or more of the severity assessments for Table 3.2, with *x̄* = 153.

3. **Comparing Test T+ with *n* = 100 and *n* = 10,000**: Keeping to *H*_{0}: μ ≤ 150 vs. *H*_{1}: μ > 150 with σ = 10, change the sample size so that *n* = 10,000.

The 2SE rejection rule would now be: reject (i.e., “infer evidence against *H*_{0}”) whenever *X̄* > _____.

Assume *x̄* just reaches this 2SE cut-off. What’s the severity associated with inferring μ > 150.5 now?

i.e., SEV_{10,000}(μ > 150.5) = ____

Compare with SEV_{100}(μ > 150.5).

4. I realized I needed to include a “negative” result. Assume *x̄* = 151.5. Keeping to the same test with *n* = 100, find SEV_{100}(μ ≤ 152).
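For the “negative” result in Question 4, the direction flips: severity for inferring μ ≤ μ₁ is the probability of a result *larger* than the one observed, computed under μ = μ₁. A minimal sketch along the same lines as before (function name `sev_nonreject` is mine):

```python
from math import erf, sqrt

def sev_nonreject(x_bar, mu1, sigma, n):
    """Severity for inferring mu <= mu1 given a negative result in T+:
    Pr(X-bar > observed x_bar), computed under mu = mu1."""
    se = sigma / sqrt(n)                     # standard error of the sample mean
    z = (x_bar - mu1) / se
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))  # 1 minus the standard normal CDF

# Question 4 setup: sev_nonreject(151.5, 152, 10, 100) gives SEV_100(mu <= 152)
```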

5. If you’re following the original schedule, you’ll have read Tour II of Excursion 3, so here’s an easy question: Why does Souvenir M tell you to “relax”?

6. **Extra Credit**: supply some key terms from this Tour that I left out in the above list.

*The reference is to Mayo (2018, CUP), *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*.