2025 leisurely cruise

December leisurely cruise “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6)

2024 Cruise

Welcome to the December leisurely cruise:
Wherever we are sailing, assume that it’s warm, warm, warm (not like today in NYC). This is an overview of our first set of readings for December from my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018): [SIST]–Excursion 3 Tour II. This leisurely cruise, as participants know, is intended to take a whole month to cover one week of readings from my 2020 LSE Seminars, except for December and January, which double up.

What do you think of “3.6 Hocus-Pocus: P-values Are Not Error probabilities, Are Not Even Frequentist”? This section refers to Jim Berger’s famous attempted unification of Jeffreys, Neyman and Fisher in 2003. The unification considers testing two simple hypotheses using a random sample from a Normal distribution, computing their two P-values, rejecting whichever gets the smaller P-value, and then computing its posterior probability, assuming each gets a prior of .5. This becomes what he calls the “Bayesian error probability” upon which he defines “the frequentist principle”. On Berger’s reading of an important paper* by Neyman (1977), Neyman criticized P-values for violating the frequentist principle (SIST p. 186). *The paper is “Frequentist probability and frequentist statistics”. Remember that links to readings outside SIST are at the Captain’s biblio on the top left of the blog. Share your thoughts in the comments.
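For readers who want to see the arithmetic, here is a minimal sketch of the procedure as summarized above, not Berger's own formulation. It assumes a Normal sample with known σ and two simple hypotheses H0: μ = mu0 and H1: μ = mu1 with mu0 < mu1; the function name `berger_style_report` and all numerical values are hypothetical, chosen only for illustration.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def berger_style_report(xbar, mu0, mu1, sigma, n):
    """Sketch of the two-P-value procedure; assumes mu0 < mu1 and known sigma."""
    se = sigma / math.sqrt(n)
    # P-value for H0: probability of a sample mean at least this large under mu0.
    p0 = normal_upper_tail((xbar - mu0) / se)
    # P-value for H1: probability of a sample mean at least this small under mu1.
    p1 = normal_upper_tail((mu1 - xbar) / se)
    # Reject whichever hypothesis gets the smaller P-value (ties go to H1 here).
    rejected = "H0" if p0 < p1 else "H1"
    # Posterior of H0 with prior .5 on each hypothesis, from the likelihoods of xbar.
    log_l0 = -0.5 * ((xbar - mu0) / se) ** 2
    log_l1 = -0.5 * ((xbar - mu1) / se) ** 2
    post_h0 = 1.0 / (1.0 + math.exp(log_l1 - log_l0))
    post_rejected = post_h0 if rejected == "H0" else 1.0 - post_h0
    return p0, p1, rejected, post_rejected

# Illustrative (made-up) numbers: mu0 = 0, mu1 = 1, sigma = 1, n = 25, observed mean 0.8.
p0, p1, rejected, post_rejected = berger_style_report(0.8, 0.0, 1.0, 1.0, 25)
print(rejected, p0, p1, post_rejected)
```

Note that the last number reported is a posterior probability of the rejected hypothesis, which is a different kind of quantity from either P-value; that difference is what section 3.6 asks you to keep in view.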

Some snapshots from Excursion 3 tour II.

Excursion 3 Tour II: It’s The Methods, Stupid

Tour II disentangles a jungle of conceptual issues at the heart of today’s statistics wars. The first stop (3.4) unearths the basis for a number of howlers and chestnuts thought to be licensed by Fisherian or N-P tests.** In each exhibit, we study the basis for the joke. Together, they show: the need for an adequate test statistic, the difference between implicationary (i-assumptions) and actual assumptions, and the fact that tail areas serve to raise, and not lower, the bar for rejecting a null hypothesis. (Additional howlers occur in Excursion 3 Tour III.)

recommended: medium to heavy shovel 

Stop (3.5) pulls back the curtain on the view that Fisher and N-P tests form an incompatible hybrid. Incompatibilist tribes retain caricatures of F & N-P tests, and rob each of notions they need (e.g., power and alternatives for F, P-values & post-data error probabilities for N-P). Those who allege that Fisherian P-values are not error probabilities often mean simply that Fisher wanted an evidential not a performance interpretation. This is a philosophical not a mathematical claim. N-P and Fisher tended to use P-values in both ways. It’s time to get beyond incompatibilism. Even if we couldn’t point to quotes and applications that break out of the strict “evidential versus behavioral” split, we should be the ones to interpret the methods for inference, and supply the statistical philosophy that directs their right use.” (p. 181)

strongly recommended: light to medium shovel, thick-skinned jacket

In (3.6) we slip into the jungle. Critics argue that P-values are for evidence, unlike error probabilities, but then aver P-values aren’t good measures of evidence either, since they disagree with probabilist measures: likelihood ratios, Bayes Factors or posteriors. A famous peace-treaty between Fisher, Jeffreys & Neyman promises a unification. A bit of magic ensues! The meaning of error probability changes into a type of Bayesian posterior probability. It’s then possible to say ordinary frequentist error probabilities (e.g., type I & II error probabilities) aren’t error probabilities. We get beyond this marshy swamp by introducing subscripts 1 and 2. Whatever you think of the two concepts, they are very different. This recognition suffices to get you out of quicksand.

required: easily removed shoes, stiff walking stick (review Souvenir M on day of departure)

**Several of these may be found by searching for “Saturday night comedy” on this blog. In SIST, however, I trace out the basis for the jokes.


selected key terms and ideas 

Howlers and chestnuts of statistical tests
armchair science
Jeffreys’ tail area criticism
Limb sawing logic
Two machines with different precisions
Weak conditionality principle (WCP)
Conditioning (see WCP)
Likelihood principle
Long run performance vs probabilism
Alphas and p’s
Fisher as behaviorist
Hypothetical long-runs
Freudian metaphor for significance tests
Pearson, on cases where there’s no repetition
Armour-piercing naval shell
Error probability₁ and error probability₂
Incompatibilist philosophy (F and N-P must remain separate)
Test statistic requirements (p. 159)

Please share your questions, other key terms to add, and any typos you find, in the comments. Interested in joining us? Write to jemille6@vt.edu. I plan another group zoom soon.

Categories: 2025 leisurely cruise

First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]

November Cruise

The example I use here to illustrate formal severity comes in for criticism in a paper to which I reply in a 2025 BJPS paper (linked here). Use the comments for queries.

Exhibit (i) N-P Methods as Severe Tests: First Look (Water Plant Accident) 

There’s been an accident at a water plant where our ship is docked, and the cooling system had to be repaired. It is meant to ensure that the mean temperature of discharged water stays below the temperature that threatens the ecosystem, perhaps not much beyond 150 degrees Fahrenheit. There were 100 water measurements taken at randomly selected times, each with a known standard deviation σ = 10, and the sample mean x̄ was computed. When the cooling system is effective, each measurement is like observing X ~ N(150, 10²). Because of this variability, we expect different 100-fold water samples to lead to different values of X̄, but we can deduce its distribution. If each X ~ N(μ = 150, 10²) then X̄ is also Normal with μ = 150, but the standard deviation of X̄ is only σ/√n = 10/√100 = 1. So X̄ ~ N(μ = 150, 1).
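The standard-error arithmetic in this exhibit can be checked with a few lines of Python. This is only a sketch under the exhibit's stated assumptions (Normal measurements, σ = 10, n = 100, μ = 150 when the cooling system is effective). The helper names are hypothetical, and the observed mean of 152 is an illustrative value that does not appear in the excerpt above.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean of n independent measurements."""
    return sigma / math.sqrt(n)

def upper_tail_prob(xbar, mu, sigma, n):
    """P(sample mean >= xbar) when each measurement is N(mu, sigma^2)."""
    z = (xbar - mu) / standard_error(sigma, n)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the exhibit: sigma = 10, n = 100, mu = 150.
se = standard_error(10, 100)
print(se)  # 1.0, matching 10 / sqrt(100) = 1

# Illustrative only: how improbable is a sample mean of 152 or more if mu = 150?
print(upper_tail_prob(152, 150, 10, 100))  # about 0.023, i.e. two standard errors out
```

Because the standard error is 1, a difference in the sample mean of a degree or two is already one or two standard errors from 150, which is why 100 measurements give the test its bite.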

Categories: 2025 leisurely cruise, severe tests, severity function, water plant accident

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1

November Cruise

This second excerpt for November is really just the preface to 3.1. Remember, our abbreviated cruise this fall is based on my LSE Seminars in 2020, and since there are only 5, I had to cut. So those seminars skipped 3.1 on the eclipse tests of GTR. But I want to share snippets from 3.1 with current readers, along with reflections in the comments.

Excursion 3 Statistical Tests and Scientific Inference

Tour I Ingenious and Severe Tests

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation – in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] . . . it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories. (Popper 1962, p. 36)

Categories: 2025 leisurely cruise, SIST, Statistical Inference as Severe Testing

November: The leisurely tour of SIST continues

2025 Cruise

We continue our leisurely tour of Statistical Inference as Severe Testing [SIST] (Mayo 2018, CUP) with Excursion 3. This is based on my 5 seminars at the London School of Economics in 2020; I include slides and video for those who are interested. (Use the comments for questions.)

Categories: 2025 leisurely cruise, significance tests, Statistical Inference as Severe Testing

Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)

Third Stop

Readers: With this third stop we’ve covered Tour I of Excursion 1. My slides from the first LSE meeting in 2020, which dealt with elements of Excursion 1, can be found at the end of this post. There’s also a video giving an overall intro to SIST, Excursion 1. It’s worth considering just how much things seem to have changed in just the past few years. Or have they? What would the view from the hot-air balloon look like now? Share your thoughts in the comments.

ZOOM: I propose a Zoom meeting for Sunday, November 16 at 11 am or Friday, November 21 at 11 am, New York time. (An equal # prefer Fri & Sun.) The link will be available to those who register/registered with Dr. Miller*.

The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)

How can a discipline, central to science and to critical thinking, have two methodologies, two logics, two approaches that frequently give substantively different answers to the same problems? … Is complacency in the face of contradiction acceptable for a central discipline of science? (Donald Fraser 2011, p. 329)

We [statisticians] are not blameless … we have not made a concerted professional effort to provide the scientific world with a unified testing methodology. (J. Berger 2003, p. 4)

Categories: 2025 leisurely cruise, Statistical Inference as Severe Testing
