Monthly Archives: November 2024

66 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II

2024 Cruise


We’re stopping briefly to consider one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST). It is now 66 years since Cox gave his famous weighing machine example in Sir David Cox (1958)[1]. It’s still relevant. So, let’s go back to it, with an excerpt from SIST (pp. 170-173).

Exhibit (vi): Two Measuring Instruments of Different Precisions. Did you hear about the frequentist who, knowing she used a scale that’s right only half the time, claimed her method of weighing is right 75% of the time?

She says, “I flipped a coin to decide whether to use a scale that’s right 100% of the time, or one that’s right only half the time, so, overall, I’m right 75% of the time.” (She wants credit because she could have used a better scale, even knowing she used a lousy one.)

Basis for the joke: An N-P test bases error probability on all possible outcomes or measurements that could have occurred in repetitions, but did not. Continue reading
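The conditionality point behind the joke can be sketched in a short simulation. The numbers are the joke’s (a fair coin choosing between a scale right 100% of the time and one right only 50% of the time), not Cox’s original presentation:

```python
# Sketch of the two-instrument example: a fair coin picks between a
# scale that is always right and one that is right half the time.
import random

random.seed(0)

def weigh():
    """One weighing: flip a coin to pick a scale, return (scale, correct?)."""
    scale = random.choice(["precise", "lousy"])
    if scale == "precise":
        return scale, True                    # always right
    return scale, random.random() < 0.5       # right half the time

trials = [weigh() for _ in range(100_000)]

# Unconditional error rate: averages over scales that *could* have been used.
overall = sum(ok for _, ok in trials) / len(trials)

# Conditional: given the lousy scale was the one actually used.
lousy = [ok for s, ok in trials if s == "lousy"]
conditional = sum(lousy) / len(lousy)

print(f"unconditional accuracy ≈ {overall:.2f}")    # near 0.75
print(f"accuracy given lousy scale ≈ {conditional:.2f}")  # near 0.50
```

The gap between the two numbers is the point of the joke: the 75% figure averages over a measurement that was not in fact made.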

Categories: 2024 Leisurely Cruise | 2 Comments

Call for reader replacements! First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]

November Cruise

Although the numbers used in the introductory example are fine, I’m unhappy with it and seek a replacement, ideally with the same or similar numbers. It is assumed that there is a concern both with inferring larger, as well as smaller, discrepancies than warranted. Actions taken if too high a temperature is inferred would be deleterious. But, given the presentation, the more “serious” error would be failing to report an increase, calling for H0: μ ≥ 150 as the null. But the focus on one-sided positive discrepancies is used throughout the book, so I wanted to keep to that. I needed a one-sided test with a null value other than 0, and saw an example like this in a book. I think it was ecology. Of course, the example is purely for a simple numerical illustration. Fortunately, the severity analysis gives the same interpretation of the data regardless of how the test and alternative hypotheses are specified. Still, I’m calling for reader replacements, a suitable reward to be ascertained. Continue reading
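For readers pondering a replacement: a severity computation for a one-sided Normal test with a non-zero null can be sketched as below. The specific numbers (σ = 10, n = 100, observed mean 152) are illustrative stand-ins of my own, not necessarily those of the book’s example:

```python
# Hedged sketch of severity for a one-sided Normal test
# T+: H0: mu <= 150 vs. H1: mu > 150 (illustrative numbers).
from math import sqrt, erf

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma, n = 10.0, 100
se = sigma / sqrt(n)      # standard error of the sample mean = 1.0
xbar = 152.0              # observed sample mean (assumed)

def severity(mu1):
    """SEV(mu > mu1): the probability of a result less extreme than the
    one observed, were the discrepancy only mu1: P(Xbar <= xbar; mu = mu1)."""
    return Phi((xbar - mu1) / se)

for mu1 in (150, 151, 152, 153):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
```

With these numbers, the claim μ > 150 passes with high severity (about 0.977), while μ > 153 is poorly warranted (about 0.159), and this reading is unchanged however the null and alternative are labeled.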

Categories: 2024 Leisurely Cruise, severe tests, severity function, statistical tests, water plant accident | 1 Comment

Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: (3.2)

Neyman & Pearson

November Cruise: 3.2

This second of November’s stops in the leisurely cruise of SIST aligns well with my recent Neyman Seminar at Berkeley. Egon Pearson’s description of the three steps in formulating tests is too rarely recognized today. Note especially the order of the steps. Share queries and thoughts in the comments.

3.2 N-P Tests: An Episode in Anglo-Polish Collaboration*

We proceed by setting up a specific hypothesis to test, H0 in Neyman’s and my terminology, the null hypothesis in R. A. Fisher’s . . . in choosing the test, we take into account alternatives to H0 which we believe possible or at any rate consider it most important to be on the look out for . . . Three steps in constructing the test may be defined:

Step 1. We must first specify the set of results . . .

Step 2. We then divide this set by a system of ordered boundaries . . . such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined, on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts.

Step 3. We then, if possible, associate with each contour level the chance that, if H0 is true, a result will occur in random sampling lying beyond that level . . .

In our first papers [in 1928] we suggested that the likelihood ratio criterion, λ, was a very useful one . . . Thus Step 2 preceded Step 3. In later papers [1933–1938] we started with a fixed value for the chance, ε, of Step 3 . . . However, although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order. (Egon Pearson 1947, p. 173)
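Pearson’s ordering can be made concrete with a minimal sketch. The hypotheses and numbers here (H0: μ = 0 vs. μ > 0, X ~ Normal(μ, 1), n = 25, ε = 0.05) are my own illustrative assumptions, not drawn from the quoted paper:

```python
# Minimal sketch of Pearson's three steps for a one-sided Normal test,
# H0: mu = 0 vs. H1: mu > 0, with assumed n = 25, sigma = 1.
from math import sqrt, erf

def Phi(z):
    """Standard Normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, sigma = 25, 1.0
se = sigma / sqrt(n)

# Step 1: the set of possible results, summarized here by the sample mean xbar.
# Step 2: order the results; for this family the likelihood ratio orders them
#         by xbar itself, so larger xbar = more inclined to reject H0.
# Step 3: only now fix the chance eps of crossing the boundary under H0 and
#         solve P(Xbar > c; mu = 0) = eps for the cutoff c.
eps = 0.05

def cutoff(eps, lo=-10.0, hi=10.0):
    """Invert the Normal tail by bisection: find c with 1 - Phi(c/se) = eps."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - Phi(mid / se) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = cutoff(eps)
print(f"reject H0 when xbar > {c:.3f}")   # about 1.645 * se = 0.329
```

Note that Step 3 (the choice of ε and cutoff) can only be carried out once Step 2 has fixed the ordering of results, which is Pearson’s point about the numbering.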

Continue reading

Categories: 2024 Leisurely Cruise, E.S. Pearson, Neyman, statistical tests | Leave a comment

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1

November Cruise

This first excerpt for November is really just the preface to 3.1. Remember, our abbreviated cruise this fall is based on my LSE Seminars in 2020, and since there are only 5, I had to cut. So those seminars skipped 3.1 on the eclipse tests of GTR. But I want to share snippets from 3.1 with current readers, along with reflections in the comments. (I promise, I’ve even numbered them below)

Excursion 3 Statistical Tests and Scientific Inference

Tour I Ingenious and Severe Tests

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation – in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] . . . it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories. (Popper 1962, p. 36)

The 1919 eclipse experiments opened Popper’s eyes to what made Einstein’s theory so different from other revolutionary theories of the day: Einstein was prepared to subject his theory to risky tests.[1] Einstein was eager to galvanize scientists to test his theory of gravity, knowing the solar eclipse was coming up on May 29, 1919. Leading the expedition to test GTR was a perfect opportunity for Sir Arthur Eddington, a devout follower of Einstein as well as a devout Quaker and conscientious objector. Fearing “a scandal if one of its young stars went to jail as a conscientious objector,” officials at Cambridge argued that Eddington couldn’t very well be allowed to go off to war when the country needed him to prepare the journey to test Einstein’s predicted light deflection (Kaku 2005, p. 113).

The museum ramps up from Popper through a gallery on “Data Analysis in the 1919 Eclipse” (Section 3.1), which then leads to the main gallery on origins of statistical tests (Section 3.2). Here’s our Museum Guide: Continue reading

Categories: SIST, Statistical Inference as Severe Testing | 2 Comments

November: The leisurely tour of SIST continues


We continue our leisurely tour of Statistical Inference as Severe Testing [SIST] (Mayo 2018, CUP) with Excursion 3. This is based on my 5 seminars at the London School of Economics in 2020; I include slides and video for those who are interested. (use the comments for questions)

November’s Leisurely Tour: N-P and Fisherian Tests, Severe Testing 

Reading:

SIST: Excursion 3 Tour I (focus on pages up to p. 152): 3.1, 3.2, 3.3

Optional: Excursion 2 Tour II pp. 92-100 (Sections 2.4-2.7)

Quick refresher on means, variance, standard deviations, the Normal distribution, standard normal

Slides & Video Links for November (from my LSE Seminar)

Continue reading

Categories: 2024 Leisurely Cruise, significance tests, Statistical Inference as Severe Testing | Leave a comment

Has Statistics become corrupted? Philip Stark’s questions (and some questions about them) (ii)


In this post, I consider the questions posed for my (October 9) Neyman Seminar by Philip Stark, Distinguished Professor of Statistics at UC Berkeley. We didn’t directly deal with them during the panel discussion following my talk, and I find some of them a bit surprising. (Other panelists’ questions are here).

Philip Stark asks:

When and how did Statistics lose its way and become (largely) a mechanical way to bless results rather than a serious attempt to avoid fooling ourselves and others?

  1. To what extent have statisticians been complicit in the corruption of Statistics?
  2. Are there any clear turning points where things got noticeably worse?
  3. Is this a problem of statistics instruction ((a) teaching methodology rather than teaching how to answer scientific questions, (b) deemphasizing assumptions, (c) encouraging mechanical calculations and ignoring the interpretation of those calculations), (d) of disciplinary myopia (to publish in the literature of particular disciplines, you are required to use inappropriate methods), (e) of moral hazard (statisticians are often funded on scientific projects and have a strong incentive to do whatever it takes to bless “discoveries”), or something else?
  4. What can academic statisticians do to help get the train back on the tracks? Can you point to good examples?

These are important and highly provocative questions! To a large extent, Stark and other statisticians would be the ones to address them. As an outsider, and as a philosopher of science, I will merely analyze these questions, and in so doing raise some questions about them. That’s Part I of this post. In Part II, I will list some of Stark’s replies to #5 in his (2018) joint paper with Andrea Saltelli “Cargo-cult statistics and scientific crisis”. (The full paper is relevant for #1-4 as well.) Continue reading

Categories: Neyman Seminar, Stark | 16 Comments

Blog at WordPress.com.