Error Statistics Philosophy

Midnight With Birnbaum: Happy New Year 2025!

Posted on January 1, 2025 by Mayo

Remember that old Woody Allen movie, “Midnight in Paris,” where the main character (I forget who plays it, I saw it on a plane), a writer finishing a novel, steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Woolf? (It was a new movie when I began the blog in 2011.) He is wowed when his work earns their approval and he comes back each night in the same mysterious cab…Well, ever since I began this blog in 2011, I imagine being picked up in a mysterious taxi at midnight on New Year’s Eve, and lo and behold, find myself in 1960s New York City, in the company of Allan Birnbaum who is looking deeply contemplative, perhaps studying his 1962 paper…Birnbaum reveals some new and surprising twists this year! [i]

(The pic on the left is the only blurry image I have of the club I’m taken to.) It has been a decade since I published my article in Statistical Science (“On the Birnbaum Argument for the Strong Likelihood Principle”), which includes commentaries by A. P. Dawid, Michael Evans, Martin and Liu, D. A. S. Fraser, Jan Hannig, and Jan Bjornstad. David Cox, who very sadly died in January 2022, is the one who encouraged me to write and publish it. Not only does the (Strong) Likelihood Principle (LP or SLP) remain at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics and of error statistics in general, but a decade after my 2014 paper, it is more central than ever–even if it is often unrecognized.

OUR EXCHANGE: Continue reading →

Categories: Birnbaum, CHAT GPT, Likelihood Principle, Sir David Cox | 2 Comments

In case you want to binge read the (Strong) Likelihood Principle in 2025

Posted on December 31, 2024 by Mayo

I took a side trip to David Cox’s famous “weighing machine” example a month ago, an example thought to have caused “a subtle earthquake” in foundations of statistics, because I knew we’d be coming back to it at the end of December when we revisit the (strong) Likelihood Principle [SLP]. It’s been a decade since I published my Statistical Science article on this, Mayo (2014), which includes several commentators, but the issue is still mired in controversy. It’s generally dismissed as an annoying, mind-bending puzzle on which those in statistical foundations tend to hold absurdly strong opinions. Mostly it has been ignored. Yet I sense that 2025 is the year that people will return to it again, given some recent and soon to be published items. This post gives some background, and collects the essential links that you would need if you want to delve into it. Many readers know that each year I return to the issue on New Year’s Eve…. But that’s tomorrow.

By the way, this is not part of our leisurely tour of SIST. In fact, the argument is not even in SIST, although the SLP (or LP) arises a lot. But if you want to go off the beaten track with me to the SLP conundrum, here’s your opportunity. Continue reading →

Categories: 10 year memory lane, Likelihood Principle | Leave a comment

[3] December Leisurely Tour Meeting 3: SIST Excursion 3 Tour III

Posted on December 20, 2024 by Mayo

2024 Cruise

We are now at stop 3 on our December leisurely cruise through SIST: Excursion 3 Tour III. I am pasting the slides and video from this session during the LSE Research Seminars in 2020 (from which this cruise derives). (Remember it was early pandemic, and we weren’t so adept with zooming.) The Higgs discussion clarifies (and defends) a somewhat controversial interpretation of p-values. (If you’re interested in the Higgs discovery, there’s a lot more on this blog you can find with the search.) Ben Recht recently blogged that the Higgs discovery did not take place. HEP physicists roundly responded. I would omit the section on “capability and severity” were I to write a second edition, while keeping the duality of tests and CIs. Share your remarks in the comments.

Continue reading →

Categories: 2024 Leisurely Cruise, confidence intervals and tests, LSE PH 500 | Leave a comment

December leisurely cruise “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6)

Posted on December 8, 2024 by Mayo

2024 Cruise

Welcome to the December leisurely cruise:
Wherever we are sailing, assume that it’s warm. This is an overview of our first set of readings for December from my Statistical Inference as Severe Testing: How to get beyond the statistics wars (CUP 2018): [SIST]–Excursion 3 Tour II–(although I already snuck in one of the examples from 3.4, Cox’s weighing machine). This leisurely cruise is intended to take a whole month to cover one week of readings from my 2020 LSE Seminars, except for December and January which double up.

What do you think of “3.6 Hocus-Pocus: P-values Are Not Error probabilities, Are Not Even Frequentist”? This section refers to Jim Berger’s attempted unification of Jeffreys, Neyman and Fisher in 2003. The unification considers testing 2 simple hypotheses using a random sample from a Normal distribution, computing their two P-values, rejecting whichever gets a smaller P-value, and then computing its posterior probability, assuming each gets a prior of .5. This he calls the “Bayesian error probability”. The result violates what he calls the “frequentist principle”. According to Berger, Neyman criticized p-values for violating the frequentist principle (SIST p. 186).
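For readers who want to see the steps in that summary as arithmetic, here is a minimal sketch. It follows only the description above (two simple hypotheses, two P-values, reject the one with the smaller P-value, posterior with priors of .5); it is not Berger's full conditional procedure. The function names, the assumption of a known σ, and the numbers in the example (μ₀ = 0, μ₁ = 1, σ = 1, n = 4, observed mean 0.8) are all hypothetical choices made for illustration.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z >= z) for a standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_p_values_and_posterior(xbar, n, mu0, mu1, sigma):
    """Sketch of the steps described in the text, assuming mu1 > mu0
    and a known sigma (both are assumptions of this illustration)."""
    se = sigma / math.sqrt(n)

    # P-value against H0: how improbable is a mean this large under mu0?
    p0 = normal_sf((xbar - mu0) / se)
    # P-value against H1: how improbable is a mean this small under mu1?
    p1 = normal_sf((mu1 - xbar) / se)

    # Reject whichever hypothesis gets the smaller P-value.
    rejected = "H0" if p0 < p1 else "H1"

    # Posterior probability of H0 with prior .5 on each hypothesis,
    # computed from the Normal log-likelihoods of the sample mean.
    log_l0 = -0.5 * ((xbar - mu0) / se) ** 2
    log_l1 = -0.5 * ((xbar - mu1) / se) ** 2
    post_h0 = 1.0 / (1.0 + math.exp(log_l1 - log_l0))

    # Posterior probability of the hypothesis that was rejected.
    post_rejected = post_h0 if rejected == "H0" else 1.0 - post_h0
    return p0, p1, rejected, post_rejected


# Hypothetical numbers chosen only for illustration.
p0, p1, rejected, post_rejected = two_p_values_and_posterior(
    xbar=0.8, n=4, mu0=0.0, mu1=1.0, sigma=1.0
)
print(rejected, round(p0, 3), round(post_rejected, 3))
```

With these made-up numbers the P-value against the rejected hypothesis is much smaller than its posterior probability, which is the kind of gap between the two quantities that the section discusses.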

Some snapshots from Excursion 3 tour II.

Continue reading →

Categories: 2024 Leisurely Cruise | Leave a comment

66 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II

Posted on November 30, 2024 by Mayo

2024 Cruise

We’re stopping briefly to consider one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST). It is now 66 years since Cox gave his famous weighing machine example in Sir David Cox (1958)[1]. It’s still relevant. So, let’s go back to it, with an excerpt from SIST (pp. 170-173).

Exhibit (vi): Two Measuring Instruments of Different Precisions. Did you hear about the frequentist who, knowing she used a scale that’s right only half the time, claimed her method of weighing is right 75% of the time?

She says, “I flipped a coin to decide whether to use a scale that’s right 100% of the time, or one that’s right only half the time, so, overall, I’m right 75% of the time.” (She wants credit because she could have used a better scale, even knowing she used a lousy one.)

Basis for the joke: An N-P test bases error probability on all possible outcomes or measurements that could have occurred in repetitions, but did not. Continue reading →
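The arithmetic behind the joke can be written out in a few lines. This is only a sketch of the two numbers in play, using the reliabilities stated in the joke and a fair coin; the variable names are made up for the illustration.

```python
from fractions import Fraction

# Reliabilities as stated in the joke: one scale is right 100% of the
# time, the other only half the time. A fair coin picks the scale.
p_pick = Fraction(1, 2)
reliability = {"good": Fraction(1), "lousy": Fraction(1, 2)}

# Unconditional probability of being right, averaged over the coin flip
# (this is the figure the frequentist in the joke reports).
unconditional = sum(p_pick * r for r in reliability.values())

# Probability of being right given the scale actually used (the lousy one).
conditional = reliability["lousy"]

print(unconditional)  # 3/4
print(conditional)    # 1/2
```

The 75% figure averages over a scale that was never used; once it is known which scale was used, the relevant figure is the conditional one.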

Categories: 2024 Leisurely Cruise | 2 Comments

Call for reader replacements! First Look at N-P Methods as Severe Tests: Water plant accident [Exhibit (i) from Excursion 3]

Posted on November 24, 2024 by Mayo

November Cruise

Although the numbers used in the introductory example are fine, I’m unhappy with it and seek a replacement–ideally with the same or similar numbers. It is assumed that there is a concern both with inferring larger, as well as smaller, discrepancies than warranted. Actions taken if too high a temperature is inferred would be deleterious. But, given the presentation, the more “serious” error would be failing to report an increase, calling for H₀: μ ≥ 150 as the null. But the focus on one-sided positive discrepancies is used throughout the book, so I wanted to keep to that. I needed a one-sided test with a null value other than 0, and saw an example like this in a book. I think it was ecology. Of course, the example is purely for a simple numerical illustration. Fortunately, the severity analysis gives the same interpretation of the data regardless of how the test and alternative hypotheses are specified. Still, I’m calling for reader replacements, a suitable reward to be ascertained. Continue reading →
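For anyone drafting a replacement example, a small sketch of the kind of severity calculation involved may help. It assumes a one-sided Normal test of μ ≤ μ₀ against μ > μ₀ with known σ, and takes the severity for the claim μ > μ₁ to be the probability of a sample mean no larger than the one observed, computed under μ = μ₁. The function names and the numbers (σ = 10, n = 100, observed mean 152) are illustrative assumptions, not figures quoted from this post.

```python
import math


def normal_cdf(z):
    """P(Z <= z) for a standard Normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def severity_greater(xbar_obs, mu1, sigma, n):
    """Severity for the claim mu > mu1, taken here as
    P(sample mean <= xbar_obs; mu = mu1), with known sigma (assumed)."""
    se = sigma / math.sqrt(n)
    return normal_cdf((xbar_obs - mu1) / se)


# Illustrative numbers only: sigma = 10, n = 100, observed mean 152.
sev_150 = severity_greater(152, 150, sigma=10, n=100)
sev_151 = severity_greater(152, 151, sigma=10, n=100)
sev_153 = severity_greater(152, 153, sigma=10, n=100)
print(round(sev_150, 3), round(sev_151, 3), round(sev_153, 3))
```

Under these assumptions the claim μ > 150 passes with high severity while μ > 153 does not, and the assessment depends on the observed mean and the discrepancy of interest rather than on which hypothesis was labeled the null.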

Categories: 2024 Leisurely Cruise, severe tests, severity function, statistical tests, water plant accident | 1 Comment

Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: (3.2)

Posted on November 18, 2024 by Mayo

Neyman & Pearson

November Cruise: 3.2

This second of November’s stops in the leisurely cruise of SIST aligns well with my recent Neyman Seminar at Berkeley. Egon Pearson’s description of the three steps in formulating tests is too rarely recognized today. Note especially the order of the steps. Share queries and thoughts in the comments.

3.2 N-P Tests: An Episode in Anglo-Polish Collaboration*

We proceed by setting up a specific hypothesis to test, H₀ in Neyman’s and my terminology, the null hypothesis in R. A. Fisher’s . . . in choosing the test, we take into account alternatives to H₀ which we believe possible or at any rate consider it most important to be on the look out for . . . Three steps in constructing the test may be defined:

Step 1. We must first specify the set of results . . .

Step 2. We then divide this set by a system of ordered boundaries . . . such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined, on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts.

Step 3. We then, if possible, associate with each contour level the chance that, if H₀ is true, a result will occur in random sampling lying beyond that level . . .

In our first papers [in 1928] we suggested that the likelihood ratio criterion, λ, was a very useful one . . . Thus Step 2 preceded Step 3. In later papers [1933–1938] we started with a fixed value for the chance, ε, of Step 3 . . . However, although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order. (Egon Pearson 1947, p. 173)

Continue reading →

Categories: 2024 Leisurely Cruise, E.S. Pearson, Neyman, statistical tests | Leave a comment

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1

Posted on November 14, 2024 by Mayo

November Cruise

This first excerpt for November is really just the preface to 3.1. Remember, our abbreviated cruise this fall is based on my LSE Seminars in 2020, and since there are only 5, I had to cut. So those seminars skipped 3.1 on the eclipse tests of GTR. But I want to share snippets from 3.1 with current readers, along with reflections in the comments. (I promise, I’ve even numbered them below)

Excursion 3 Statistical Tests and Scientific Inference

Tour I Ingenious and Severe Tests

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation – in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] . . . it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories. (Popper 1962, p. 36)

The 1919 eclipse experiments opened Popper’s eyes to what made Einstein’s theory so different from other revolutionary theories of the day: Einstein was prepared to subject his theory to risky tests.[1] Einstein was eager to galvanize scientists to test his theory of gravity, knowing the solar eclipse was coming up on May 29, 1919. Leading the expedition to test GTR was a perfect opportunity for Sir Arthur Eddington, a devout follower of Einstein as well as a devout Quaker and conscientious objector. Fearing “a scandal if one of its young stars went to jail as a conscientious objector,” officials at Cambridge argued that Eddington couldn’t very well be allowed to go off to war when the country needed him to prepare the journey to test Einstein’s predicted light deflection (Kaku 2005, p. 113).

The museum ramps up from Popper through a gallery on “Data Analysis in the 1919 Eclipse” (Section 3.1) which then leads to the main gallery on origins of statistical tests (Section 3.2). Here’s our Museum Guide: Continue reading →

Categories: SIST, Statistical Inference as Severe Testing | 2 Comments

November: The leisurely tour of SIST continues

Posted on November 9, 2024 by Mayo

We continue our leisurely tour of Statistical Inference as Severe Testing [SIST] (Mayo 2018, CUP) with Excursion 3. This is based on my 5 seminars at the London School of Economics in 2020; I include slides and video for those who are interested. (use the comments for questions)

November’s Leisurely Tour: N-P and Fisherian Tests, Severe Testing

Reading:

SIST: Excursion 3 Tour I (focus on pages up to p. 152): 3.1, 3.2, 3.3

Optional: Excursion 2 Tour II pp. 92-100 (Sections 2.4-2.7)

Quick refresher on means, variance, standard deviations, the Normal distribution, standard normal

Slides & Video Links for November (from my LSE Seminar)

Continue reading →

Categories: 2024 Leisurely Cruise, significance tests, Statistical Inference as Severe Testing | Leave a comment

Has Statistics become corrupted? Philip Stark’s questions (and some questions about them) (ii)

Posted on November 6, 2024 by Mayo

In this post, I consider the questions posed for my (October 9) Neyman Seminar by Philip Stark, Distinguished Professor of Statistics at UC Berkeley. We didn’t directly deal with them during the panel discussion following my talk, and I find some of them a bit surprising. (Other panelists’ questions are here).

Philip Stark asks:

When and how did Statistics lose its way and become (largely) a mechanical way to bless results rather than a serious attempt to avoid fooling ourselves and others?

To what extent have statisticians been complicit in the corruption of Statistics?
Are there any clear turning points where things got noticeably worse?
Is this a problem of statistics instruction ((a) teaching methodology rather than teaching how to answer scientific questions, (b) deemphasizing assumptions, (c) encouraging mechanical calculations and ignoring the interpretation of those calculations), (d) of disciplinary myopia (to publish in the literature of particular disciplines, you are required to use inappropriate methods), (e) of moral hazard (statisticians are often funded on scientific projects and have a strong incentive to do whatever it takes to bless “discoveries”), or something else?
What can academic statisticians do to help get the train back on the tracks? Can you point to good examples?

These are important and highly provocative questions! To a large extent, Stark and other statisticians would be the ones to address them. As an outsider, and as a philosopher of science, I will merely analyze these questions, and in so doing raise some questions about them. That’s Part I of this post. In Part II, I will list some of Stark’s replies to #5 in his (2018) joint paper with Andrea Saltelli “Cargo-cult statistics and scientific crisis”. (The full paper is relevant for #1-4 as well.) Continue reading →

Categories: Neyman Seminar, Stark | 16 Comments

Excursion 1 Tour II (4th stop): The Law of Likelihood and Error Statistics (1.4)

Posted on October 28, 2024 by Mayo

Ship Statinfasst

We are starting on Tour II of Excursion 1 (4th stop). The 3rd stop is in an earlier blog post. As I promised, this cruise of SIST is leisurely. I have not yet shared new reflections in the comments–but I will!

Where YOU are in the journey: Continue reading →

Categories: Bayesian/frequentist, Likelihood Principle, LSE PH 500 | Leave a comment

Panel Discussion Questions from my Neyman Lecture: “Severity as a basic concept in philosophy of statistics”

Posted on October 27, 2024 by Mayo

Giordano, Snow, Yu, Stark, Recht

My Neyman Seminar in the Statistics Department at Berkeley was followed by a lively panel discussion including 4 Berkeley faculty, orchestrated by Ryan Giordano (Dept of Statistics):

Xueyin Snow Zhang (Dept. of Philosophy)
Bin Yu (Depts. of Statistics, Electrical Engineering and Computer Sciences)
Philip Stark (Dept. of Statistics)
Ben Recht (Dept. of Electrical Engineering and Computer Sciences)

Continue reading →

Categories: Berkeley Neyman Seminar | 4 Comments

Response to Ben Recht’s post (“What is Statistics’ Purpose?”) on my Neyman seminar (ii)

Posted on October 22, 2024 by Mayo

There was a very valuable panel discussion after my October 9 Neyman Seminar in the Statistics Department at UC Berkeley. I want to respond to many of the questions put forward by the participants (Ben Recht, Philip Stark, Bin Yu, Snow Zhang) that we did not address during that panel. Slides from my presentation, “Severity as a basic concept of philosophy of statistics” are at the end of this post (but with none of the animations). I begin in this post by responding to Ben Recht, a professor of Artificial Intelligence and Computer Science at Berkeley, and his recent blogpost, What is Statistics’ Purpose? On severe testing, regulation, and butter passing, on my talk. I will consider: (1) A complex or leading question; (2) Why I chose to focus on Neyman’s philosophy of statistics and (3) What the “100 years of fighting and browbeating” were/are all about. Continue reading →

Categories: affirming the consequent, Ben Recht, Neyman, P-values, Severity, statistical significance tests, statistics wars | 10 Comments

Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)

Posted on October 16, 2024 by Mayo

Third Stop

Readers: With this third stop we’ve covered Tour 1 of Excursion 1. My slides from the first LSE meeting in 2020 which dealt with elements of Excursion 1 can be found at the end of this post. There’s also a video giving an overall intro to SIST, Excursion 1. It’s noteworthy to consider just how much things seem to have changed in just the past few years. Or have they? What would the view from the hot-air balloon look like now? I will try to address this in the comments.

Continue reading →

Categories: 2024 Leisurely Cruise, Statistical Inference as Severe Testing | Leave a comment

Excursion 1 Tour I (2nd Stop): Probabilism, Performance, and Probativeness (1.2)

Posted on October 12, 2024 by Mayo

Readers: I gave the Neyman Seminar at Berkeley last Wednesday, October 9, and had been so busy preparing it that I did not update my leisurely cruise for October. This is the second stop. I will shortly post remarks on the panel discussion that followed my Neyman talk (with panelists, Ben Recht, Philip Stark, Bin Yu, and Snow Zhang), which was quite illuminating.

“I shall be concerned with the foundations of the subject. But in case it should be thought that this means I am not here strongly concerned with practical applications, let me say right away that confusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth”. (George Barnard 1985, p. 2)

Continue reading →

Categories: Error Statistics | Leave a comment

The leisurely cruise begins: Excerpt from Excursion 1 Tour 1 of Statistical Inference as Severe Testing (SIST)

Posted on September 30, 2024 by Mayo

Ship Statinfasst

Excerpt from excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1)

NOTE: The following is an excerpt from my existing book: Statistical Inference as Severe Testing: How to get beyond the statistics wars (CUP, 2018). For any new reflections or corrections, I will use the comments. The initial announcement is here.

I’m talking about a specific, extra type of integrity that is [beyond] not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. (Feynman 1974/1985, p. 387)

Continue reading →

Categories: Error Statistics | Leave a comment

Leisurely cruise through Statistical Inference as Severe Testing: First Announcement

Posted on September 20, 2024 by Mayo

Ship Statinfasst

We’re embarking on a leisurely cruise through the highlights of Statistical Inference as Severe Testing [SIST]: How to Get Beyond the Statistics Wars (CUP 2018) this fall (Oct-Jan), following the 5 seminars I led for a 2020 London School of Economics (LSE) Graduate Research Seminar. It was run entirely online due to Covid (as were the workshops that followed). In this new, relaxed (self-paced) journey, excursions that had been covered in a week will be spread out over a month [i] and I’ll be posting abbreviated excerpts on this blog a few times a month. Look for the posts marked with the picture of ship StatInfAsSt. [ii] Continue reading →

Categories: 2024 Leisurely Cruise, Announcement | Leave a comment

An exchange between A. Gelman and D. Mayo on abandoning statistical significance: 5 years ago

Posted on September 11, 2024 by Mayo

Below is an email exchange that Andrew Gelman posted on this day 5 years ago on his blog, Statistical Modeling, Causal Inference, and Social Science. (You can find the original exchange, with its 130 comments, here.) Note: “Me” refers to Gelman. I will share my current reflections in the comments.

Exchange with Deborah Mayo on abandoning statistical significance

Posted on September 11, 2019 9:52 AM by Andrew

Continue reading →

Categories: 5-year memory lane, abandon statistical significance, Gelman blogs an exchange with Mayo | 4 Comments

Georgi Georgiev (Guest Post): “The frequentist vs Bayesian split in online experimentation before and after the ‘abandon statistical significance’ call”

Posted on August 31, 2024 by Mayo

Georgi Georgiev

Author of Statistical methods in online A/B testing
Founder of Analytics-Toolkit.com
Statistics instructor at CXL Institute

In online experimentation, a.k.a. online A/B testing, one is primarily interested in estimating if and how different user experiences affect key business metrics such as average revenue per user. A trivial example would be to determine if a given change to the purchase flow of an e-commerce website is positive or negative as measured by average revenue per user, and by how much. An online controlled experiment would be conducted with actual users assigned randomly to either the currently implemented experience or the changed one. Continue reading →
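As a rough illustration of the setup described in this excerpt, the sketch below randomly assigns users to the current or the changed experience and then estimates the difference in average revenue per user with a standard error. It is a minimal sketch, not the guest author's method: the function names, the use of an unpooled (Welch-style) standard error, and the tiny revenue lists are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def assign(user_ids, seed=0):
    """Randomly assign each user to the current ("control") or the
    changed ("variant") experience. Names are hypothetical."""
    rng = random.Random(seed)
    groups = {"control": [], "variant": []}
    for uid in user_ids:
        groups[rng.choice(("control", "variant"))].append(uid)
    return groups


def arpu_difference(control_rev, variant_rev):
    """Estimated difference in average revenue per user (variant minus
    control), its unpooled standard error, and the ratio of the two."""
    diff = mean(variant_rev) - mean(control_rev)
    se = math.sqrt(
        variance(control_rev) / len(control_rev)
        + variance(variant_rev) / len(variant_rev)
    )
    return diff, se, diff / se


# Made-up per-user revenue figures, for illustration only.
groups = assign(range(12))
control_rev = [0, 0, 10, 0, 20, 0]
variant_rev = [0, 15, 10, 0, 25, 10]
diff, se, z = arpu_difference(control_rev, variant_rev)
print(diff, se, z)
```

A real online experiment would involve far larger samples and, as the post's categories suggest, would need to account for issues such as optional stopping, which this sketch ignores.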

Categories: A/B testing, abandon statistical significance, optional stopping | Tags: 993300 | 25 Comments

Don’t divorce statistical inference from “statistical thinking”: some exchanges

Posted on August 26, 2024 by Mayo

A topic that came up in some comments recently reflects a recent tendency to divorce statistical inference (bad) from statistical thinking (good), and it deserves the spotlight of a post. I always alert authors of papers that come up on this blog, inviting them to comment, and one from Christopher Tong (reacting to a comment on Ron Kenett) concerns this dichotomy.

Response by Christopher Tong to D. Mayo’s July 14 comment

TONG: In responding to Prof. Kenett, Prof. Mayo states: “we should reject the supposed dichotomy between ‘statistical method and statistical thinking’ which unfortunately gives rise to such titles as ‘Statistical inference enables bad science, statistical thinking enables good science,’ in the special TAS 2019 issue. This is nonsense.” [Mayo July 14 comment here.] Continue reading →

Categories: statistical inference vs statistical thinking, statistical significance tests, Wasserstein et al 2019 | 11 Comments


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.