Monthly Archives: December 2025

Midnight With Birnbaum: Happy New Year 2026!

Anyone here remember that old Woody Allen movie, “Midnight in Paris,” where the main character (I forget who plays it, I saw it on a plane), a writer finishing a novel, steps into a cab that mysteriously picks him up at midnight and transports him back in time, where he gets to run his work by such famous authors as Hemingway and Virginia Woolf? (It was a new movie when I began the blog in 2011.) He is wowed when his work earns their approval, and he comes back each night in the same mysterious cab… Well, ever since I began this blog in 2011, I imagine being picked up in a mysterious taxi at midnight on New Year’s Eve, and lo and behold, find myself in 1960s New York City, in the company of Allan Birnbaum, who is looking deeply contemplative, perhaps studying his 1962 paper… Birnbaum reveals some new and surprising twists this year! [i]

(The pic on the left is the only blurry image I have of the club I’m taken to.) It has been over a decade since I published my article in Statistical Science (“On the Birnbaum Argument for the Strong Likelihood Principle”), which includes commentaries by A. P. Dawid, Michael Evans, Martin and Liu, D. A. S. Fraser, Jan Hannig, and Jan Bjornstad. David Cox, who very sadly died in January 2022, is the one who encouraged me to write and publish it. Not only does the (Strong) Likelihood Principle (LP or SLP) remain at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics and of error statistics in general, but more than a decade after my 2014 paper, it is more central than ever, even if it is often unrecognized.

OUR EXCHANGE:

ERROR STATISTICIAN: It’s wonderful to meet you, Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to have published on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!) Continue reading

Categories: Birnbaum, CHAT GPT, Likelihood Principle, Sir David Cox | Leave a comment

For those who want to binge read the (Strong) Likelihood Principle in 2025

David Cox’s famous “weighing machine” example from my last post is thought to have caused “a subtle earthquake” in foundations of statistics. It’s been 11 years since I published my Statistical Science article on this, Mayo (2014), which includes several commentators, but the issue is still mired in controversy. It’s generally dismissed as an annoying, mind-bending puzzle on which those in statistical foundations tend to hold absurdly strong opinions. Mostly it has been ignored. Yet I sense that 2026 is the year that people will return to it. It’s at least touched upon in Roderick Little’s new book (pic below). This post gives some background, and collects the essential links that you would need if you want to delve into it. Many readers know that each year I return to the issue on New Year’s Eve…. But that’s tomorrow.

By the way, this is not part of our leisurely tour of SIST. In fact, the argument is not even in SIST, although the SLP (or LP) arises a lot. But if you want to go off the beaten track with me to the SLP conundrum, here’s your opportunity. Continue reading

Categories: 11 years ago, Likelihood Principle | Leave a comment

67 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II

2025-26 Cruise

We’re stopping to consider one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST 2018). It is now 67 years since Cox gave his famous weighing machine example in Sir David Cox (1958)[1]. It will play a vital role in our discussion of the (strong) Likelihood Principle later this week. The excerpt is from SIST (pp. 170-173).

Exhibit (vi): Two Measuring Instruments of Different Precisions. Did you hear about the frequentist who, knowing she used a scale that’s right only half the time, claimed her method of weighing is right 75% of the time? 

She says, “I flipped a coin to decide whether to use a scale that’s right 100% of the time, or one that’s right only half the time, so, overall, I’m right 75% of the time.” (She wants credit because she could have used a better scale, even knowing she used a lousy one.)
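The arithmetic behind her claim can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the numbers (a fair coin, a scale that is always right, a scale that is right half the time) are taken from the joke itself. It contrasts the accuracy averaged over the coin flip with the accuracy of the scale she actually used.

```python
from fractions import Fraction

def overall_accuracy(p_good, acc_good, acc_bad):
    """Accuracy averaged over the coin flip that picks the scale."""
    return p_good * acc_good + (1 - p_good) * acc_bad

def accuracy_given_scale(used_good, acc_good, acc_bad):
    """Accuracy of the scale that was actually used."""
    return acc_good if used_good else acc_bad

half = Fraction(1, 2)

# Averaging over both scales, including the one she did not use.
averaged = overall_accuracy(half, Fraction(1), half)

# Conditioning on what she knows: the lousy scale was used.
conditional = accuracy_given_scale(False, Fraction(1), half)

print(averaged)     # 3/4
print(conditional)  # 1/2
```

The 75% figure is a correct average over the two scales; the joke turns on reporting it after she already knows which scale produced the measurement.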

Basis for the joke: An N-P test bases error probability on all possible outcomes or measurements that could have occurred in repetitions, but did not. Continue reading

Categories: 2025 leisurely cruise, Birnbaum, Likelihood Principle | Leave a comment

(DEC #2) December Leisurely Tour Meeting 3: SIST Excursion 3 Tour III

2025-26 Cruise

We are now at the second stop on our December leisurely cruise through SIST: Excursion 3 Tour III. I am pasting the slides and video from this session during the LSE Research Seminars in 2020 (from which this cruise derives). (Remember it was early pandemic, and we weren’t so adept with zooming.) The Higgs discussion clarifies (and defends) a somewhat controversial interpretation of p-values. (If you’re interested in the Higgs discovery, there’s a lot more on this blog you can find with the search.) I am not sure if I would include the section on “capability and severity” were I to write a second edition, though I would keep the duality of tests and CIs. My goal was to expose a fallacy that is even more common nowadays, but I would have placed a revised version later in the book. Share your remarks in the comments.

III. Deeper Concepts: Confidence Intervals and Tests: Higgs’ Discovery: Continue reading

Categories: 2025 leisurely cruise, confidence intervals and tests | Leave a comment

December leisurely cruise “It’s the Methods, Stupid!” Excursion 3 Tour II (3.4-3.6)

2025-26 Cruise

Welcome to the December leisurely cruise:
Wherever we are sailing, assume that it’s warm, warm, warm (not like today in NYC). This is an overview of our first set of readings for December from my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018): [SIST]–Excursion 3 Tour II. This leisurely cruise, participants know, is intended to take a whole month to cover one week of readings from my 2020 LSE Seminars, except for December and January which double up.

What do you think of “3.6 Hocus-Pocus: P-values Are Not Error Probabilities, Are Not Even Frequentist”? This section refers to Jim Berger’s famous attempted unification of Jeffreys, Neyman and Fisher in 2003. The unification considers testing two simple hypotheses using a random sample from a Normal distribution, computing their two P-values, rejecting whichever gets a smaller P-value, and then computing its posterior probability, assuming each gets a prior of .5. This becomes what he calls the “Bayesian error probability” upon which he defines “the frequentist principle”. On Berger’s reading of an important paper* by Neyman (1977), Neyman criticized p-values for violating the frequentist principle (SIST p. 186). *The paper is “Frequentist Probability and Frequentist Statistics”. Remember that links to readings outside SIST are at the Captain’s biblio on the top left of the blog. Share your thoughts in the comments.
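For readers who like to see the steps, here is a rough sketch of the recipe as described above, not Berger’s own presentation. Several things are assumptions made only for illustration: the standard deviation is treated as known, the alternative mean is taken to be larger than the null mean, the P-values are one-sided in the direction of the rival hypothesis, and the function names and numbers are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_pvalues_and_posterior(xbar, n, sigma, theta0, theta1):
    """Assumes theta1 > theta0 and a known sigma (illustrative only)."""
    se = sigma / math.sqrt(n)
    # P-value for H0: chance of a sample mean this large or larger under theta0.
    p0 = 1.0 - normal_cdf((xbar - theta0) / se)
    # P-value for H1: chance of a sample mean this small or smaller under theta1.
    p1 = normal_cdf((xbar - theta1) / se)
    # Likelihood ratio of theta0 to theta1 based on the sample mean.
    log_b = ((xbar - theta1) ** 2 - (xbar - theta0) ** 2) / (2.0 * se ** 2)
    b = math.exp(log_b)
    # Posterior probability of H0 when each hypothesis has prior 0.5.
    post_h0 = b / (1.0 + b)
    # Reject whichever hypothesis has the smaller P-value.
    rejected = "H0" if p0 <= p1 else "H1"
    post_rejected = post_h0 if rejected == "H0" else 1.0 - post_h0
    return p0, p1, rejected, post_rejected

# Hypothetical numbers: n = 4, sigma = 1, H0: theta = 0, H1: theta = 1.
p0, p1, rejected, post = two_pvalues_and_posterior(0.8, 4, 1.0, 0.0, 1.0)
print(rejected, round(p0, 3), round(p1, 3), round(post, 3))
```

With these made-up numbers H0 is rejected with a P-value of about .05, while the posterior probability of H0 is about .23. The two numbers answer different questions, which is the contrast the section asks you to think about.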

Some snapshots from Excursion 3 Tour II.

Continue reading

Categories: 2025 leisurely cruise | Leave a comment

Modest replication probabilities of p-values–desirable, not regrettable: a note from Stephen Senn

You will often hear—especially in discussions about the “replication crisis”—that statistical significance tests exaggerate evidence. Significance testing, we hear, inflates effect sizes, inflates power, inflates the probability of a real effect, or inflates the probability of replication, and thereby misleads scientists.

If you look closely, you’ll find the charges are based on concepts and philosophical frameworks foreign to both Fisherian and Neyman–Pearson hypothesis testing. Nearly all have been discussed on this blog or in SIST (Mayo 2018), but new variations have cropped up. The emphasis that some are now placing on how biased selection effects invalidate error probabilities is welcome, but I say that the recommendations for reinterpreting quantities such as p-values and power introduce radical distortions of error statistical inferences. Before diving into the modern incarnations of the charges, it’s worth recalling Stephen Senn’s response to Steven Goodman’s attempt to convert p-values into replication probabilities nearly 20 years ago (“A Comment on Replication, P-values and Evidence,” Statistics in Medicine). I first blogged it in 2012, here. Below I am pasting some excerpts from Senn’s letter (but readers interested in the topic should look at all of it), because Senn’s clarity cuts straight through many of today’s misunderstandings.

Continue reading

Categories: 13 years ago, p-values exaggerate, replication research, S. Senn | 8 Comments
