**Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). **I’m reposting a link to a quirky, but fascinating, paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper, fro 60 years ago,Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts,” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance as is typically thought. It’s amazing how an idiosyncratic use of a word 60 years ago can cause major rumblings decades later. Neyman always distinguished his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum. Continue reading

# Bayesian/frequentist

## Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists? Your weekend Phil Stat reading

## Bickel’s defense of significance testing on the basis of Bayesian model checking

In my last post, I said I’d come back to a (2021) article by David Bickel, “Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking” in *The American Statistician.* His abstract begins as follows:

Significance testing is often criticized because

p-values can be low even though posterior probabilities of the null hypothesis are not low according to some Bayesian models. Those models, however, would assign low prior probabilities to the observation that thep-value is sufficiently low. That conflict between the models and the data may indicate that the models needs revision. Indeed, if thep-value is sufficiently small while the posterior probability according to a model is insufficiently small, then the model will fail a model check….(fromBickel2021)

## P-values disagree with posteriors? Problem is your priors, says R.A. Fisher

How often do you hear P-values criticized for “exaggerating” the evidence against a null hypothesis? If your experience is like mine, the answer is ‘all the time’, and in fact, the charge is often taken as one of the strongest cards in the anti-statistical significance playbook. The argument boils down to the fact that the P-value accorded to a point null *H*_{0 }can be small while its Bayesian posterior probability high–provided a high enough prior is accorded to *H*_{0}. But why suppose P-values should match Bayesian posteriors? And what justifies the high (or “spike”) prior to a point null? While I discuss this criticism at considerable length in *Statistical Inference as Severe Testing: How to get beyond the statistics wars* (CUP, 2018), I did not quote an intriguing response by R.A. Fisher to disagreements between P-values and posteriors’s (in *Statistical Methods and Scientific Inference, Fisher 1956);* namely, that such a prior probability assignment would itself be rejected by the observed small P-value–if the prior were itself regarded as a hypothesis to test. Or so he says. I did mention this response by Fisher in an encyclopedia article from way back in 2006 on “philosophy of statistics”: Continue reading

## Memory Lane (4 years ago): Why significance testers should reject the argument to “redefine statistical significance”, even if they want to lower the p-value*

An argument that assumes the very thing that was to have been argued for is guilty of *begging the question*; signing on to an argument whose conclusion you favor even though you cannot defend its premises is to argue *unsoundly*, and in bad faith. When a whirlpool of “reforms” subliminally alter the nature and goals of a method, falling into these sins can be quite inadvertent. Start with a simple point on defining the power of a statistical test. Continue reading

## Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists?

**Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). **I’m posting a link to a quirky paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance as typically thought. Neyman always distinguished his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum. Continue reading

## My Responses (at the P-value debate)

How did I respond to those 7 burning questions at last week’s (“P-Value”) Statistics Debate? Here’s a fairly close transcript of my (a) general answer, and (b) final remark, for each question–without the in-between responses to Jim and David. The exception is question 5 on Bayes factors, which naturally included Jim in my general answer.

The questions with the most important consequences, I think, are questions 3 and 5. I’ll explain why I say this in the comments. Please share your thoughts. Continue reading

## P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)

I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner: “Statistics and Unintended Consequences“:

- “I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”

I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day). Continue reading

## Neyman vs the ‘Inferential’ Probabilists

**We celebrated Jerzy Neyman’s Birthday (April 16, 1894) **last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

**HAPPY BIRTHDAY WEEK FOR NEYMAN!** Continue reading

## there’s a man at the wheel in your brain & he’s telling you what you’re allowed to say (not probability, not likelihood)

It seems like every week something of excitement in statistics comes down the pike. Last week I was contacted by Richard Harris (and 2 others) about the recommendation to stop saying the data reach “significance level p” but rather simply say

“the p-value is p”.

(For links, see my previous post.) Friday, he wrote to ask if I would comment on a proposed restriction (?) on saying a test had high power! I agreed that we shouldn’t say a test has high power, but only that it has a high power to detect a specific alternative, but I wasn’t aware of any rulings from those in power on power. He explained it was an upshot of a reexamination by a joint group of the boards of statistical associations in the U.S. and UK. of the full panoply of statistical terms. Something like that. I agreed to speak with him yesterday. He emailed me the proposed ruling on power: Continue reading

## Neyman vs the ‘Inferential’ Probabilists continued (a)

**Today is Jerzy Neyman’s Birthday (April 16, 1894 – August 5, 1981). ** I am posting a brief excerpt and a link to a paper of his that I hadn’t posted before: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). [iii] I’ll explain in later stages of this post & in comments…(so please check back); I don’t want to miss the start of the birthday party in honor of Neyman, and it’s already 8:30 p.m in Berkeley!

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program. **HAPPY BIRTHDAY NEYMAN!** Continue reading

## Why significance testers should reject the argument to “redefine statistical significance”, even if they want to lower the p-value*

An argument that assumes the very thing that was to have been argued for is guilty of *begging the question*; signing on to an argument whose conclusion you favor even though you cannot defend its premises is to argue *unsoundly*, and in bad faith. When a whirlpool of “reforms” subliminally alter the nature and goals of a method, falling into these sins can be quite inadvertent. Start with a simple point on defining the power of a statistical test.

**I. Redefine Power?**

Given that power is one of the most confused concepts from Neyman-Pearson (N-P) frequentist testing, it’s troubling that in “Redefine Statistical Significance”, power gets redefined too. “Power,” we’re told, is a Bayes Factor BF “obtained by defining *H*_{1} as putting ½ probability on μ = ± m for the value of m that gives 75% power for the test of size α = 0.05. This *H*_{1} represents an effect size typical of that which is implicitly assumed by researchers during experimental design.” (material under Figure 1). Continue reading

## New venues for the statistics wars

I was part of something called “a brains blog roundtable” on the business of p-values earlier this week–I’m glad to see philosophers getting involved.

Next week I’ll be in a session that I think is intended to explain what’s right about P-values at an ASA Symposium on Statistical Inference : “A World Beyond p < .05”. Continue reading

## Peircean Induction and the Error-Correcting Thesis

Sunday, September 10, was C.S. Peirce’s birthday. He’s one of my heroes. He’s a treasure chest on essentially any topic, and anticipated quite a lot in statistics and logic. (As Stephen Stigler (2016) notes, he’s to be credited with articulating and appling randomization [1].) I always find something that feels astoundingly new, even rereading him. He’s been a great resource as I complete my book, *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (CUP, 2018) [2]. I’m reblogging the main sections of a (2005) paper of mine. It’s written for a very general philosophical audience; the statistical parts are very informal. I first posted it in 2013. *Happy **(belated)** Birthday Peirce*.

**Peircean Induction and the Error-Correcting Thesis**

Deborah G. Mayo

*Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy*, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Inductive methods—understood as methods of experimental testing—are justified to the extent that they are error-correcting methods. We may call this Peirce’s error-correcting or self-correcting thesis (SCT): Continue reading

## Can You Change Your Bayesian Prior? The one post whose comments (some of them) will appear in my new book

**I blogged this exactly 2 years ago here, seeking insight for my new book (Mayo 2017). Over 100 (rather varied) interesting comments ensued. This is the first time I’m incorporating blog comments into published work. You might be interested to follow the nooks and crannies from back then, or add a new comment to this.**

This is one of the questions high on the “To Do” list I’ve been keeping for this blog. The question grew out of discussions of “updating and downdating” in relation to papers by Stephen Senn (2011) and Andrew Gelman (2011) in* Rationality, Markets, and Morals.[i]*

“As an exercise in mathematics [computing a posterior based on the client’s prior probabilities] is not superior to showing the client the data, eliciting a posterior distribution and then calculating the prior distribution; as an exercise in inference Bayesian updating does not appear to have greater claims than ‘downdating’.” (Senn, 2011, p. 59)

“If you could really express your uncertainty as a prior distribution, then you could just as well observe data and directly write your subjective posterior distribution, and there would be no need for statistical analysis at all.” (Gelman, 2011, p. 77)

But if uncertainty is not expressible as a prior, then a major lynchpin for Bayesian updating seems questionable. If you can go from the posterior to the prior, on the other hand, perhaps it can also lead you to come back and change it.

**Is it legitimate to change one’s prior based on the data?** Continue reading

## Frequentstein’s Bride: What’s wrong with using (1 – β)/α as a measure of evidence against the null?

ONE YEAR AGO: …and growing more relevant all the time. Rather than leak any of my new book*, I reblog some earlier posts, even if they’re a bit scruffy. This was first blogged here (with a slightly different title). It’s married to posts on “the P-values overstate the evidence against the null fallacy”, such as this, and is wedded to this one on “How to Tell What’s True About Power if You’re Practicing within the Frequentist Tribe”.

In their “Comment: A Simple Alternative to p-values,” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:

It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)

## “Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”

Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

## S. Senn: “Automatic for the people? Not quite” (Guest post)

**Stephen Senn**

* Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn
*

**Automatic for the people? Not quite**

What caught my eye was the estimable (in its non-statistical meaning) Richard Lehman tweeting about the equally estimable John Ioannidis. For those who don’t know them, the former is a veteran blogger who keeps a very cool and shrewd eye on the latest medical ‘breakthroughs’ and the latter a serial iconoclast of idols of scientific method. This is what Lehman wrote

Ioannidis hits 8 on the Richter scale: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173184 … Bayes factors consistently quantify strength of evidence, p is valueless.

Since Ioannidis works at Stanford, which is located in the San Francisco Bay Area, he has every right to be interested in earthquakes but on looking up the paper in question, a faint tremor is the best that I can afford it. I shall now try and explain why, but before I do, it is only fair that I acknowledge the very generous, prompt and extensive help I have been given to understand the paper[1] in question by its two authors Don van Ravenzwaaij and Ioannidis himself. Continue reading

## The Fourth Bayesian, Fiducial and Frequentist Workshop (BFF4): Harvard U

**May 1-3, 2017**

**Hilles Event Hall, ****59 Shepard St. MA**

The Department of Statistics is pleased to announce the **4th Bayesian, Fiducial and Frequentist Workshop (BFF4**), to be held on May 1-3, 2017 at Harvard University. The BFF workshop series celebrates foundational thinking in statistics and inference under uncertainty. The three-day event will present talks, discussions and panels that feature statisticians and philosophers whose research interests synergize at the interface of their respective disciplines. Confirmed featured speakers include Sir David Cox and Stephen Stigler.

** Featured Speakers and Discussants**:

*Arthur Dempster (Harvard); Cynthia Dwork (Harvard); Andrew Gelman (Columbia); Ned Hall (Harvard); Deborah Mayo (Virginia Tech); Nancy Reid (Toronto); Susanna Rinard (Harvard); Christian Robert (Paris-Dauphine/Warwick); Teddy Seidenfeld (CMU); Glenn Shafer (Rutgers); Stephen Senn (LIH); Stephen Stigler (Chicago); Sandy Zabell (Northwestern)*

** Invited Speakers and Panelists**:

*Jim Berger (Duke); Emery Brown (MIT/MGH); Larry Brown (Wharton); David Cox (Oxford; remote participation); Paul Edlefsen (Hutch); Don Fraser (Toronto); Ruobin Gong (Harvard); Jan Hannig (UNC); Alfred Hero (Michigan); Nils Hjort (Oslo); Pierre Jacob (Harvard); Keli Liu (Stanford); Regina Liu (Rutgers); Antonietta Mira (USI); Ryan Martin (NC State); Vijay Nair (Michigan); James Robins (Harvard); Daniel Roy (Toronto); Donald B. Rubin (Harvard); Peter XK Song (Michigan); Gunnar Taraldsen (NUST); Tyler VanderWeele (HSPH); Vladimir Vovk (London); Nanny Wermuth (Chalmers/Gutenberg); Min-ge Xie (Rutgers)*

## The ASA Document on P-Values: One Year On

I’m surprised it’s a year already since posting my published comments on the ASA Document on P-Values. Since then, there have been a slew of papers rehearsing the well-worn fallacies of tests (a tad bit more than the usual rate). Doubtless, the P-value Pow Wow raised people’s consciousnesses. I’m interested in hearing reader reactions/experiences in connection with the P-Value project (positive and negative) over the past year. (Use the comments, share links to papers; and/or send me something slightly longer for a possible guest post.)

Some people sent me a diagram from a talk by Stephen Senn (on “P-values and the art of herding cats”). He presents an array of different cat commentators, and for some reason Mayo cat is in the middle but way over on the left side,near the wall. I never got the key to interpretation. My contribution is below:

**“Don’t Throw Out The Error Control Baby With the Bad Statistics Bathwater”**

D. Mayo*[1]

The American Statistical Association is to be credited with opening up a discussion into p-values; now an examination of the foundations of other key statistical concepts is needed. Continue reading

## 3 YEARS AGO (JANUARY 2014): MEMORY LANE

**MONTHLY MEMORY LANE: 3 years ago: January 2014. **I mark in **red** **three** posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently**[1], and in ****green**** up to 3 others I’d recommend[2]**.** **Posts that are part of a “unit” or a group count as one. This month, I’m grouping the 3 posts from my seminar with A. Spanos, counting them as 1.

**January 2014
**

- (1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
- (1/3) Error Statistics Philosophy: 2013
**(1/4) Your 2014 wishing well. …**- (1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
**(1/11) Two Severities? (PhilSci and PhilStat)****(1/14) Statistical Science meets Philosophy of Science: blog beginnings****(1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog**(ii)**(1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy**[draft ii]- (1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
**(1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics****(1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?**- (1/27) Winner of the January 2014 palindrome contest (rejected post)
- (1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics
**(1/31) Phil 6334: Day #2 Slides**

**[1]** Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

**[2]** New Rule, July 30, 2016-very convenient.