Replying to a review of Statistical Inference as Severe Testing by P. Bandyopadhyay

.

Notre Dame Philosophical Reviews is a leading forum for publishing reviews of books in philosophy. The philosopher of statistics, Prasanta Bandyopadhyay, published a review of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP)(SIST) in this journal, and I very much appreciate his doing so. Here I excerpt from his review, and respond to a cluster of related criticisms in order to avoid some fundamental misunderstandings of my project. Here’s how he begins:

In this book, Deborah G. Mayo (who has the rare distinction of making an impact on some of the most influential statisticians of our time) delves into issues in philosophy of statistics, philosophy of science, and scientific methodology more thoroughly than in her previous writings. Her reconstruction of the history of statistics, seamless weaving of the issues in the foundations of statistics with the development of twentieth-century philosophy of science, and clear presentation that makes the content accessible to a non-specialist audience constitute a remarkable achievement. Mayo has a unique philosophical perspective which she uses in her study of philosophy of science and current statistical practice.[1]

Bandyopadhyay

I regard this as one of the most important philosophy of science books written in the last 25 years. However, as Mayo herself says, nobody should be immune to critical assessment. This review is written in that spirit; in it I will analyze some of the shortcomings of the book.
Continue reading

Categories: Statistical Inference as Severe Testing–Review | Tags: | 24 Comments

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)

A SPANOS

.

This is a belated birthday post for R.A. Fisher (17 February, 1890-29 July, 1962)–it’s a guest post from earlier on this blog by Aris Spanos. 

Happy belated birthday to R.A. Fisher!

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998) Continue reading

Categories: Fisher, phil/history of stat, Spanos | 2 Comments

Bad Statistics is Their Product: Fighting Fire With Fire (ii)

Mayo fights fire w/ fire

I. Doubt is Their Product is the title of a (2008) book by David Michaels, Assistant Secretary for OSHA from 2009-2017. I first mentioned it on this blog back in 2011 (“Will the Real Junk Science Please Stand Up?) The expression is from a statement by a cigarette executive (“doubt is our product”), and the book’s thesis is explained in its subtitle: How Industry’s Assault on Science Threatens Your Health. Imagine you have just picked up a book, published in 2020: Bad Statistics is Their Product. Is the author writing about how exaggerating bad statistics may serve in the interest of denying well-established risks? [Interpretation A]. Or perhaps she’s writing on how exaggerating bad statistics serves the interest of denying well-established statistical methods? [Interpretation B]. Both may result in distorting science and even in dismantling public health safeguards–especially if made the basis of evidence policies in agencies. A responsible philosopher of statistics should care. Continue reading

Categories: ASA Guide to P-values, Error Statistics, P-values, replication research, slides | 33 Comments

My paper, “P values on Trial” is out in Harvard Data Science Review

.

My new paper, “P Values on Trial: Selective Reporting of (Best Practice Guides Against) Selective Reporting” is out in Harvard Data Science Review (HDSR). HDSR describes itself as a A Microscopic, Telescopic, and Kaleidoscopic View of Data Science. The editor-in-chief is Xiao-li Meng, a statistician at Harvard. He writes a short blurb on each article in his opening editorial of the issue. Continue reading

Categories: multiple testing, P-values, significance tests, Statistics | 29 Comments

S. Senn: “Error point: The importance of knowing how much you don’t know” (guest post)

.

Stephen Senn
Consultant Statistician
Edinburgh

‘The term “point estimation” made Fisher nervous, because he associated it with estimation without regard to accuracy, which he regarded as ridiculous.’ Jimmy Savage [1, p. 453] 

First things second

The classic text by David Cox and David Hinkley, Theoretical Statistics (1974), has two extremely interesting features as regards estimation. The first is in the form of an indirect, implicit, message and the second explicit and both teach that point estimation is far from being an obvious goal of statistical inference. The indirect message is that the chapter on point estimation (chapter 8) comes after that on interval estimation (chapter 7). This may puzzle the reader, who may anticipate that the complications of interval estimation would be handled after the apparently simpler point estimation rather than before. However, with the start of chapter 8, the reasoning is made clear. Cox and Hinkley state: Continue reading

Categories: Fisher, randomization, Stephen Senn | Tags: | 8 Comments

Aris Spanos Reviews Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars

A. Spanos

Aris Spanos was asked to review my Statistical Inference as Severe Testing: how to Get Beyond the Statistics Wars (CUP, 2018), but he was to combine it with a review of the re-issue of Ian Hacking’s classic  Logic of Statistical Inference. The journal is OEconomia: History, Methodology, Philosophy. Below are excerpts from his discussion of my book (pp. 843-860). I will jump past the Hacking review, and occasionally excerpt for length.To read his full article go to external journal pdf or stable internal blog pdf. Continue reading

Categories: Spanos, Statistical Inference as Severe Testing | Leave a comment

The NAS fixes its (main) mistake in defining P-values!

Mayo new elbow

(reasonably) satisfied

Remember when I wrote to the National Academy of Science (NAS) in September pointing out mistaken definitions of P-values in their document on Reproducibility and Replicability in Science? (see my 9/30/19 post). I’d given up on their taking any action, but yesterday I received a letter from the NAS Senior Program officer:

Dear Dr. Mayo,

I am writing to let you know that the Reproducibility and Replicability in Science report has been updated in response to the issues that you have raised.
Two footnotes, on pages 31 35 and 221, highlight the changes. The updated report is available from the following link: NEW 2020 NAS DOC

Thank you for taking the time to reach out to me and to Dr. Fineberg and letting us know about your concerns.
With kind regards and wishes of a happy 2020,
Jenny Heimberg
Jennifer Heimberg, Ph.D.
Senior Program Officer

The National Academies of Sciences, Engineering, and Medicine

Continue reading

Categories: NAS, P-values | 2 Comments

Midnight With Birnbaum (Happy New Year 2019)!

 Just as in the past 8 years since I’ve been blogging, I revisit that spot in the road at 9p.m., just outside the Elbar Room, look to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, as I wait out in the cold, now that Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (STINT 2018) has been out over a year. STINT doesn’t rehearse the argument from my Birnbaum article, but there’s much in it that I’d like to discuss with him. The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). 2019 was the 61th birthday of Cox’s “weighing machine” example, which was the basis of Birnbaum’s attempted proof. Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2020? Anyway, the cab is finally here…the rest is live. Happy New Year! Continue reading

Categories: Birnbaum Brakes, strong likelihood principle | Tags: , , , | Leave a comment

A Perfect Time to Binge Read the (Strong) Likelihood Principle

An essential component of inference based on familiar frequentist notions: p-values, significance and confidence levels, is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data. Continue reading

Categories: Birnbaum, Birnbaum Brakes, law of likelihood | 7 Comments

61 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP)

.

2018 marked 60 years since the famous weighing machine example from Sir David Cox (1958)[1]. it is now 61. It’s one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my (still) new book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018). It’s especially relevant to take this up now, just before we leave 2019, for reasons that will be revealed over the next day or two. For a sneak preview of those reasons, see the “note to the reader” at the end of this post. So, let’s go back to it, with an excerpt from SIST (pp. 170-173). Continue reading

Categories: Birnbaum, Statistical Inference as Severe Testing, strong likelihood principle | Leave a comment

Posts of Christmas Past (1): 13 howlers of significance tests (and how to avoid them)

.

I’m reblogging a post from Christmas past–exactly 7 years ago. Guess what I gave as the number 1 (of 13) howler well-worn criticism of statistical significance tests, haunting us back in 2012–all of which are put to rest in Mayo and Spanos 2011? Yes, it’s the frightening allegation that statistical significance tests forbid using any background knowledge! The researcher is imagined to start with a “blank slate” in each inquiry (no memories of fallacies past), and then unthinkingly apply a purely formal, automatic, accept-reject machine. What’s newly frightening (in 2019) is the credulity with which this apparition is now being met (by some). I make some new remarks below the post from Christmas past: Continue reading

Categories: memory lane, significance tests, Statistics | Tags: | Leave a comment

“Les stats, c’est moi”: We take that step here! (Adopt our fav word or phil stat!)(iii)

les stats, c’est moi

When it comes to the statistics wars, leaders of rival tribes sometimes sound as if they believed “les stats, c’est moi”.  [1]. So, rather than say they would like to supplement some well-known tenets (e.g., “a statistically significant effect may not be substantively important”) with a new rule that advances their particular preferred language or statistical philosophy, they may simply blurt out: “we take that step here!” followed by whatever rule of language or statistical philosophy they happen to prefer (as if they have just added the new rule to the existing, uncontested tenets). Karan Kefadar, in her last official (December) report as President of the American Statistical Association (ASA), expresses her determination to call out this problem at the ASA itself. (She raised it first in her June article, discussed in my last post.) Continue reading

Categories: ASA Guide to P-values | 84 Comments

P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)

2208388671_0d8bc38714

Mayo writing to Kafadar

I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner: “Statistics and Unintended Consequences“:

  • “I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”

I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day). Continue reading

Categories: ASA Guide to P-values, Bayesian/frequentist, P-values | 3 Comments

A. Saltelli (Guest post): What can we learn from the debate on statistical significance?

Professor Andrea Saltelli
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB, Norway),
&
Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona

What can we learn from the debate on statistical significance?

The statistical community is in the midst of crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification.  Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science. Continue reading

Categories: Error Statistics | 11 Comments

The ASA’s P-value Project: Why it’s Doing More Harm than Good (cont from 11/4/19)

 

cure by committee

Everything is impeach and remove these days! Should that hold also for the concept of statistical significance and P-value thresholds? There’s an active campaign that says yes, but I aver it is doing more harm than good. In my last post, I said I would count the ways it is detrimental until I became “too disconsolate to continue”. There I showed why the new movement, launched by Executive Director of the ASA (American Statistical Association), Ronald Wasserstein (in what I dub ASA II(note)), is self-defeating: it instantiates and encourages the human-all-too-human tendency to exploit researcher flexibility, rewards, and openings for bias in research (F, R & B Hypothesis). That was reason #1. Just reviewing it already fills me with such dismay, that I fear I will become too disconsolate to continue before even getting to reason #2. So let me just quickly jot down reasons #2, 3, 4, and 5 (without full arguments) before I expire. Continue reading

Categories: ASA Guide to P-values | 7 Comments

On Some Self-Defeating Aspects of the ASA’s (2019) Recommendations on Statistical Significance Tests (ii)

.

“Before we stood on the edge of the precipice, now we have taken a great step forward”

 

What’s self-defeating about pursuing statistical reforms in the manner taken by the American Statistical Association (ASA) in 2019? In case you’re not up on the latest in significance testing wars, the 2016 ASA Statement on P-Values and Statistical Significance, ASA I, arguably, was a reasonably consensual statement on the need to avoid some well-known abuses of P-values–notably if you compute P-values, ignoring selective reporting, multiple testing, or stopping when the data look good, the computed P-value will be invalid. (Principle 4, ASA I) But then Ron Wasserstein, executive director of the ASA, and co-editors, decided they weren’t happy with their own 2016 statement because it “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned” altogether. In their new statement–ASA II(note)–they announced: “We take that step here….Statistically significant –don’t say it and don’t use it”.

Why do I say it is a mis-take to have taken the supposed next “great step forward”? Why do I count it as unsuccessful as a piece of statistical science policy? In what ways does it make the situation worse? Let me count the ways. The first is in this post. Others will come in following posts, until I become too disconsolate to continue.[i] Continue reading

Categories: P-values, stat wars and their casualties, statistical significance tests | 14 Comments

Exploring a new philosophy of statistics field

This article came out on Monday on our Summer Seminar in Philosophy of Statistics in Virginia Tech News Daily magazine.

October 28, 2019

.

From universities around the world, participants in a summer session gathered to discuss the merits of the philosophy of statistics. Co-director Deborah Mayo, left, hosted an evening for them at her home.

Continue reading

Categories: Philosophy of Statistics, Summer Seminar in PhilStat | 2 Comments

The First Eye-Opener: Error Probing Tools vs Logics of Evidence (Excursion 1 Tour II)

1.4, 1.5

In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP),  I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed).  Continue reading

Categories: Error Statistics, law of likelihood, SIST | 14 Comments

The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon

1.3

Continue to the third, and last stop of Excursion 1 Tour I of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP)–Section 1.3. It would be of interest to ponder if (and how) the current state of play in the stat wars has shifted in just one year. I’ll do so in the comments. Use that space to ask me any questions.

How can a discipline, central to science and to critical thinking, have two methodologies, two logics, two approaches that frequently give substantively different answers to the same problems? … Is complacency in the face of contradiction acceptable for a central discipline of science? (Donald Fraser 2011, p. 329)

We [statisticians] are not blameless … we have not made a concerted professional effort to provide the scientific world with a unified testing methodology. (J. Berger 2003, p. 4)

Continue reading

Categories: Statistical Inference as Severe Testing | 3 Comments

Severity: Strong vs Weak (Excursion 1 continues)

1.2

Marking one year since the appearance of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP), let’s continue to the second stop (1.2) of Excursion 1 Tour 1. It begins on p. 13 with a quote from statistician George Barnard. Assorted reflections will be given in the comments. Ask me any questions pertaining to the Tour.

 

  • I shall be concerned with the foundations of the subject. But in case it should be thought that this means I am not here strongly concerned with practical applications, let me say right away that confusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth. (George Barnard 1985, p. 2)

Continue reading

Categories: Statistical Inference as Severe Testing | 5 Comments

Blog at WordPress.com.