Has Statistics become corrupted? Philip Stark’s questions (and some questions about them) (ii)


In this post, I consider the questions posed for my (October 9) Neyman Seminar by Philip Stark, Distinguished Professor of Statistics at UC Berkeley. We didn’t directly deal with them during the panel discussion following my talk, and I find some of them a bit surprising. (Other panelists’ questions are here).

Philip Stark asks:

  1. When and how did Statistics lose its way and become (largely) a mechanical way to bless results rather than a serious attempt to avoid fooling ourselves and others?
  2. To what extent have statisticians been complicit in the corruption of Statistics?
  3. Are there any clear turning points where things got noticeably worse?
  4. Is this a problem of (a) statistics instruction (teaching methodology rather than teaching how to answer scientific questions, deemphasizing assumptions, encouraging mechanical calculations and ignoring the interpretation of those calculations), (b) disciplinary myopia (to publish in the literature of particular disciplines, you are required to use inappropriate methods), (c) moral hazard (statisticians are often funded on scientific projects and have a strong incentive to do whatever it takes to bless “discoveries”), or something else?
  5. What can academic statisticians do to help get the train back on the tracks? Can you point to good examples?

These are important and highly provocative questions! To a large extent, Stark and other statisticians would be the ones to address them. As an outsider, and as a philosopher of science, I will merely analyze these questions, and in so doing raise some questions about them. That’s Part I of this post. In Part II, I will list some of Stark’s replies to #5 in his (2018) joint paper with Andrea Saltelli, “Cargo-cult statistics and scientific crisis”. (The full paper is relevant for #1-4 as well.) Continue reading

Categories: Neyman Seminar, Stark | 16 Comments

Excursion 1 Tour II (4th stop): The Law of Likelihood and Error Statistics (1.4)

Ship Statinfasst

We are starting on Tour II of Excursion 1 (4th stop).  The 3rd stop is in an earlier blog post. As I promised, this cruise of SIST is leisurely. I have not yet shared new reflections in the comments–but I will! 

Where YOU are in the journey: Continue reading

Categories: Bayesian/frequentist, Likelihood Principle, LSE PH 500 | Leave a comment

Panel Discussion Questions from my Neyman Lecture: “Severity as a basic concept in philosophy of statistics”

 

Giordano, Snow, Yu, Stark, Recht

My Neyman Seminar in the Statistics Department at Berkeley was followed by a lively panel discussion including 4 Berkeley faculty, orchestrated by Ryan Giordano (Dept of Statistics):

  • Xueyin Snow Zhang (Dept. of Philosophy)
  • Bin Yu (Depts. of Statistics, Electrical Engineering and Computer Sciences)
  • Philip Stark (Dept. of Statistics)
  • Ben Recht (Dept. of Electrical Engineering and Computer Sciences)

Continue reading

Categories: Berkeley Neyman Seminar | 4 Comments

Response to Ben Recht’s post (“What is Statistics’ Purpose?”) on my Neyman seminar (ii)


There was a very valuable panel discussion after my October 9 Neyman Seminar in the Statistics Department at UC Berkeley. I want to respond to many of the questions put forward by the participants (Ben Recht, Philip Stark, Bin Yu, Snow Zhang) that we did not address during that panel. Slides from my presentation, “Severity as a basic concept of philosophy of statistics”, are at the end of this post (but with none of the animations). I begin in this post by responding to Ben Recht, a professor of Artificial Intelligence and Computer Science at Berkeley, and his recent blogpost on my talk, What is Statistics’ Purpose? On severe testing, regulation, and butter passing. I will consider: (1) a complex or leading question; (2) why I chose to focus on Neyman’s philosophy of statistics; and (3) what the “100 years of fighting and browbeating” were/are all about. Continue reading

Categories: affirming the consequent, Ben Recht, Neyman, P-values, Severity, statistical significance tests, statistics wars | 10 Comments

Excursion 1 Tour I (3rd stop): The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon (1.3)

Third Stop

Readers: With this third stop we’ve covered Tour I of Excursion 1. My slides from the first LSE meeting in 2020, which dealt with elements of Excursion 1, can be found at the end of this post. There’s also a video giving an overall intro to SIST, Excursion 1. It’s worth considering just how much things seem to have changed in the past few years. Or have they? What would the view from the hot-air balloon look like now? I will try to address this in the comments.

 

Continue reading

Categories: 2024 Leisurely Cruise, Statistical Inference as Severe Testing | Leave a comment

Excursion 1 Tour I (2nd Stop): Probabilism, Performance, and Probativeness (1.2)


Readers: I gave the Neyman Seminar at Berkeley last Wednesday, October 9, and had been so busy preparing it that I did not update my leisurely cruise for October. This is the second stop. I will shortly post remarks on the panel discussion that followed my Neyman talk (with panelists Ben Recht, Philip Stark, Bin Yu, and Snow Zhang), which was quite illuminating.

“I shall be concerned with the foundations of the subject. But in case it should be thought that this means I am not here strongly concerned with practical applications, let me say right away that confusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth”. (George Barnard 1985, p. 2)

Continue reading

Categories: Error Statistics | Leave a comment

The leisurely cruise begins: Excerpt from Excursion 1 Tour 1 of Statistical Inference as Severe Testing (SIST)

Ship Statinfasst

Excerpt from excursion 1 Tour I: Beyond Probabilism and Performance: Severity Requirement (1.1)

NOTE: The following is an excerpt from my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018). For any new reflections or corrections, I will use the comments. The initial announcement is here.

I’m talking about a specific, extra type of integrity that is [beyond] not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. (Feynman 1974/1985, p. 387)

Continue reading

Categories: Error Statistics | Leave a comment

Leisurely cruise through Statistical Inference as Severe Testing: First Announcement

Ship Statinfasst

We’re embarking on a leisurely cruise through the highlights of Statistical Inference as Severe Testing [SIST]: How to Get Beyond the Statistics Wars (CUP 2018) this fall (Oct-Jan), following the 5 seminars I led for a 2020 London School of Economics (LSE) Graduate Research Seminar. It was run entirely online due to Covid (as were the workshops that followed). In this new, relaxed (self-paced) journey, excursions that had been covered in a week will be spread out over a month [i], and I’ll be posting abbreviated excerpts on this blog a few times a month. Look for the posts marked with the picture of ship StatInfAsSt. [ii] Continue reading

Categories: 2024 Leisurely Cruise, Announcement | Leave a comment

An exchange between A. Gelman and D. Mayo on abandoning statistical significance: 5 years ago


Below is an email exchange that Andrew Gelman posted on this day 5 years ago on his blog, Statistical Modeling, Causal Inference, and Social Science.  (You can find the original exchange, with its 130 comments, here.) Note: “Me” refers to Gelman. I will share my current reflections in the comments.

Exchange with Deborah Mayo on abandoning statistical significance

Continue reading

Categories: 5-year memory lane, abandon statistical significance, Gelman blogs an exchange with Mayo | 4 Comments

Georgi Georgiev (Guest Post): “The frequentist vs Bayesian split in online experimentation before and after the ‘abandon statistical significance’ call”


Georgi Georgiev

  • Author of Statistical methods in online A/B testing
  • Founder of Analytics-Toolkit.com
  • Statistics instructor at CXL Institute

In online experimentation, a.k.a. online A/B testing, one is primarily interested in estimating if and how different user experiences affect key business metrics such as average revenue per user. A trivial example would be to determine if a given change to the purchase flow of an e-commerce website is positive or negative as measured by average revenue per user, and by how much. An online controlled experiment would be conducted with actual users assigned randomly to either the currently implemented experience or the changed one. Continue reading
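As a sketch of the kind of comparison Georgiev describes, here is a minimal simulation of such an experiment. All specifics (the zero-inflated revenue model, sample sizes, conversion rate, and effect size) are illustrative assumptions of mine, not from the post; a real analysis would use the platform's actual data and a more careful test.

```python
import math
import random
import statistics

random.seed(1)

def simulate_revenue(n, mean_purchase):
    # Hypothetical zero-inflated revenue per user: most users buy nothing;
    # ~5% purchase, with exponentially distributed purchase amounts.
    return [random.expovariate(1 / mean_purchase) if random.random() < 0.05 else 0.0
            for _ in range(n)]

control = simulate_revenue(20_000, mean_purchase=40.0)  # current purchase flow
variant = simulate_revenue(20_000, mean_purchase=44.0)  # changed purchase flow

# Difference in average revenue per user, with a normal-approximation z-test
diff = statistics.mean(variant) - statistics.mean(control)
se = math.sqrt(statistics.variance(control) / len(control) +
               statistics.variance(variant) / len(variant))
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p-value
print(f"difference in mean revenue per user: {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The random assignment of users to the two experiences is what licenses reading the difference in means causally, which is the point of running a controlled experiment rather than comparing historical data.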

Categories: A/B testing, abandon statistical significance, optional stopping | Tags: | 25 Comments

Don’t divorce statistical inference from “statistical thinking”: some exchanges

 


A topic that came up in some comments recently reflects a growing tendency to divorce statistical inference (bad) from statistical thinking (good), and it deserves the spotlight of a post. I always alert authors of papers that come up on this blog, inviting them to comment, and one response, from Christopher Tong (reacting to a comment of mine on Ron Kenett), concerns this dichotomy.

Response by Christopher Tong to D. Mayo’s July 14 comment

TONG: In responding to Prof. Kenett, Prof. Mayo states: “we should reject the supposed dichotomy between ‘statistical method and statistical thinking’ which unfortunately gives rise to such titles as ‘Statistical inference enables bad science, statistical thinking enables good science,’ in the special TAS 2019 issue. This is nonsense.” [Mayo July 14 comment here.] Continue reading

Categories: statistical inference vs statistical thinking, statistical significance tests, Wasserstein et al 2019 | 11 Comments

Andrew Gelman (Guest post): (Trying to) clear up a misunderstanding about decision analysis and significance testing


Professor Andrew Gelman
Higgins Professor of Statistics
Professor of Political Science
Director of the Applied Statistics Center
Columbia University

 

(Trying to) clear up a misunderstanding about decision analysis and significance testing

Background

In our 2019 article, Abandon Statistical Significance, Blake McShane, David Gal, Christian Robert, Jennifer Tackett, and I talk about three scenarios: summarizing research, scientific publication, and decision making.

In making our recommendations, we’re not saying it will be easy; we’re just saying that screening based on statistical significance has lots of problems. P-values and related measures are not useless—there can be value in saying that an estimate is only 1 standard error away from 0 and so it is consistent with the null hypothesis, or that an estimate is 10 standard errors from zero and so the null can be rejected, or that an estimate is 2 standard errors from zero, which is something that we would not usually see if the null hypothesis were true. Comparison to a null model can be a useful statistical tool, in its place. The problem we see with “statistical significance” is when this tool is used as a dominant or default or master paradigm: Continue reading
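To make the arithmetic behind Gelman’s three examples concrete, here is a small sketch (my own illustration, not from the article) converting distances from zero, measured in standard errors, into two-sided p-values under a normal approximation:

```python
import math

def two_sided_p(z):
    # Two-sided tail area of a standard normal at |z| standard errors from zero
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# The three cases from the passage above:
# 1 SE from 0  -> p ~ 0.32 (consistent with the null)
# 2 SE from 0  -> p ~ 0.046 (unusual if the null were true)
# 10 SE from 0 -> p vanishingly small (null decisively rejected)
for z in (1, 2, 10):
    print(f"{z} SE from 0: two-sided p = {two_sided_p(z):.3g}")
```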

Categories: abandon statistical significance, gelman, statistical significance tests, Wasserstein et al 2019 | 29 Comments

Aris Spanos Guest Post: “On Frequentist Testing: revisiting widely held confusions and misinterpretations”


Aris Spanos
Wilson Schmidt Professor of Economics
Department of Economics
Virginia Tech

The following guest post (link to PDF of this post) was written as a comment on Mayo’s recent post: “Abandon Statistical Significance and Bayesian Epistemology: some troubles in philosophy v3”.

On Frequentist Testing: revisiting widely held confusions and misinterpretations

After reading chapter 13.2 of the 2022 book Fundamentals of Bayesian Epistemology 2: Arguments, Challenges, Alternatives, by Michael G. Titelbaum, I decided to write a few comments relating to his discussion, in an attempt to delineate certain key concepts in frequentist testing with a view to shedding light on several long-standing confusions and misinterpretations of these testing procedures. The key concepts include ‘what is a frequentist test’, ‘what is a test statistic and how is it chosen’, and ‘how are the hypotheses of interest framed’. Continue reading

Categories: abandon statistical significance, Spanos | 13 Comments

Guest Post: Yudi Pawitan: “Update on Behavioral aspects in the statistical significance war-game (‘abandon statistical significance 5 years on’)”


Professor Yudi Pawitan
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet, Stockholm, Sweden

[An earlier guest post on this topic by Y. Pawitan is Jan 10, 2022: Yudi Pawitan: Behavioral aspects in the statistical significance war-game]

Behavioral aspects in the statistical significance war-game

I remember with fondness the good old days when the only ‘statistical war’-game was fought between the Bayesian and the frequentist. It was simpler and the participants were for the most part collegial. Moreover, there was a feeling that it was a philosophical debate. Even though the Bayesian-frequentist war is not fully settled, we can see areas of consensus, for example in objective Bayesianism or in conditional inference. However, on the P-value and statistical significance front, the war looks less simple since it is about statistical praxis; it is no longer Bayesian vs frequentist, with no consensus in sight and with wide implications affecting the day-to-day use of statistics. Continue reading

Categories: abandon statistical significance, game-theoretic analyses, Wasserstein et al. (2019) | 12 Comments

Abandon Statistical Significance and Bayesian Epistemology: some troubles in philosophy v3


Has the “abandon significance” movement in statistics trickled down into philosophy of science? A little bit. Nowadays (since the late 1990s [i]), probabilistic inference and confirmation enter philosophy by way of fields dubbed formal epistemology and Bayesian epistemology. These fields, as I see them, are essentially ways to do analytic epistemology using probability. Given its goals, I do not criticize the best-known current text in Bayesian Epistemology with that title, Titelbaum 2022, for not engaging in foundational problems of Bayesian practice, be it subjective, non-subjective (conventional), empirical or what some call “pragmatic” Bayesianism. The text focuses on probability as subjective degree of belief. I have employed chapters from it in my own seminars in spring 2023 to explain some Bayesian puzzles such as the tacking paradox. But I am troubled by some of the examples Titelbaum uses in criticizing statistical significance tests. I only came across them when flipping through some later chapters of the text while observing a session of my colleague Rohan Sud’s course on Bayesian Epistemology this spring. It was not a topic of his seminar. Continue reading

Categories: Bayesian epistemology, Bayesian priors, Bayesian/frequentist, Diagnostic Screening | 15 Comments

Guest Post: John Park: Abandoning P-values and Embracing Artificial Intelligence in Medicine (thoughts on “abandon statistical significance 5 years on”)


John Park, MD
Medical Director of Radiation Oncology
North Kansas City Hospital
Clinical Assistant Professor
Univ. Of Missouri-Kansas City

[An earlier post  by J. Park on this topic: Jan 17, 2022: John Park: Poisoned Priors: Will You Drink from This Well? (Guest Post)]

Abandoning P-values and Embracing Artificial Intelligence in Medicine

The move to abandon P-values that started 5 years ago was, as we say in medicine, merely a symptom of a deeper, more sinister diagnosis. Within medicine, the diagnosis was a lack of statistical and philosophical knowledge. Specifically, this presented as an uncritical move towards Bayesianism and away from frequentist methods that went essentially unchallenged. The debate between frequentists and Bayesians, though longstanding, was little known inside oncology. Out of concern, I sought a collaboration with Prof. Mayo, which culminated in a lecture given at the 2021 American Society of Radiation Oncology meeting. The lecture included not only representatives from frequentist and Bayesian statistics, but also another interesting guest that was flying under the radar in my field at that time… artificial intelligence (AI). Continue reading

Categories: abandon statistical significance, Artificial Intelligence/Machine Learning, oncology | 21 Comments

Guest Post: Ron Kenett: What’s happening in statistical practice since the “abandon statistical significance” call


Ron S. Kenett
Chairman of the KPA Group;
Senior Research Fellow, the Samuel Neaman Institute, Technion, Haifa;
Chairman, Data Science Society, Israel

 

What’s happening in statistical practice since the “abandon statistical significance” call

This is a retrospective view from experience gained by applying statistics to a wide range of problems, with an emphasis on the past few years. The post is kept at a general level in order to provide a bird’s eye view of the points being made. Continue reading

Categories: abandon statistical significance, Wasserstein et al 2019 | 26 Comments

Guest Post (part 2 of 2): Daniël Lakens: “How were we supposed to move beyond p < .05, and why didn’t we?”


Professor Daniël Lakens
Human Technology Interaction
Eindhoven University of Technology

[Some earlier posts by D. Lakens on this topic are at the end of this post]*

This continues Part 1:

4: Most do not offer any alternative at all

At this point, it might be worthwhile to point out that most of the contributions to the special issue do not discuss alternative approaches to p < .05 at all. They discuss general problems with low-quality research (Kmetz, 2019), the importance of improving quality control (D. W. Hubbard & Carriquiry, 2019), results-blind reviewing (Locascio, 2019), or the role of subjective judgment (Brownstein et al., 2019). There are historical perspectives on how we got to this point (Kennedy-Shaffer, 2019), and ideas about how science should work instead, many stressing the importance of replication studies (R. Hubbard et al., 2019; Tong, 2019). Note that Trafimow recommends replication as an alternative (Trafimow, 2019) yet also co-authors a paper stating we should not expect findings to replicate (Amrhein et al., 2019), thereby directly contradicting himself within the same special issue. Others propose giving up not simply on p-values, but on generalizable knowledge (Amrhein et al., 2019). The suggestion is to report only descriptive statistics. Continue reading

Categories: abandon statistical significance, D. Lakens, Wasserstein et al 2019 | 13 Comments

Guest Post (part 1 of 2): Daniël Lakens: “How were we supposed to move beyond p < .05, and why didn’t we?”


Professor Daniël Lakens
Human Technology Interaction
Eindhoven University of Technology

*[Some earlier posts by D. Lakens on this topic are listed at the end of part 2, forthcoming this week]

How were we supposed to move beyond p < .05, and why didn’t we?

It has been 5 years since the special issue “Moving to a world beyond p < .05” came out (Wasserstein et al., 2019). I might be the only person in the world who has read all 43 contributions to this special issue. [In part 1] I will provide a summary of what the articles proposed we should do instead of p < .05, and [in part 2] offer some reflections on why they did not lead to any noticeable change. Continue reading

Categories: abandon statistical significance, D. Lakens, Wasserstein et al. (2019) | 23 Comments

Guest Post: Christian Hennig: “Statistical tests in five random research papers of 2024, and related thoughts on the ‘don’t say significant’ initiative”


Professor Christian Hennig
Department of Statistical Sciences “Paolo Fortunati”
University of Bologna

[An earlier post by C. Hennig on this topic:  Jan 9, 2022: The ASA controversy on P-values as an illustration of the difficulty of statistics]

Statistical tests in five random research papers of 2024, and related thoughts on the “don’t say significant” initiative

This text follows an invitation to write on “abandon statistical significance 5 years on”, so I decided to do a tiny bit of empirical research. I had a look at five new papers listed on May 17 on the “Research Articles” site of Scientific Reports, choosing the five most recent papers at the time without being selective. As I “sampled” papers only for a general impression, I don’t want this to be a criticism of particular papers or authors; however, in the interest of transparency, the doi addresses of the papers are: Continue reading

Categories: 5-year memory lane, abandon statistical significance, Christian Hennig | 7 Comments
