A Perfect Time to Binge Read the (Strong) Likelihood Principle

An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). Roughly, the SLP asserts that all the evidential import of the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data. Continue reading
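The tension can be made concrete with the familiar binomial versus negative binomial chestnut (the numbers below are my own illustrative sketch, not taken from the post): 9 heads in 12 tosses yield proportional likelihood functions whether n was fixed in advance or tossing continued until the 3rd tail, so the SLP counts the evidence as identical; yet the sampling distributions, and hence the p-values, differ.

```python
from math import comb

# Hypothetical illustration: same data, two sampling plans.
# Data: x = 9 heads in n = 12 tosses. H0: theta = 0.5 vs H1: theta > 0.5.
x, n, r = 9, 12, 3          # r = number of tails at which plan B stops
theta0 = 0.5

# Plan A (binomial): n = 12 fixed in advance.
# p-value = P(X >= 9 | n = 12, theta = 0.5)
p_binom = sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
              for k in range(x, n + 1))

# Plan B (negative binomial): toss until the 3rd tail; 9 heads occurred first.
# p-value = P(at least 9 heads before the 3rd tail | theta = 0.5)
p_nbinom = 1 - sum(comb(y + r - 1, r - 1) * theta0**y * (1 - theta0)**r
                   for y in range(x))

# The likelihood functions are proportional for every theta (ratio = 4),
# so the SLP deems the evidence identical -- but the p-values differ:
lik_ratio = [(comb(n, x) * t**x * (1 - t)**(n - x)) /
             (comb(x + r - 1, r - 1) * t**x * (1 - t)**r)
             for t in (0.2, 0.5, 0.8)]

print(round(p_binom, 4), round(p_nbinom, 4), lik_ratio)
# 0.073 0.0327 [4.0, 4.0, 4.0]
```

Plan A gives p ≈ 0.073, plan B gives p ≈ 0.033: identical likelihoods, different error probabilities — precisely the post-data information the SLP would have us discard.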

Categories: Birnbaum, Birnbaum Brakes, law of likelihood | 7 Comments

61 Years of Cox’s (1958) Chestnut: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP)


2018 marked 60 years since the famous weighing machine example from Sir David Cox (1958)[1]; it is now 61. It’s one of the “chestnuts” in the exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my (still) new book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018). It’s especially relevant to take this up now, just before we leave 2019, for reasons that will be revealed over the next day or two. For a sneak preview of those reasons, see the “note to the reader” at the end of this post. So, let’s go back to it, with an excerpt from SIST (pp. 170-173). Continue reading

Categories: Birnbaum, Statistical Inference as Severe Testing, strong likelihood principle | Leave a comment

Posts of Christmas Past (1): 13 howlers of significance tests (and how to avoid them)


I’m reblogging a post from Christmas past–exactly 7 years ago. Guess what I gave as the number 1 (of 13) well-worn howler criticisms of statistical significance tests haunting us back in 2012–all of which are put to rest in Mayo and Spanos 2011? Yes, it’s the frightening allegation that statistical significance tests forbid using any background knowledge! The researcher is imagined to start with a “blank slate” in each inquiry (no memories of fallacies past), and then unthinkingly apply a purely formal, automatic, accept-reject machine. What’s newly frightening (in 2019) is the credulity with which this apparition is now being met (by some). I make some new remarks below the post from Christmas past: Continue reading

Categories: memory lane, significance tests, Statistics | Tags: | Leave a comment

“Les stats, c’est moi”: We take that step here! (Adopt our fav word or phil stat!)(iii)

les stats, c’est moi

When it comes to the statistics wars, leaders of rival tribes sometimes sound as if they believed “les stats, c’est moi”. [1] So, rather than say they would like to supplement some well-known tenets (e.g., “a statistically significant effect may not be substantively important”) with a new rule that advances their particular preferred language or statistical philosophy, they may simply blurt out: “we take that step here!” followed by whatever rule of language or statistical philosophy they happen to prefer (as if they have just added the new rule to the existing, uncontested tenets). Karen Kafadar, in her last official (December) report as President of the American Statistical Association (ASA), expresses her determination to call out this problem at the ASA itself. (She raised it first in her June article, discussed in my last post.) Continue reading

Categories: ASA Guide to P-values | 84 Comments

P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)


Mayo writing to Kafadar

I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner: “Statistics and Unintended Consequences“:

  • “I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”

I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day). Continue reading

Categories: ASA Guide to P-values, Bayesian/frequentist, P-values | 3 Comments

A. Saltelli (Guest post): What can we learn from the debate on statistical significance?

Professor Andrea Saltelli
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB, Norway),
&
Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona

What can we learn from the debate on statistical significance?

The statistical community is in the midst of a crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification. Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science. Continue reading

Categories: Error Statistics | 11 Comments

The ASA’s P-value Project: Why it’s Doing More Harm than Good (cont from 11/4/19)


cure by committee

Everything is impeach and remove these days! Should that hold also for the concept of statistical significance and P-value thresholds? There’s an active campaign that says yes, but I aver it is doing more harm than good. In my last post, I said I would count the ways it is detrimental until I become “too disconsolate to continue”. There I showed why the new movement, launched by the Executive Director of the ASA (American Statistical Association), Ronald Wasserstein (in what I dub ASA II(note)), is self-defeating: it instantiates and encourages the human-all-too-human tendency to exploit researcher flexibility, rewards, and openings for bias in research (the F, R & B Hypothesis). That was reason #1. Just reviewing it already fills me with such dismay that I fear I will become too disconsolate to continue before even getting to reason #2. So let me just quickly jot down reasons #2, 3, 4, and 5 (without full arguments) before I expire. Continue reading

Categories: ASA Guide to P-values | 7 Comments

On Some Self-Defeating Aspects of the ASA’s (2019) Recommendations on Statistical Significance Tests (ii)


“Before we stood on the edge of the precipice, now we have taken a great step forward”


What’s self-defeating about pursuing statistical reforms in the manner taken by the American Statistical Association (ASA) in 2019? In case you’re not up on the latest in significance testing wars, the 2016 ASA Statement on P-Values and Statistical Significance, ASA I, arguably, was a reasonably consensual statement on the need to avoid some well-known abuses of P-values: notably, if you compute P-values while ignoring selective reporting, multiple testing, or stopping when the data look good, the computed P-value will be invalid (Principle 4, ASA I). But then Ron Wasserstein, executive director of the ASA, and co-editors decided they weren’t happy with their own 2016 statement because it “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned” altogether. In their new statement–ASA II(note)–they announced: “We take that step here….Statistically significant –don’t say it and don’t use it”.
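Principle 4 is easy to demonstrate numerically. The simulation below is my own hypothetical sketch (the peek schedule and sample sizes are made up for illustration, not from ASA I): toss a fair coin, peek at the data every 20 tosses, and stop as soon as the nominal p-value dips below 0.05. The actual rate of erroneous “significant” declarations far exceeds the nominal 5%.

```python
import math
import random

random.seed(1)

def approx_p(heads, n):
    """Two-sided p-value for H0: fair coin, via the normal approximation."""
    z = abs(heads - n / 2) / math.sqrt(n / 4)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def peeking_run(max_n=200, peek_every=20, alpha=0.05):
    """Toss a fair coin, test at every peek, stop at the first 'significant' result."""
    heads = 0
    for i in range(1, max_n + 1):
        heads += random.random() < 0.5
        if i % peek_every == 0 and approx_p(heads, i) < alpha:
            return True      # declared significant -- even though H0 is true
    return False

trials = 20_000
rate = sum(peeking_run() for _ in range(trials)) / trials
print(round(rate, 3))        # well above the nominal 0.05
```

Stopping when the data look good is a sampling plan, and the reported “p < 0.05” no longer means what it claims: the computed P-value is invalid, exactly as ASA I warned.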

Why do I say it is a mis-take to have taken the supposed next “great step forward”? Why do I count it as unsuccessful as a piece of statistical science policy? In what ways does it make the situation worse? Let me count the ways. The first is in this post. Others will come in following posts, until I become too disconsolate to continue.[i] Continue reading

Categories: P-values, stat wars and their casualties, statistical significance tests | 14 Comments

Exploring a new philosophy of statistics field

This article on our Summer Seminar in Philosophy of Statistics came out on Monday in Virginia Tech News Daily magazine.

October 28, 2019


From universities around the world, participants in a summer session gathered to discuss the merits of the philosophy of statistics. Co-director Deborah Mayo, left, hosted an evening for them at her home.

Continue reading

Categories: Philosophy of Statistics, Summer Seminar in PhilStat | 2 Comments

The First Eye-Opener: Error Probing Tools vs Logics of Evidence (Excursion 1 Tour II)

1.4, 1.5

In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP),  I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed).  Continue reading

Categories: Error Statistics, law of likelihood, SIST | 14 Comments

The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon

1.3

Continue to the third, and last, stop of Excursion 1 Tour I of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP)–Section 1.3. It would be of interest to ponder whether (and how) the current state of play in the stat wars has shifted in just one year. I’ll do so in the comments. Use that space to ask me any questions.

How can a discipline, central to science and to critical thinking, have two methodologies, two logics, two approaches that frequently give substantively different answers to the same problems? … Is complacency in the face of contradiction acceptable for a central discipline of science? (Donald Fraser 2011, p. 329)

We [statisticians] are not blameless … we have not made a concerted professional effort to provide the scientific world with a unified testing methodology. (J. Berger 2003, p. 4)

Continue reading

Categories: Statistical Inference as Severe Testing | 3 Comments

Severity: Strong vs Weak (Excursion 1 continues)

1.2

Marking one year since the appearance of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP), let’s continue to the second stop (1.2) of Excursion 1 Tour I. It begins on p. 13 with a quote from statistician George Barnard. Assorted reflections will be given in the comments. Ask me any questions pertaining to the Tour.


  • I shall be concerned with the foundations of the subject. But in case it should be thought that this means I am not here strongly concerned with practical applications, let me say right away that confusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth. (George Barnard 1985, p. 2)

Continue reading

Categories: Statistical Inference as Severe Testing | 5 Comments

How My Book Begins: Beyond Probabilism and Performance: Severity Requirement

This week marks one year since the general availability of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s how it begins (Excursion 1 Tour I (1.1)). Material from the preface is here. I will sporadically give some “one year later” reflections in the comments. I invite readers to ask me any questions pertaining to the Tour.

The journey begins… (1.1)

I’m talking about a specific, extra type of integrity that is [beyond] not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. (Feynman 1974/1985, p. 387)

It is easy to lie with statistics. Or so the cliché goes. It is also very difficult to uncover these lies without statistical methods – at least of the right kind. Self-correcting statistical methods are needed, and, with minimal technical fanfare, that’s what I aim to illuminate. Since Darrell Huff wrote How to Lie with Statistics in 1954, ways of lying with statistics are so well worn as to have emerged in reverberating slogans:

  • Association is not causation.
  • Statistical significance is not substantive significance.
  • No evidence of risk is not evidence of no risk.
  • If you torture the data enough, they will confess.

Continue reading

Categories: Statistical Inference as Severe Testing, Statistics | 4 Comments

National Academies of Science: Please Correct Your Definitions of P-values

Mayo banging head

If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did. Continue reading
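For the record, the definition the committee needed to check: a P-value is the probability, computed under the null hypothesis H0, of a test statistic at least as extreme as the one observed; it is not the probability that H0 is true. A minimal simulation (my own hypothetical coin-toss numbers, not from the NAS report) makes the “computed under H0” clause vivid:

```python
import random

random.seed(0)

# A P-value is computed UNDER the null hypothesis: simulate the world in which
# H0 holds, and ask how often a result at least as extreme as ours occurs.
# Hypothetical data: 60 heads in 100 tosses; H0: the coin is fair.
observed, n = 60, 100
trials = 50_000

null_counts = (sum(random.random() < 0.5 for _ in range(n)) for _ in range(trials))
p_value = sum(c >= observed for c in null_counts) / trials

print(round(p_value, 3))  # close to the exact binomial tail P(X >= 60) ~ 0.028
```

Note that nowhere does the computation assign a probability to H0 itself; conflating the two is precisely the error a consensus report on replication should not commit.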

Categories: ASA Guide to P-values, Error Statistics, P-values | 20 Comments

Hardwicke and Ioannidis, Gelman, and Mayo: P-values: Petitions, Practice, and Perils (and a question for readers)


The October 2019 issue of the European Journal of Clinical Investigation came out today. It includes the PERSPECTIVE article by Tom Hardwicke and John Ioannidis, an invited editorial by Gelman, and one by me:

Petitions in scientific argumentation: Dissecting the request to retire statistical significance, by Tom Hardwicke and John Ioannidis

When we make recommendations for scientific practice, we are (at best) acting as social scientists, by Andrew Gelman

P-value thresholds: Forfeit at your peril, by Deborah Mayo

I blogged excerpts from my preprint, and some related posts, here.

All agree to the disagreement on the statistical and metastatistical issues: Continue reading

Categories: ASA Guide to P-values, P-values, stat wars and their casualties | 16 Comments

(Excerpts from) ‘P-Value Thresholds: Forfeit at Your Peril’ (free access)


A key recognition among those who write on the statistical crisis in science is that the pressure to publish attention-getting articles can incentivize researchers to produce eye-catching but inadequately scrutinized claims. We may see much the same sensationalism in broadcasting metastatistical research, especially if it takes the form of scapegoating or banning statistical significance. A lot of excitement was generated recently when Ron Wasserstein, Executive Director of the American Statistical Association (ASA), and co-editors A. Schirm and N. Lazar, updated(note) the 2016 ASA Statement on P-Values and Statistical Significance (ASA I). In their 2019 interpretation, ASA I “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned,” and in their new statement (ASA II) announced: “We take that step here….’statistically significant’ –don’t say it and don’t use it”. To herald the ASA II(note), and the special issue “Moving to a world beyond ‘p < 0.05’”, the journal Nature requisitioned a commentary from Amrhein, Greenland and McShane “Retire Statistical Significance” (AGM). With over 800 signatories, the commentary received the imposing title “Scientists rise up against significance tests”! Continue reading

Categories: ASA Guide to P-values, P-values, stat wars and their casualties | 6 Comments

Gelman blogged our exchange on abandoning statistical significance

A. Gelman

I came across this post on Gelman’s blog today:

Exchange with Deborah Mayo on abandoning statistical significance

It was straight out of blog comments and email correspondence back when the ASA, and significant others, were rising up against the concept of statistical significance. Here it is: Continue reading

Categories: Gelman blogs an exchange with Mayo | Tags: | 7 Comments

All She Wrote (so far): Error Statistics Philosophy: 8 years on


Error Statistics Philosophy: Blog Contents (8 years)
By: D. G. Mayo

Dear Reader: I began this blog 8 years ago (Sept. 3, 2011)! A double celebration is taking place at the Elbar Room Friday evening (a smaller one was held earlier in the week), both for the blog and the one-year anniversary of the physical appearance of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars [SIST] (CUP). A special rush edition made an appearance on Sept 3, 2018 in time for the RSS meeting in Cardiff. If you’re in the neighborhood, stop by for some Elba Grease.

Ship Statinfasst made its most recent journey at the Summer Seminar for Phil Stat from July 28-Aug 11, co-directed with Aris Spanos. It was one of the main events that occupied my time this past academic year, from the planning and advertising to the running of it. We had 15 fantastic faculty and post-doc participants (from 55 applicants), and plan to continue the movement to incorporate PhilStat in philosophy and methodology, both in teaching and research. You can find slides from the Seminar (zoom videos, including those of special invited speakers, to come) on SummerSeminarPhilStat.com. Slides and other materials from the Spring Seminar co-taught with Aris Spanos (and cross-listed with Economics) can be found on this blog here.

Continue reading

Categories: 8 year memory lane, blog contents, Metablog | 3 Comments

(one year ago) RSS 2018 – Significance Tests: Rethinking the Controversy


Here’s what I posted 1 year ago on Aug 30, 2018.


Day 2, Wednesday 05/09/2018

11:20 – 13:20

Keynote 4 – Significance Tests: Rethinking the Controversy Assembly Room

Speakers:
Sir David Cox, Nuffield College, Oxford
Deborah Mayo, Virginia Tech
Richard Morey, Cardiff University
Aris Spanos, Virginia Tech

Intermingled in today’s statistical controversies are some long-standing, but unresolved, disagreements on the nature and principles of statistical methods and the roles for probability in statistical inference and modelling. In reaction to the so-called “replication crisis” in the sciences, some reformers suggest significance tests as a major culprit. To understand the ramifications of the proposed reforms, there is a pressing need for a deeper understanding of the source of the problems in the sciences and a balanced critique of the alternative methods being proposed to supplant significance tests. In this session speakers offer perspectives on significance tests from statistical science, econometrics, experimental psychology and philosophy of science. There will also be a panel discussion.

Categories: memory lane | Tags: | Leave a comment

Palavering about Palavering about P-values


Nathan Schachtman (who was a special invited speaker at our recent Summer Seminar in Phil Stat) put up a post on his law blog the other day (“Palavering About P-values”) on an article by a statistics professor at Stanford, Helena Kraemer. “Palavering” is an interesting word choice of Schachtman’s. Its range of meanings is relevant here [i]; in my title, I intend both, in turn. You can read Schachtman’s full post here; it begins like this:

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[ASA II 2019] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.

Admittedly, their recent statement, which I refer to as ASA II, has seemed to open the floodgates to some very zany remarks about P-values, their meaning and role in statistical testing. Continuing with Schachtman’s post: Continue reading

Categories: ASA Guide to P-values, P-values | Tags: | 12 Comments
