Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)


Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and two co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05’”). It reads:


The editorial was written by the three editors acting as individuals and reflects their scientific views, not an endorsed position of the American Statistical Association.

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 19 Comments

Philosophy of socially aware data science conference

I’ll be speaking at this conference in Philly tomorrow. My slides are also below.


PDF of my slides: Statistical “Reforms”: Fixing Science or Threats to Replication and Falsification. Continue reading

Categories: Announcement, Philosophy of Statistics, socially aware data science | Leave a comment

D. Mayo & D. Hand: “Statistical significance and its critics: practicing damaging science, or damaging scientific practice?”


Prof. Deborah Mayo, Emerita
Department of Philosophy
Virginia Tech


Prof. David Hand
Department of Mathematics
Imperial College London

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?  (Synthese)

[pdf of full paper.] Continue reading

Categories: Error Statistics | 3 Comments

Paul Daniell & Yu-li Ko commentaries on Mayo’s ConBio Editorial

I had been posting commentaries daily from January 6, 2022 (on my editorial “The Statistics Wars and Intellectual Conflicts of Interest,” Conservation Biology) until Sir David Cox died on January 18, at which point I switched to some memorial items. These two commentaries, from what Daniell calls my ‘birthday festschrift’, were left out, so I am putting them up now. (Links to others are below.)

Continue reading

Categories: Mayo editorial, stat wars and their casualties | 1 Comment

3 Commentaries on my Editorial are being published in Conservation Biology


There are 3 commentaries soon to be published in Conservation Biology on my editorial, “The statistics wars and intellectual conflicts of interest,” which appeared in the same journal. Continue reading

Categories: Mayo editorial, significance tests | Tags: , , , , | Leave a comment

A statistically significant result indicates H’ (μ > μ’) when POW(μ’) is low (not the other way round)–but don’t ignore the standard error


1. New monsters. One of the bizarre facts of life in the statistics wars is that a method from one school may be criticized on grounds that it conflicts with a conception that is the reverse of what that school intends. How is that even to be deciphered? That was the difficult task I set for myself in writing Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) [SIST 2018]. I thought I was done, but new monsters keep appearing. In some cases, rather than seeing how the notion of severity gets us beyond fallacies, misconstruals are taken to criticize severity! So, for example, in the last couple of posts, here and here, I deciphered some of the better-known power howlers (discussed in SIST Excursion 5 Tour II). I’m linking to all of this tour (in proofs). Continue reading

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Do “underpowered” tests “exaggerate” population effects? (iv)


You will often hear that if you reach a just statistically significant result “and the discovery study is underpowered, the observed effects are expected to be inflated” (Ioannidis 2008, p. 64), or “exaggerated” (Gelman and Carlin 2014). This connects to what I’m referring to as the second set of concerns about statistical significance tests: power and magnitude errors. Here, the problem does not revolve around erroneously interpreting power as a posterior probability, as we saw in the fallacy in this post. But there are other points of conflict with the error statistical tester, and much that cries out for clarification; otherwise you will misunderstand the consequences of some of today’s reforms. Continue reading
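The “inflation” phenomenon itself is easy to see in a quick simulation: run many studies of a one-sided test of a Normal mean with a small true discrepancy, keep only those reaching significance, and compare the surviving estimates to the truth. This is only an illustrative sketch; the values of µ, σ, n, and α below are my own assumptions, not taken from Ioannidis or Gelman and Carlin:

```python
import numpy as np

rng = np.random.default_rng(42)

# One-sided test: H0: mu <= 0 vs H1: mu > 0, known sigma, n iid samples.
# All numerical values here are illustrative assumptions.
mu_true, sigma, n = 0.2, 1.0, 25
se = sigma / np.sqrt(n)          # standard error of the sample mean
z_crit = 1.96                    # one-sided cutoff for alpha = 0.025
cutoff = z_crit * se             # sample mean needed to reach significance

# Simulate 100,000 studies and keep only the significant ones.
xbars = rng.normal(mu_true, se, size=100_000)
significant = xbars > cutoff

power = significant.mean()                      # ~ POW(mu_true): well under 0.5 here
mean_sig_estimate = xbars[significant].mean()   # average estimate among "discoveries"

print(f"power at mu = {mu_true}: {power:.2f}")
print(f"mean significant estimate: {mean_sig_estimate:.2f} (true mu = {mu_true})")
```

With these numbers the test is underpowered (power around 0.17), and the average statistically significant estimate comes out near 0.5, roughly two and a half times the true µ of 0.2: conditioning on significance inflates the estimate, which is the observation the quoted authors are making.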

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Join me in reforming the “reformers” of statistical significance tests


The most surprising discovery about today’s statistics wars is that some who set out shingles as “statistical reformers” themselves are guilty of misdefining some of the basic concepts of error statistical tests—notably power. (See my recent post on power howlers.) A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions to get beyond what I call “chestnuts” and “howlers” of tests. The only way that disputing tribes can get beyond the statistics wars is by (at least) understanding correctly the central concepts. But these misunderstandings are more common than ever, so I’m asking readers to help. Why are they more common (than before the “new reformers” of the last decade)? I suspect that at least one reason is the popularity of Bayesian variants on tests: if one is looking to find posterior probabilities of hypotheses, then error statistical ingredients may tend to look as if that’s what they supply.  Continue reading

Categories: power, SIST, statistical significance tests | Tags: , , | 2 Comments

Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists? Your weekend Phil Stat reading


Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m reposting a link to a quirky, but fascinating, paper of his that explains one of the most misunderstood of his positions: what he was opposed to in opposing the “inferential theory”. The paper, from 60 years ago, is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’.[i] It’s chock full of ideas and arguments. “In the present paper,” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts,” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance, as is typically thought. It’s amazing how an idiosyncratic use of a word 60 years ago can cause major rumblings decades later. Neyman always distinguished his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum. Continue reading

Categories: Bayesian/frequentist, Neyman | Leave a comment

Power howlers return as criticisms of severity


Suppose you are reading about a statistically significant result x that just reaches a threshold p-value α from a test T+ of the mean of a Normal distribution

H0: µ ≤ 0 against H1: µ > 0

with n iid samples and (for simplicity) known σ. The test “rejects” H0 at this level and infers evidence of a discrepancy in the direction of H1.

I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’.  (i.e., there’s poor evidence that  µ > µ’ ). See point* on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that  µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the frequentist error statistical philosophy? Continue reading
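The quantities at issue for test T+ can be computed directly. The sketch below (my own illustrative numbers: σ = 1, n = 25, α = 0.025; not code from the post) evaluates POW(µ′) alongside the severity of the inference µ > µ′ at a just-significant outcome, where SEV(µ > µ′) works out to 1 − POW(µ′), the relationship expressed in the title of the earlier post, “A statistically significant result indicates H′ (µ > µ′) when POW(µ′) is low”:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Test T+: H0: mu <= 0 vs H1: mu > 0 (illustrative values, not from the post).
sigma, n = 1.0, 25
se = sigma / math.sqrt(n)
z_alpha = 1.96           # one-sided cutoff for alpha = 0.025
x_obs = z_alpha * se     # a "just statistically significant" sample mean

def power(mu1):
    """POW(mu1): probability the test rejects H0 when mu = mu1."""
    return 1 - Phi((x_obs - mu1) / se)

def severity(mu1):
    """SEV(mu > mu1) at the just-significant x_obs: P(Xbar <= x_obs; mu = mu1)."""
    return Phi((x_obs - mu1) / se)

for mu1 in (0.1, x_obs, 0.6):
    print(f"mu' = {mu1:.3f}:  POW = {power(mu1):.2f},  SEV(mu > mu') = {severity(mu1):.2f}")
```

With these numbers, low POW(µ′) (about 0.07 at µ′ = 0.1) pairs with high severity (about 0.93) for µ > µ′, while high POW(µ′) (about 0.85 at µ′ = 0.6) pairs with low severity (about 0.15).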

Categories: Statistical power, statistical tests | Tags: , , , , | 7 Comments

Insevere Tests of Severe Testing (iv)


One does not have evidence for a claim if little, if anything, has been done to rule out ways the claim may be false. The claim may be said to “pass” the test, but it’s one that utterly lacks stringency or severity. On the basis of this very simple principle, I build a notion of evidence that applies to any error-prone inference. In this account, data x are evidence for a claim C only if (and only to the extent that) C has passed a severe test with x.[1] How to apply this simple idea, however, and how to use it to solve central problems of induction and statistical inference, requires careful consideration of how it is to be fleshed out. (See this post on strong vs weak severity.) Continue reading

Categories: Error Statistics | 2 Comments

No fooling: The Statistics Wars and Their Casualties Workshop is Postponed to 22-23 September, 2022

The Statistics Wars
and Their Casualties

Postponed to
22-23 September 2022


London School of Economics (CPNSS)

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London),
Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephen Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Katrin Hohl (City University London),
Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Error Statistics | Leave a comment

The AI/ML Wars: “explain” or test black box models?


I’ve been reading about the artificial intelligence/machine learning (AI/ML) wars revolving around the use of so-called “black-box” algorithms–too complex for humans, even their inventors, to understand. Such algorithms are increasingly used to make decisions that affect you, but if you can’t understand, or aren’t told, why a machine predicted your graduate-school readiness, or which drug a doctor should prescribe for you, etc., you’d likely be dissatisfied and want some kind of explanation. Being told the machine is highly accurate (in some predictive sense) wouldn’t suffice. A new AI field has grown up around the goal of developing (secondary) “white box” models to “explain” the workings of the (primary) black box model. Some call this explainable AI, or XAI. The black box is still used to reach predictions or decisions, but the explainable model is supposed to help explain why the output was reached. (The EU and DARPA in the U.S. have instituted broad requirements and programs for XAI.) Continue reading

Categories: machine learning, XAI/ML | 15 Comments

Philosophy of Science Association (PSA) 22 Call for Contributed Papers

PSA2022: Call for Contributed Papers

https://psa2022.dryfta.com/

Twenty-Eighth Biennial Meeting of the Philosophy of Science Association
November 10 – November 13, 2022
Pittsburgh, Pennsylvania


Submissions open on March 9, 2022, for contributed papers to be presented at the PSA2022 meeting in Pittsburgh, Pennsylvania, on November 10-13, 2022. The deadline for submitting a paper is 11:59 PM Pacific Standard Time on April 6, 2022.

Contributed papers may be on any topic in the philosophy of science. The PSA2022 Program Committee is committed to assembling a program with high-quality papers on a variety of topics and diverse presenters that reflects the full range of current work in the philosophy of science. Continue reading

Categories: Announcement | Leave a comment

January 11 Forum: “Statistical Significance Test Anxiety” : Benjamini, Mayo, Hand

Here are all the slides along with the video from the 11 January Phil Stat Forum with speakers: Deborah G. Mayo, Yoav Benjamini and moderator/discussant David Hand.


Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, P-values, statistical significance | 2 Comments

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I reblog (with a few new comments) one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem, to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D.R. Cox, 2006, p. 195).

Continue reading

Categories: fiducial probability, Fisher, Phil6334/ Econ 6614, Statistics | Leave a comment

R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

17 Feb 1890-29 July 1962

In recognition of Fisher’s birthday (Feb 17), I reblog what I call the “Triad”–an exchange between Fisher, Neyman and Pearson (N-P) a full 20 years after the Fisher-Neyman break-up–adding a few new introductory remarks here. While my favorite is still the reply by E.S. Pearson, which alone should have shattered Fisher’s allegations that N-P “reinterpret” tests of significance as “some kind of acceptance procedure”, they are all chock full of gems for different reasons. They are short and worth rereading. Neyman’s article pulls back the cover on what is really behind Fisher’s over-the-top polemics, what with Russian 5-year plans and commercialism in the U.S. Not only is Fisher jealous that N-P tests came to overshadow “his” tests, he is furious at Neyman for driving home the fact that Fisher’s fiducial approach had been shown to be inconsistent (by others). The flaw is glaring and is illustrated very simply by Neyman in his portion of the triad. Further details may be found in my book, SIST (2018), especially pp. 388-392, linked to here. It speaks to a common fallacy seen every day in interpreting confidence intervals. As for Neyman’s “behaviorism”, Pearson’s last sentence is revealing. Continue reading

Categories: E.S. Pearson, Fisher, Neyman, phil/history of stat | Leave a comment

Happy Birthday R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

Today is R.A. Fisher’s birthday. I’ll reblog some Fisherian items this week with a few new remarks. This paper comes just before the conflicts with Neyman and Pearson (N-P) erupted. Fisher links his tests and sufficiency to the Neyman-Pearson lemma in terms of power. It’s as if we may see Fisher and N-P as ending up in a similar place while starting from different origins, as David Cox might say [1]. Unfortunately, the blow-up that occurred soon after is behind today’s misdirected war vs statistical significance tests.* I quote just the most relevant portions…the full article is linked below.** Happy Birthday Fisher! Continue reading

Categories: Fisher, phil/history of stat | Tags: , , , | Leave a comment

“Should Science Abandon Statistical Significance?” Session at AAAS Annual Meeting, Feb 18

Karen Kafadar, Yoav Benjamini, and Donald Macnaughton will be in a session:

Should Science Abandon Statistical Significance?

Friday, Feb 18 from 2-2:45 PM (EST) at the AAAS 2022 annual meeting.

The general program is here. To register*, go to this page.

Synopsis

The concept of statistical significance is central in scientific research. However, the concept is often poorly understood and thus is often unfairly criticized. This presentation includes three independent but overlapping arguments about the usefulness of the concept of statistical significance to reliably detect “effects” in frontline scientific research data. We illustrate the arguments with examples of scientific importance from genomics, physics, and medicine. We explain how the concept of statistical significance provides a cost-efficient objective way to empower scientific research with evidence.

Papers Continue reading

Categories: AAAS, Announcement, statistical significance | Tags: | Leave a comment

January 11 PhilStat Forum: Mayo: “The Stat Wars and Intellectual Conflicts of Interest”

Here are my slides on my editorial in Conservation Biology, “The Statistics Wars and Intellectual Conflicts of Interest” (Mayo 2021), presented at the 11 January Phil Stat Forum with speakers Deborah G. Mayo and Yoav Benjamini and moderator David Hand. (Benjamini’s slides and full video to come shortly.)


For more details on the focus and background readings, see this post on the Phil Stat Forum blog or this January 10 post.

Categories: editors | Tags: , , | Leave a comment
