Happy Birthday Sir David R. Cox

15 July 1924 – 18 January 2022

HAPPY BIRTHDAY SIR DAVID COX!  Today is David Cox’s birthday; he would have been 98 years old. Below is a remembrance I contributed to Significance when he died, with a link to others in that same issue.

“In celebrating Cox’s immense contributions, we should recognise how much there is yet to learn from him” Continue reading

Categories: Sir David Cox | 1 Comment

10 years after the July 4 statistical discovery of the Higgs & the value of negative results


Today marks a decade since the discovery, on July 4, 2012, of evidence for a Higgs particle based on a “5 sigma observed effect”. CERN celebrated with a scientific symposium (webcast here). The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number that would be expected from background alone—which they can simulate in particle detectors. Because the 5-sigma standard refers to a benchmark from frequentist significance testing, the discovery was immediately imbued with controversies that, at bottom, concerned statistical philosophy. Continue reading
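As a rough numerical gloss on the benchmark, here is a minimal sketch, assuming (as in the standard treatment) that the test statistic is approximately standard Normal under the background-only hypothesis; a 5-sigma excess then corresponds to a one-sided p-value of roughly 1 in 3.5 million:

```python
# Minimal sketch: one-sided p-value implied by a 5-sigma excess,
# assuming an approximately standard Normal test statistic under
# the background-only hypothesis.
from scipy.stats import norm

p_five_sigma = norm.sf(5)  # survival function: P(Z >= 5) under H0
print(f"one-sided p-value at 5 sigma: {p_five_sigma:.2e}")  # ~2.87e-07
```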

Categories: Error Statistics | 2 Comments

D. Lakens responds to confidence interval crusading journal editors


In what began as a guest commentary on my 2021 editorial in Conservation Biology, Daniël Lakens recently published a response to a recommendation against using null hypothesis significance tests by journal editors from the International Society of Physiotherapy Journal. Here are some excerpts from his full article, replies (‘response to Lakens‘), links and a few comments of my own. Continue reading

Categories: stat wars and their casualties, statistical significance tests | 12 Comments

Dissent


Continue reading

Categories: Error Statistics | 5 Comments

Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)


Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and two co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05’”). It reads:


The editorial was written by the three editors acting as individuals and reflects their scientific views, not an endorsed position of the American Statistical Association.

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 20 Comments

Philosophy of socially aware data science conference

I’ll be speaking at this conference in Philly tomorrow. My slides are also below.


PDF of my slides: Statistical “Reforms”: Fixing Science or Threats to Replication and Falsification. Continue reading

Categories: Announcement, Philosophy of Statistics, socially aware data science | Leave a comment

D. Mayo & D. Hand: “Statistical significance and its critics: practicing damaging science, or damaging scientific practice?”


Prof. Deborah Mayo, Emerita
Department of Philosophy
Virginia Tech


Prof. David Hand
Department of Mathematics
Imperial College London

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?  (Synthese)

[pdf of full paper.] Continue reading

Categories: Error Statistics | 3 Comments

Paul Daniell & Yu-li Ko commentaries on Mayo’s ConBio Editorial

I had been posting commentaries daily from January 6, 2022 (on my editorial “The Statistics Wars and Intellectual Conflicts of Interest”, Conservation Biology) until Sir David Cox died on January 18, at which point I switched to some memorial items. These two commentaries, from what Daniell calls my ‘birthday festschrift’, were left out, so I am putting them up now. (Links to the others are below.)

Continue reading

Categories: Mayo editorial, stat wars and their casualties | 1 Comment

3 Commentaries on my Editorial are being published in Conservation Biology


Three commentaries on my editorial, “The statistics wars and intellectual conflicts of interest”, are soon to be published in Conservation Biology, the journal in which the editorial itself appeared. Continue reading

Categories: Mayo editorial, significance tests | Tags: , , , , | Leave a comment

A statistically significant result indicates H’ (μ > μ’) when POW(μ’) is low (not the other way round)–but don’t ignore the standard error


1. New monsters. One of the bizarre facts of life in the statistics wars is that a method from one school may be criticized on grounds that it conflicts with a conception that is the reverse of what that school intends. How is that even to be deciphered? That was the difficult task I set for myself in writing Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) [SIST 2018]. I thought I was done, but new monsters keep appearing. In some cases, rather than see how the notion of severity gets us beyond fallacies, misconstruals are taken to criticize severity! So, for example, in the last couple of posts, here and here, I deciphered some of the better known power howlers (discussed in SIST Excursion 5 Tour II); I’m linking to the proofs of that entire tour. Continue reading

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Do “underpowered” tests “exaggerate” population effects? (iv)


You will often hear that if you reach a just statistically significant result “and the discovery study is underpowered, the observed effects are expected to be inflated” (Ioannidis 2008, p. 64), or “exaggerated” (Gelman and Carlin 2014). This connects to what I’m referring to as the second set of concerns about statistical significance tests: power and magnitude errors. Here, the problem does not revolve around erroneously interpreting power as a posterior probability, as we saw in the fallacy in this post. But there are other points of conflict with the error statistical tester, and much that cries out for clarification — else you will misunderstand the consequences of some of today’s reforms. Continue reading
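The inflation claim is easy to see in a minimal simulation sketch (the numbers below are my own illustrative choices, not from Ioannidis or from Gelman and Carlin): when the true discrepancy is small relative to the standard error, only the overestimates cross the cutoff, so the statistically significant estimates overstate µ on average.

```python
# Minimal simulation sketch (illustrative values, mine): with low power,
# sample means that happen to reach significance must overstate mu_true.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu_true, sigma, n, alpha = 0.2, 1.0, 25, 0.025
se = sigma / np.sqrt(n)                    # standard error of the mean
cutoff = norm.ppf(1 - alpha) * se          # reject H0: mu <= 0 when xbar >= cutoff

xbar = rng.normal(mu_true, se, size=200_000)   # sampling distribution of the mean
significant = xbar[xbar >= cutoff]

print(f"power at mu = {mu_true}: {significant.size / xbar.size:.2f}")  # ~0.17
print(f"mean of significant estimates: {significant.mean():.2f}")      # well above 0.2
```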

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 17 Comments

Join me in reforming the “reformers” of statistical significance tests


The most surprising discovery about today’s statistics wars is that some who hang out shingles as “statistical reformers” are themselves guilty of misdefining some of the basic concepts of error statistical tests—notably power. (See my recent post on power howlers.) A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions to get beyond what I call “chestnuts” and “howlers” of tests. The only way the disputing tribes can get beyond the statistics wars is by (at least) understanding the central concepts correctly. But these misunderstandings are more common than ever, so I’m asking readers to help. Why are they more common (than before the “new reformers” of the last decade)? I suspect that at least one reason is the popularity of Bayesian variants on tests: if one is looking for posterior probabilities of hypotheses, then error statistical ingredients may tend to look as if that’s what they supply. Continue reading

Categories: power, SIST, statistical significance tests | Tags: , , | 2 Comments

Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists? Your weekend Phil Stat reading


Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m reposting a link to a quirky, but fascinating, paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper, from 60 years ago, is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i]. It’s chock full of ideas and arguments. “In the present paper,” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation…either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. The paper comes up on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts,” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance, as is typically thought. It’s amazing how an idiosyncratic use of a word 60 years ago can cause major rumblings decades later. Neyman always distinguished his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog by searching Birnbaum. Continue reading

Categories: Bayesian/frequentist, Neyman | Leave a comment

Power howlers return as criticisms of severity


Suppose you are reading about a statistically significant result x that just reaches a threshold p-value α from a test T+ of the mean of a Normal distribution

H0: µ ≤ 0 against H1: µ > 0

with n iid samples, and (for simplicity) known σ.  The test “rejects” H0 at this level & infers evidence of a discrepancy in the direction of H1.

I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’.  (i.e., there’s poor evidence that  µ > µ’ ). See point* on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that  µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the frequentist error statistical philosophy? Continue reading
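For readers who want to compute rather than argue, here is a minimal sketch of the two quantities at issue for test T+, with illustrative values (µ0 = 0, σ = 1, n = 25, α = 0.025) that are my own choices. For a just-significant result in this test, the severity assessment (in the sense of SIST) of the inference µ > µ′ works out to 1 − POW(µ′), so low power at µ′ goes with a highly severe inference to µ > µ′, and conversely.

```python
# Minimal sketch, illustrative values: power and severity for a result
# that just reaches the alpha cutoff in test T+ (H0: mu <= 0, known sigma).
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.025
se = sigma / np.sqrt(n)
z_alpha = norm.ppf(1 - alpha)
xbar_obs = mu0 + z_alpha * se          # the just statistically significant mean

def power(mu_prime):
    """P(test rejects H0) when mu = mu'."""
    return norm.sf(z_alpha - (mu_prime - mu0) / se)

def severity(mu_prime):
    """P(xbar <= xbar_obs) when mu = mu': how severely 'mu > mu'' passed."""
    return norm.cdf((xbar_obs - mu_prime) / se)

for mu_prime in (0.1, 0.4, 0.8):
    print(f"mu'={mu_prime}: POW={power(mu_prime):.2f}  SEV(mu > mu')={severity(mu_prime):.2f}")
# POW and SEV sum to 1 here: low POW(mu') pairs with high severity for mu > mu'.
```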

Categories: Statistical power, statistical tests | Tags: , , , , | 7 Comments

Insevere Tests of Severe Testing (iv)


One does not have evidence for a claim if little if anything has been done to rule out ways the claim may be false. The claim may be said to “pass” the test, but it’s one that utterly lacks stringency or severity. On the basis of this very simple principle, I build a notion of evidence that applies to any error prone inference. In this account, data x are evidence for a claim C only if (and only to the extent that) C has passed a severe test with x.[1] How to apply this simple idea, however, and how to use it to solve central problems of induction and statistical inference requires careful consideration of how it is to be fleshed out. (See this post on strong vs weak severity.) Continue reading

Categories: Error Statistics | 3 Comments

No fooling: The Statistics Wars and Their Casualties Workshop is Postponed to 22-23 September 2022

The Statistics Wars
and Their Casualties

Postponed to
22-23 September 2022


London School of Economics (CPNSS)

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London), Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephen Guttinger (University of Exeter), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Katrin Hohl (City University London), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Error Statistics | Leave a comment

The AI/ML Wars: “explain” or test black box models?


I’ve been reading about the artificial intelligence/machine learning (AI/ML) wars revolving around the use of so-called “black-box” algorithms–too complex for humans, even their inventors, to understand. Such algorithms are increasingly used to make decisions that affect you, but if you can’t understand, or aren’t told, why a machine predicted your graduate-school readiness, or which drug a doctor should prescribe for you, etc., you’d likely be dissatisfied and want some kind of explanation. Being told the machine is highly accurate (in some predictive sense) wouldn’t suffice. A new AI field has grown up around the goal of developing (secondary) “white box” models to “explain” the workings of the (primary) black box model. Some call this explainable AI, or XAI. The black box is still used to reach predictions or decisions, but the explainable model is supposed to help explain why the output was reached. (The EU and DARPA in the U.S. have instituted broad requirements and programs for XAI.) Continue reading
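To make the idea concrete, here is a minimal sketch of one common XAI tactic, a global surrogate model (just one approach among many; LIME and SHAP are others). The dataset and hyperparameters are arbitrary illustrations, not from any of the systems discussed.

```python
# Minimal sketch of a global "white box" surrogate: fit an interpretable
# tree to the *predictions* of a black-box model, then read off its rules.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Train the surrogate on the black box's outputs, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

fidelity = surrogate.score(X, black_box.predict(X))  # agreement with the black box
print(f"surrogate fidelity to black box: {fidelity:.2f}")
print(export_text(surrogate))  # human-readable if/then rules
```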

Categories: machine learning, XAI/ML | 15 Comments

Philosophy of Science Association (PSA) 22 Call for Contributed Papers

PSA2022: Call for Contributed Papers

https://psa2022.dryfta.com/

Twenty-Eighth Biennial Meeting of the Philosophy of Science Association
November 10 – November 13, 2022
Pittsburgh, Pennsylvania


Submissions open on March 9, 2022, for contributed papers to be presented at the PSA2022 meeting in Pittsburgh, Pennsylvania, on November 10-13, 2022. The deadline for submitting a paper is 11:59 PM Pacific Standard Time on April 6, 2022.

Contributed papers may be on any topic in the philosophy of science. The PSA2022 Program Committee is committed to assembling a program with high-quality papers on a variety of topics and diverse presenters that reflects the full range of current work in the philosophy of science. Continue reading

Categories: Announcement | Leave a comment

January 11 Forum: “Statistical Significance Test Anxiety” : Benjamini, Mayo, Hand

Here are all the slides along with the video from the 11 January Phil Stat Forum with speakers: Deborah G. Mayo, Yoav Benjamini and moderator/discussant David Hand.


Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, P-values, statistical significance | 2 Comments

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I reblog (with a few new comments) one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem, to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D. R. Cox, 2006, p. 195).

Continue reading

Categories: fiducial probability, Fisher, Phil6334/ Econ 6614, Statistics | Leave a comment
