Dear Reader: I began this blog 10 years ago (Sept. 3, 2011)! A double celebration is taking place at the Elbar Room–remotely for the first time due to Covid– both for the blog and the 3 year anniversary of the physical appearance of my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars [SIST] (CUP, 2018). A special rush edition made an appearance on Sept 3, 2018 in time for the RSS meeting in Cardiff, where we had a session deconstructing the arguments against statistical significance tests (with Sir David Cox, Richard Morey and Aris Spanos). Join us between 7 and 8 pm in a drink of Elba Grease.
Many of the discussions in the book were importantly influenced (corrected and improved) by reader’s comments on the blog over the years. I posted several excerpts and mementos from SIST here. I thank readers for their input. Readers might want to look up the topics in SIST on this blog to check out the comments, and see how ideas were developed, corrected and turned into “excursions” in SIST.
The other day I was in a practice (zoom) for a panel I’m in on how different approaches and philosophies (Frequentist, Bayesian, machine learning) might explain “why we disagree” when interpreting clinical trial data. The focus is radiation oncology. An important point of disagreement between frequentist (error statisticians) and Bayesians concerns whether and if so, how, to modify inferences in the face of a variety of selection effects, multiple testing, and stopping for interim analysis. Such multiplicities directly alter the capabilities of methods to avoid erroneously interpreting data, so the frequentist error probabilities are altered. By contrast, if an account conditions on the observed data, error probabilities drop out, and we get principles such as the stopping rule principle. My presentation included a quote from Bayarri and J. Berger (2004): Continue reading →
This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. Yes, i know I’ve been neglecting this blog as of late, but this topic will appear in a new guise in a post I’m writing now, to appear tomorrow.
HAPPY BELATED BIRTHDAY EGON!
Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. Continue reading →
Stephen Senn Consultant Statistician Edinburgh, Scotland
It is hard to argue against the proposition that approaches to clinical research should treat not only men but also women fairly, and of course this applies also to other ways one might subdivide patients. However, agreeing to such a principle is not the same as acting on it and when one comes to consider what in practice one might do, it is far from clear what the principle ought to be. In other words, the more one thinks about implementing such a principle the less obvious it becomes as to what it is.
The latest salvo in the statistics wars comes in the form of the publication of The ASA Task Force on Statistical Significance and Replicability, appointed by past ASA president Karen Kafadar in November/December 2019. (In the ‘before times’!) Its members are:
Linda Young, (Co-Chair), Xuming He, (Co-Chair) Yoav Benjamini, Dick De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xiao-Li Meng, Vijay Nair, Nancy Reid, Stephen Stigler, Stephen Vardeman, Chris Wikle, Tommy Wright, Karen Kafadar, Ex-officio. (Kafadar 2020)
In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force… (Benjamini et al. 2021)
I’m reblogging two of my Higgs posts at the 9th anniversary of the 2012 discovery. (The first was in this post.) The following, was originally “Higgs Analysis and Statistical Flukes: part 2” (from March, 2013).
Some people say to me: “severe testing is fine for ‘sexy science’ like in high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning, at least, when we’re trying to find things out  Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories.
The Higgs discussion finds its way into Tour III in Excursion 3 of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). You can read it (in proof form) here, pp. 202-217. in a section with the provocative title:
3.8 The Probability Our Results Are Statistical Fluctuations: Higgs’ Discovery
What is the message conveyed when the board of a professional association X appoints a Task Force intended to dispel the supposition that a position advanced by the Executive Director of association X does not reflect the views of association X on a topic that members of X disagree on? What it says to me is that there is a serious break-down of communication amongst the leadership and membership of that association. So while I’m extremely glad that the ASA appointed the Task Force on Statistical Significance and Replicability in 2019, I’m very sorry that the main reason it was needed was to address concerns that an editorial put forward by the ASA Executive Director (and 2 others) “might be mistakenly interpreted as official ASA policy”. The 2021 Statement of the Task Force (Benjamini et al. 2021) explains:
In 2019 the President of the American Statistical Association (ASA) established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force…
I was watching Biogen’s stock (BIIB) climb over 100 points yesterday because its Alzheimer’s drug, aducanumab [brand name: Aduhelm], received surprising FDA approval. I hadn’t been following the drug at all (it’s enough to try and track some Covid treatments/vaccines). I knew only that the FDA panel had unanimously recommended not to approve it last year, and the general sentiment was that it was heading for FDA rejection yesterday. After I received an email from Geoff Stuart[i] asking what I thought, I found out a bit more. He wrote: Continue reading →
While I would agree that there are differences between Bayesian statisticians and Bayesian philosophers, those differences don’t line up with the ones drawn by Jon Williamson in his presentation to our Phil Stat Wars Forum (May 20 slides). I hope Bayesians (statisticians, or more generally, practitioners, and philosophers) will weigh in on this.
After Jon Williamson’s talk, Objective Bayesianism from a Philosophical Perspective, at the PhilStat forum on May 22, I raised some general “casualties” encountered by objective, non-subjective or default Bayesian accounts, not necessarily Williamson’s. I am pasting those remarks below, followed by some additional remarks and the video of his responses to my main kvetches.Continue reading →
Tom Sterkenburg, PhD
Munich Center for Mathematical Philosophy
Deborah G. Mayo: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars
The foundations of statistics is not a land of peace and quiet. “Tribal warfare” is perhaps putting it too strong, but it is the case that for decades now various camps and subcamps have been exchanging heated arguments about the right statistical methodology. That these skirmishes are not just an academic exercise is clear from the widespread use of statistical methods, and contemporary challenges that cry for more secure foundations: the rise of big data, the replication crisis.
I am reblogging a guest post that Aris Spanos wrote for this blog on Neyman’s birthday some years ago.
A Statistical Model as a Chance Mechanism Aris Spanos
Jerzy Neyman(April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.) Continue reading →
Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m posting a link to a quirky paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance as typically thought. Neyman always distinguished his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum. Continue reading →
Where do journal editors look to find someone to referee your manuscript (in the typical “double blind” review system in academic journals)? One obvious place to look is the reference list in your paper. After all, if you’ve cited them, they must know about the topic of your paper, putting them in a good position to write a useful review. The problem is that if your paper is on a topic of ardent disagreement, and you argue in favor of one side of the debates, then your reference list is likely to include those with actual or perceived conflicts of interest. After all, if someone has a strong standpoint on an issue of some controversy, and a strong interest in persuading others to accept their side, it creates an intellectual conflict of interest, if that person has power to uphold that view. Since your referee is in a position of significant power to do just that, it follows that they have a conflict of interest (COI). A lot of attention is paid to author’s conflicts of interest, but little into intellectual or ideological conflicts of interests of reviewers. At most, the concern is with the reviewer having special reasons to favor the author, usually thought to be indicated by having been a previous co-author. We’ve been talking about journal editors conflicts of interest as of late (e.g., with Mark Burgman’s presentation at the last Phil Stat Forum) and this brings to mind another one. Continue reading →