Error Statistics

September 24: Bayes factors from all sides: who’s worried, who’s not, and why (R. Morey)

Information and directions for joining our forum are here.


Upcoming talks will include Stephen Senn (Statistical consultant, Scotland, November 19, 2020); Deborah Mayo (Philosophy, Virginia Tech, December 19, 2020); and Alexander Bird (Philosophy, King’s College London, January 28, 2021).

In October, instead of our monthly meeting, I invite you to a P-value debate on October 15 sponsored by the National Institute of Statistical Science, with J. Berger, D. Mayo, and D. Trafimow. Register at


Categories: Announcement, bayes factors, Error Statistics, Phil Stat Forum, Richard Morey | Leave a comment

5 September, 2018 (w/updates) RSS 2018 – Significance Tests: Rethinking the Controversy


Day 2, Wed 5th September, 2018:

The 2018 Meeting of the Royal Statistical Society (Cardiff)

11:20 – 13:20

Keynote 4 – Significance Tests: Rethinking the Controversy Assembly Room

Sir David Cox, Nuffield College, Oxford
Deborah Mayo, Virginia Tech
Richard Morey, Cardiff University
Aris Spanos, Virginia Tech

Intermingled in today’s statistical controversies are some long-standing, but unresolved, disagreements on the nature and principles of statistical methods and the roles for probability in statistical inference and modelling. In reaction to the so-called “replication crisis” in the sciences, some reformers suggest significance tests as a major culprit. To understand the ramifications of the proposed reforms, there is a pressing need for a deeper understanding of the source of the problems in the sciences and a balanced critique of the alternative methods being proposed to supplant significance tests. In this session speakers offer perspectives on significance tests from statistical science, econometrics, experimental psychology and philosophy of science. There will be also be panel discussion.

5 Sept. 2018 (taken by A.Spanos)

Continue reading

Categories: Error Statistics | Tags: | Leave a comment

Statistical Crises and Their Casualties–what are they?

What do I mean by “The Statistics Wars and Their Casualties”? It is the title of the workshop I have been organizing with Roman Frigg at the London School of Economics (CPNSS) [1], which was to have happened in June. It is now the title of a forum I am zooming on Phil Stat that I hope you will want to follow. It’s time that I explain and explore some of the key facets I have in mind with this title. Continue reading

Categories: Error Statistics | 4 Comments

August 6: JSM 2020 Panel on P-values & “Statistical Significance”


July 30 PRACTICE VIDEO for JSM talk (All materials for Practice JSM session here)

JSM 2020 Panel Flyer (PDF)
JSM online program w/panel abstract & information):

Categories: ASA Guide to P-values, Error Statistics, evidence-based policy, JSM 2020, P-values, Philosophy of Statistics, science communication, significance tests | 3 Comments

Bad Statistics is Their Product: Fighting Fire With Fire (ii)

Mayo fights fire w/ fire

I. Doubt is Their Product is the title of a (2008) book by David Michaels, Assistant Secretary for OSHA from 2009-2017. I first mentioned it on this blog back in 2011 (“Will the Real Junk Science Please Stand Up?) The expression is from a statement by a cigarette executive (“doubt is our product”), and the book’s thesis is explained in its subtitle: How Industry’s Assault on Science Threatens Your Health. Imagine you have just picked up a book, published in 2020: Bad Statistics is Their Product. Is the author writing about how exaggerating bad statistics may serve in the interest of denying well-established risks? [Interpretation A]. Or perhaps she’s writing on how exaggerating bad statistics serves the interest of denying well-established statistical methods? [Interpretation B]. Both may result in distorting science and even in dismantling public health safeguards–especially if made the basis of evidence policies in agencies. A responsible philosopher of statistics should care. Continue reading

Categories: ASA Guide to P-values, Error Statistics, P-values, replication research, slides | 33 Comments

A. Saltelli (Guest post): What can we learn from the debate on statistical significance?

Professor Andrea Saltelli
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB, Norway),
Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona

What can we learn from the debate on statistical significance?

The statistical community is in the midst of crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification.  Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science. Continue reading

Categories: Error Statistics | 11 Comments

The First Eye-Opener: Error Probing Tools vs Logics of Evidence (Excursion 1 Tour II)

1.4, 1.5

In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP),  I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed).  Continue reading

Categories: Error Statistics, law of likelihood, SIST | 14 Comments

National Academies of Science: Please Correct Your Definitions of P-values

Mayo banging head

If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did. Continue reading

Categories: ASA Guide to P-values, Error Statistics, P-values | 20 Comments

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy: Belated Birthday Wish

E.S. Pearson

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll post some Pearson items this week to mark his birthday.


Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. 

Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

Continue reading

Categories: E.S. Pearson, Error Statistics | Leave a comment

Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

Neyman April 16, 1894 – August 5, 1981

I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of  statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?

Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena
by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets: Continue reading

Categories: Error Statistics, Neyman, Statistics | Tags: | Leave a comment

Neyman vs the ‘Inferential’ Probabilists


We celebrated Jerzy Neyman’s Birthday (April 16, 1894) last night in our seminar: here’s a pic of the cake.  My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”.  “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).

drawn by his wife,Olga

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

Categories: Bayesian/frequentist, Error Statistics, Neyman | Leave a comment

Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science

Source: Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science

Categories: Error Statistics | Leave a comment

Excursion 1 Tour II: Error Probing Tools versus Logics of Evidence-Excerpt


For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).

1.4 The Law of Likelihood and Error Statistics

If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail–to use probability to arrive at a type of logic of evidential support–and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965)–the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading

Categories: Error Statistics, law of likelihood, SIST | 2 Comments

American Phil Assoc Blog: The Stat Crisis of Science: Where are the Philosophers?

Ship StatInfasST

The Statistical Crisis of Science: Where are the Philosophers?

This was published today on the American Philosophical Association blog. 

“[C]onfusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth.” (George Barnard 1985, p. 2)

“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists…” (Allan Birnbaum 1972, p. 861).

“In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered.” (p. 57, Committee Investigating fraudulent research practices of social psychologist Diederik Stapel)

I was the lone philosophical observer at a special meeting convened by the American Statistical Association (ASA) in 2015 to construct a non-technical document to guide users of statistical significance tests–one of the most common methods used to distinguish genuine effects from chance variability across a landscape of social, physical and biological sciences.

It was, by the ASA Director’s own description, “historical”, but it was also highly philosophical, and its ramifications are only now being discussed and debated. Today, introspection on statistical methods is rather common due to the “statistical crisis in science”. What is it? In a nutshell: high powered computer methods make it easy to arrive at impressive-looking ‘findings’ that too often disappear when others try to replicate them when hypotheses and data analysis protocols are required to be fixed in advance.

Continue reading

Categories: Error Statistics, Philosophy of Statistics, Summer Seminar in PhilStat | 2 Comments

Little Bit of Logic (5 mini problems for the reader)

Little bit of logic (5 little problems for you)[i]

Deductively valid arguments can readily have false conclusions! Yes, deductively valid arguments allow drawing their conclusions with 100% reliability but only if all their premises are true. For an argument to be deductively valid means simply that if the premises of the argument are all true, then the conclusion is true. For a valid argument to entail  the truth of its conclusion, all of its premises must be true.  In that case the argument is said to be (deductively) sound.

Equivalently, using the definition of deductive validity that I prefer: A deductively valid argument is one where, the truth of all its premises together with the falsity of its conclusion, leads to a logical contradiction (A & ~A).

Show that an argument with the form of disjunctive syllogism can have a false conclusion. Such an argument take the form (where A, B are statements): Continue reading

Categories: Error Statistics | 22 Comments

Mayo-Spanos Summer Seminar PhilStat: July 28-Aug 11, 2019: Instructions for Applying Now Available


See the Blog at SummerSeminarPhilStat

Categories: Announcement, Error Statistics, Statistics | Leave a comment

You Should Be Binge Reading the (Strong) Likelihood Principle



An essential component of inference based on familiar frequentist notions: p-values, significance and confidence levels, is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data.

SLP (We often drop the “strong” and just call it the LP. The “weak” LP just boils down to sufficiency)

For any two experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, if outcomes x* and y* (from E1 and E2 respectively) determine the same (i.e., proportional) likelihood function (f1(x*; θ) = cf2(y*; θ) for all θ), then x* and y* are inferentially equivalent (for an inference about θ).

(What differentiates the weak and the strong LP is that the weak refers to a single experiment.)
Continue reading

Categories: Error Statistics, Statistics, strong likelihood principle | 2 Comments

Excerpt from Excursion 4 Tour I: The Myth of “The Myth of Objectivity” (Mayo 2018, CUP)


Tour I The Myth of “The Myth of Objectivity”*


Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inferences and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)

Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity–subjectivity distinction really toothless, as many will have you believe? I say no. I know it’s a meme promulgated by statistical high priests, but you agreed, did you not, to use a bit of chutzpah on this excursion? Besides, cavalier attitudes toward objectivity are at odds with even more widely endorsed grass roots movements to promote replication, reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science are rooted in the supposition that we can more objectively scrutinize results – even if it’s only to point out those that are BENT. The fact that these terms are used equivocally should not be taken as grounds to oust them but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t. Continue reading

Categories: Error Statistics, SIST, Statistical Inference as Severe Testing | 4 Comments

Summer Seminar PhilStat: July 28-Aug 11, 2019 (ii)

First draft of PhilStat Announcement


Categories: Announcement, Error Statistics | 5 Comments

It’s the Methods, Stupid: Excerpt from Excursion 3 Tour II (Mayo 2018, CUP)

Tour II It’s the Methods, Stupid

There is perhaps in current literature a tendency to speak of the Neyman–Pearson contributions as some static system, rather than as part of the historical process of development of thought on statistical theory which is and will always go on. (Pearson 1962, 276)

This goes for Fisherian contributions as well. Unlike museums, we won’ t remain static. The lesson from Tour I of this Excursion is that Fisherian and Neyman– Pearsonian tests may be seen as offering clusters of methods appropriate for different contexts within the large taxonomy of statistical inquiries. There is an overarching pattern: Continue reading

Categories: Error Statistics, Statistical Inference as Severe Testing | 4 Comments

Blog at