While immersed in our fast-paced, remote, NISS debate (October 15) with J. Berger and D. Trafimow, I didn’t immediately catch all that was said by my co-debaters (I will shortly post a transcript). We had all opted for no practice. But looking over the transcript, I was surprised that David Trafimow was indeed saying the answer to the question in my title is yes. Here are some excerpts from his remarks: Continue reading

# Error Statistics

## Phil Stat Forum: November 19: Stephen Senn, “Randomisation and Control in the Age of Coronavirus?”

## S. Senn: Testing Times (Guest post)

**Stephen Senn**
Consultant Statistician

Edinburgh, Scotland

**Testing Times**

# Screening for attention

There has been much comment on Twitter and other social media about testing for coronavirus and the relationship between a test being positive and the person tested having been infected. Some primitive form of Bayesian reasoning is often used to justify concern that an apparent positive may actually be falsely so, with specificity and sensitivity taking the roles of likelihoods and prevalence that of a prior distribution. This way of looking at testing dates back at least to a 1959 paper by Ledley and Lusted[1]. However, as others[2, 3] have pointed out, there is a trap for the unwary here: it is implicitly assumed that specificity and sensitivity are constant values unaffected by prevalence, and it is far from obvious that this should be the case. Continue reading
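The "primitive Bayesian reasoning" Senn describes can be made explicit. As a minimal sketch (the numbers below are illustrative assumptions, not figures from the post), the positive predictive value follows from Bayes' theorem, taking sensitivity and specificity as fixed constants, which is precisely the assumption the post goes on to question:

```python
# Illustrative sketch: P(infected | positive test) via Bayes' theorem,
# treating sensitivity and specificity as constants unaffected by prevalence.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of infection given a positive result."""
    true_pos = sensitivity * prevalence            # infected and test positive
    false_pos = (1 - specificity) * (1 - prevalence)  # uninfected, test positive
    return true_pos / (true_pos + false_pos)

# With (assumed) 80% sensitivity, 99% specificity and 0.5% prevalence,
# most positives are false positives:
ppv = positive_predictive_value(0.80, 0.99, 0.005)
print(round(ppv, 3))  # 0.287
```

The low result despite high specificity is the familiar screening paradox; Senn's point is that the calculation is only as good as the assumption that sensitivity and specificity stay put as prevalence changes.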

## September 24: Bayes factors from all sides: who’s worried, who’s not, and why (R. Morey)

## 5 September, 2018 (w/updates) RSS 2018 – Significance Tests: Rethinking the Controversy

## Day 2, Wed 5th September, 2018:

## The 2018 Meeting of the Royal Statistical Society (Cardiff)

## 11:20 – 13:20

### Keynote 4 – Significance Tests: Rethinking the Controversy Assembly Room

**Speakers:** Sir David Cox (Nuffield College, Oxford), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Aris Spanos (Virginia Tech)

Intermingled in today’s statistical controversies are some long-standing, but unresolved, disagreements on the nature and principles of statistical methods and the roles for probability in statistical inference and modelling. In reaction to the so-called “replication crisis” in the sciences, some reformers single out significance tests as a major culprit. To understand the ramifications of the proposed reforms, there is a pressing need for a deeper understanding of the source of the problems in the sciences and a balanced critique of the alternative methods being proposed to supplant significance tests. In this session speakers offer perspectives on significance tests from statistical science, econometrics, experimental psychology and philosophy of science. There will also be a panel discussion.

## Statistical Crises and Their Casualties–what are they?

What do I mean by “The Statistics Wars and Their Casualties”? It is the title of the workshop I have been organizing with Roman Frigg at the London School of Economics (CPNSS) [1], which was to have happened in June. It is now the title of a forum I am zooming on Phil Stat that I hope you will want to follow. It’s time that I explain and explore some of the key facets I have in mind with this title. Continue reading

## August 6: JSM 2020 Panel on P-values & “Statistical Significance”

**July 30** **PRACTICE** **VIDEO** for JSM talk (All materials for Practice JSM session here)

JSM 2020 Panel Flyer (PDF)

JSM online program (w/panel abstract & information)

## A. Saltelli (Guest post): What can we learn from the debate on statistical significance?

**Professor Andrea Saltelli**

Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB, Norway),

&

Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona

**What can we learn from the debate on statistical significance?**

The statistical community is in the midst of a crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification. Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science. Continue reading

## The First Eye-Opener: Error Probing Tools vs Logics of Evidence (Excursion 1 Tour II)

In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP), I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed). Continue reading

## National Academies of Science: Please Correct Your Definitions of P-values

If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did. Continue reading

## Performance or Probativeness? E.S. Pearson’s Statistical Philosophy: Belated Birthday Wish

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll post some Pearson items this week to mark his birthday.

**HAPPY BELATED BIRTHDAY EGON!**

Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (*performance*). Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand? (*probativeness*). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.

*Cases of Type A and Type B*

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

## Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

*I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former may only report pre-data, fixed error probabilities, and is justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?*

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena”

by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets: Continue reading

## Neyman vs the ‘Inferential’ Probabilists

**We celebrated Jerzy Neyman’s Birthday (April 16, 1894) **last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).

Note: In this article, “attacks” on various statistical “fronts” refer to ways of attacking problems in one or another statistical research program.

**HAPPY BIRTHDAY WEEK FOR NEYMAN!** Continue reading

## Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science

## Excursion 1 Tour II: Error Probing Tools versus Logics of Evidence-Excerpt

For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).

**1.4 The Law of Likelihood and Error Statistics**

If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail–to use probability to arrive at a type of logic of evidential support–and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965)–the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading

## American Phil Assoc Blog: The Stat Crisis of Science: Where are the Philosophers?

## The Statistical Crisis of Science: Where are the Philosophers?

This was published today on the American Philosophical Association blog.

“[C]onfusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth.” (George Barnard 1985, p. 2)

“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists…” (Allan Birnbaum 1972, p. 861).

“In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered.” (p. 57, Committee Investigating fraudulent research practices of social psychologist Diederik Stapel)

I was the lone philosophical observer at a special meeting convened by the American Statistical Association (ASA) in 2015 to construct a non-technical document to guide users of statistical significance tests–one of the most common methods used to distinguish genuine effects from chance variability across a landscape of social, physical and biological sciences.

It was, by the ASA Director’s own description, “historical”, but it was also highly philosophical, and its ramifications are only now being discussed and debated. Today, introspection on statistical methods is rather common due to the “statistical crisis in science”. What is it? In a nutshell: high-powered computational methods make it easy to arrive at impressive-looking ‘findings’ that too often disappear when others try to replicate them with hypotheses and data-analysis protocols fixed in advance.

## Little Bit of Logic (5 mini problems for the reader)

**Little bit of logic (5 little problems for you)[i]**

**Deductively valid** arguments can readily have false conclusions! Yes, deductively valid arguments allow drawing their conclusions with 100% reliability but **only if** all their premises are true. For an argument to be deductively valid means simply that **if** the premises of the argument are all true, then the conclusion is true. For a valid argument to entail the truth of its conclusion, all of its premises must be true. In that case the argument is said to be (deductively) **sound**.

Equivalently, using the definition of deductive validity that I prefer: A deductively valid argument is one where the truth of all its premises, together with the falsity of its conclusion, leads to a logical contradiction (A & ~A).

Show that an argument with the form of **disjunctive syllogism** can have a false conclusion. Such an argument takes the form (where A, B are statements): Continue reading
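The validity claim above is mechanically checkable. As a sketch (my own illustration, not part of the post's problems), a brute-force truth table confirms that disjunctive syllogism (A or B; not A; therefore B) is valid, i.e., no row makes both premises true and the conclusion false, while rows with a false premise can still pair the valid form with a false conclusion:

```python
from itertools import product

# Enumerate all truth assignments to A and B.
counterexample_to_validity = None  # a row with true premises, false conclusion
false_conclusion_row = None        # a row where the conclusion B is simply false

for A, B in product([True, False], repeat=2):
    premises_true = (A or B) and (not A)   # both premises of disjunctive syllogism
    if premises_true and not B:
        counterexample_to_validity = (A, B)
    if not B:
        false_conclusion_row = (A, B)

print(counterexample_to_validity)  # None: the form is deductively valid
print(false_conclusion_row)        # a valid-form instance whose conclusion is false
```

This is exactly the point of the post: validity guarantees the conclusion only **if** the premises are true; an unsound instance of a valid form can conclude falsely.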

## Mayo-Spanos Summer Seminar PhilStat: July 28-Aug 11, 2019: Instructions for Applying Now Available

## INSTRUCTIONS FOR APPLYING ARE NOW AVAILABLE

### See the Blog at SummerSeminarPhilStat

## You Should Be Binge Reading the (Strong) Likelihood Principle

An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term *sampling theory*, or my preferred *error statistics*, since we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the *strong likelihood principle* (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data.

**SLP** (We often drop the “strong” and just call it the LP. The “weak” LP just boils down to sufficiency)

For any two experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, if outcomes x* and y* (from E1 and E2 respectively) determine the same (i.e., proportional) likelihood function (f1(x*; θ) = cf2(y*; θ) for all θ), then x* and y* are inferentially equivalent (for an inference about θ).

(What differentiates the weak and the strong LP is that the weak refers to a single experiment.)
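The classic illustration of the SLP, sketched here as my own example rather than one from the post, is binomial versus negative binomial sampling: 3 successes in 12 Bernoulli trials yield proportional likelihoods in θ whether the design fixed n = 12 in advance or sampled until the 3rd success, yet the two designs have different sampling distributions and hence different error probabilities:

```python
from math import comb

def binom_lik(theta, n=12, x=3):
    # Likelihood of x successes in a fixed number n of trials.
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def negbinom_lik(theta, r=3, y=12):
    # Likelihood that the r-th success occurs on trial y (stop-at-r design).
    return comb(y - 1, r - 1) * theta**r * (1 - theta)**(y - r)

# The ratio is constant in theta (C(12,3)/C(11,2) = 220/55 = 4), so the SLP
# deems the two outcomes inferentially equivalent about theta -- while an
# error statistician, attending to the sampling distributions, does not.
for theta in (0.1, 0.25, 0.5):
    print(round(binom_lik(theta) / negbinom_lik(theta), 6))  # 4.0 each time
```

Because the ratio of the two likelihood functions is free of θ, they are proportional in the SLP's sense; the disagreement is over whether the stopping rule, which only affects the sampling distribution, should matter to inference.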

Continue reading