While immersed in our fast-paced, remote, NISS debate (October 15) with J. Berger and D. Trafimow, I didn’t immediately catch all that was said by my co-debaters (I will shortly post a transcript). We had all opted for no practice. But looking over the transcript, I was surprised that David Trafimow was indeed saying the answer to the question in my title is yes. Here are some excerpts from his remarks: Continue reading
Screening for attention
There has been much comment on Twitter and other social media about testing for coronavirus and the relationship between a test being positive and the person tested having been infected. Some primitive form of Bayesian reasoning is often used to justify concern that an apparent positive may actually be falsely so, with specificity and sensitivity taking the roles of likelihoods and prevalence that of a prior distribution. This way of looking at testing dates back at least to a paper of 1959 by Ledley and Lusted. However, as others[2, 3] have pointed out, there is a trap for the unwary in this, in that it is implicitly assumed that specificity and sensitivity are constant values unaffected by prevalence and it is far from obvious that this should be the case. Continue reading
Day 2, Wed 5th September, 2018:
The 2018 Meeting of the Royal Statistical Society (Cardiff)
11:20 – 13:20
Sir David Cox, Nuffield College, Oxford
Deborah Mayo, Virginia Tech
Richard Morey, Cardiff University
Aris Spanos, Virginia Tech
Intermingled in today’s statistical controversies are some long-standing, but unresolved, disagreements on the nature and principles of statistical methods and the roles for probability in statistical inference and modelling. In reaction to the so-called “replication crisis” in the sciences, some reformers suggest significance tests as a major culprit. To understand the ramifications of the proposed reforms, there is a pressing need for a deeper understanding of the source of the problems in the sciences and a balanced critique of the alternative methods being proposed to supplant significance tests. In this session speakers offer perspectives on significance tests from statistical science, econometrics, experimental psychology and philosophy of science. There will be also be panel discussion.
What do I mean by “The Statistics Wars and Their Casualties”? It is the title of the workshop I have been organizing with Roman Frigg at the London School of Economics (CPNSS) , which was to have happened in June. It is now the title of a forum I am zooming on Phil Stat that I hope you will want to follow. It’s time that I explain and explore some of the key facets I have in mind with this title. Continue reading
July 30 PRACTICE VIDEO for JSM talk (All materials for Practice JSM session here)
I. Doubt is Their Product is the title of a (2008) book by David Michaels, Assistant Secretary for OSHA from 2009-2017. I first mentioned it on this blog back in 2011 (“Will the Real Junk Science Please Stand Up?) The expression is from a statement by a cigarette executive (“doubt is our product”), and the book’s thesis is explained in its subtitle: How Industry’s Assault on Science Threatens Your Health. Imagine you have just picked up a book, published in 2020: Bad Statistics is Their Product. Is the author writing about how exaggerating bad statistics may serve in the interest of denying well-established risks? [Interpretation A]. Or perhaps she’s writing on how exaggerating bad statistics serves the interest of denying well-established statistical methods? [Interpretation B]. Both may result in distorting science and even in dismantling public health safeguards–especially if made the basis of evidence policies in agencies. A responsible philosopher of statistics should care. Continue reading
What can we learn from the debate on statistical significance?
The statistical community is in the midst of crisis whose latest convulsion is a petition to abolish the concept of significance. The problem is perhaps neither with significance, nor with statistics, but with the inconsiderate way we use numbers, and with our present approach to quantification. Unless the crisis is resolved, there will be a loss of consensus in scientific arguments, with a corresponding decline of public trust in the findings of science. Continue reading
In Tour II of this first Excursion of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018, CUP), I pull back the cover on disagreements between experts charged with restoring integrity to today’s statistical practice. Some advised me to wait until later (in the book) to get to this eye-opener. Granted, the full story involves some technical issues, but after many months, I think I arrived at a way to get to the heart of things informally (with a promise of more detailed retracing of steps later on). It was too important not to reveal right away that some of the most popular “reforms” fall down on the job even with respect to our most minimal principle of evidence (you don’t have evidence for a claim if little if anything has been done to probe the ways it can be flawed). Continue reading
If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did. Continue reading
This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ll post some Pearson items this week to mark his birthday.
HAPPY BELATED BIRTHDAY EGON!
Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.
Cases of Type A and Type B
“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)
Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:
“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…
(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)
Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen
I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?
ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.
I recommend, especially, the example on home ownership. Here are two snippets: Continue reading
We celebrated Jerzy Neyman’s Birthday (April 16, 1894) last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).
Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.
HAPPY BIRTHDAY WEEK FOR NEYMAN! Continue reading
Several reviews of Deborah Mayo’s new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars « Statistical Modeling, Causal Inference, and Social Science
For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).
1.4 The Law of Likelihood and Error Statistics
If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail–to use probability to arrive at a type of logic of evidential support–and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965)–the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading
This was published today on the American Philosophical Association blog.
“[C]onfusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth.” (George Barnard 1985, p. 2)
“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists…” (Allan Birnbaum 1972, p. 861).
“In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered.” (p. 57, Committee Investigating fraudulent research practices of social psychologist Diederik Stapel)
I was the lone philosophical observer at a special meeting convened by the American Statistical Association (ASA) in 2015 to construct a non-technical document to guide users of statistical significance tests–one of the most common methods used to distinguish genuine effects from chance variability across a landscape of social, physical and biological sciences.
It was, by the ASA Director’s own description, “historical”, but it was also highly philosophical, and its ramifications are only now being discussed and debated. Today, introspection on statistical methods is rather common due to the “statistical crisis in science”. What is it? In a nutshell: high powered computer methods make it easy to arrive at impressive-looking ‘findings’ that too often disappear when others try to replicate them when hypotheses and data analysis protocols are required to be fixed in advance.
Deductively valid arguments can readily have false conclusions! Yes, deductively valid arguments allow drawing their conclusions with 100% reliability but only if all their premises are true. For an argument to be deductively valid means simply that if the premises of the argument are all true, then the conclusion is true. For a valid argument to entail the truth of its conclusion, all of its premises must be true. In that case the argument is said to be (deductively) sound.
Equivalently, using the definition of deductive validity that I prefer: A deductively valid argument is one where, the truth of all its premises together with the falsity of its conclusion, leads to a logical contradiction (A & ~A).
Show that an argument with the form of disjunctive syllogism can have a false conclusion. Such an argument take the form (where A, B are statements): Continue reading
An essential component of inference based on familiar frequentist notions: p-values, significance and confidence levels, is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, as we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). To state the SLP roughly, it asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data.
SLP (We often drop the “strong” and just call it the LP. The “weak” LP just boils down to sufficiency)
For any two experiments E1 and E2 with different probability models f1, f2, but with the same unknown parameter θ, if outcomes x* and y* (from E1 and E2 respectively) determine the same (i.e., proportional) likelihood function (f1(x*; θ) = cf2(y*; θ) for all θ), then x* and y* are inferentially equivalent (for an inference about θ).
(What differentiates the weak and the strong LP is that the weak refers to a single experiment.)