Neyman April 16, 1894 – August 5, 1981
I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?
“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena”
by Jerzy Neyman
ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.
I recommend, especially, the example on home ownership. Here are two snippets: Continue reading
We celebrated Jerzy Neyman’s Birthday (April 16, 1894) last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).
drawn by his wife,Olga
Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.
HAPPY BIRTHDAY WEEK FOR NEYMAN! Continue reading
For the first time, I’m excerpting all of Excursion 1 Tour II from SIST (2018, CUP).
1.4 The Law of Likelihood and Error Statistics
If you want to understand what’s true about statistical inference, you should begin with what has long been a holy grail–to use probability to arrive at a type of logic of evidential support–and in the first instance you should look not at full-blown Bayesian probabilism, but at comparative accounts that sidestep prior probabilities in hypotheses. An intuitively plausible logic of comparative support was given by the philosopher Ian Hacking (1965)–the Law of Likelihood. Fortunately, the Museum of Statistics is organized by theme, and the Law of Likelihood and the related Likelihood Principle is a big one. Continue reading
Little bit of logic (5 little problems for you)[i]
Deductively valid arguments can readily have false conclusions! Yes, deductively valid arguments allow drawing their conclusions with 100% reliability but only if all their premises are true. For an argument to be deductively valid means simply that if the premises of the argument are all true, then the conclusion is true. For a valid argument to entail the truth of its conclusion, all of its premises must be true. In that case the argument is said to be (deductively) sound.
Equivalently, using the definition of deductive validity that I prefer: A deductively valid argument is one where, the truth of all its premises together with the falsity of its conclusion, leads to a logical contradiction (A & ~A).
Show that an argument with the form of disjunctive syllogism can have a false conclusion. Such an argument take the form (where A, B are statements): Continue reading
Tour I The Myth of “The Myth of Objectivity”*
Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inferences and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)
Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity–subjectivity distinction really toothless, as many will have you believe? I say no. I know it’s a meme promulgated by statistical high priests, but you agreed, did you not, to use a bit of chutzpah on this excursion? Besides, cavalier attitudes toward objectivity are at odds with even more widely endorsed grass roots movements to promote replication, reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science are rooted in the supposition that we can more objectively scrutinize results – even if it’s only to point out those that are BENT. The fact that these terms are used equivocally should not be taken as grounds to oust them but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t. Continue reading
First draft of PhilStat Announcement
Tour II It’s the Methods, Stupid
There is perhaps in current literature a tendency to speak of the Neyman–Pearson contributions as some static system, rather than as part of the historical process of development of thought on statistical theory which is and will always go on. (Pearson 1962, 276)
This goes for Fisherian contributions as well. Unlike museums, we won’ t remain static. The lesson from Tour I of this Excursion is that Fisherian and Neyman– Pearsonian tests may be seen as offering clusters of methods appropriate for different contexts within the large taxonomy of statistical inquiries. There is an overarching pattern: Continue reading
As you enjoy the weekend discussion & concert in the Captain’s Central Limit Library & Lounge, your Tour Guide has prepared a brief overview of Excursion 3 Tour I, and a short (semi-severe) quiz on severity, based on exhibit (i).*
We move from Popper through a gallery on “Data Analysis in the 1919 Eclipse tests of the General Theory of Relativity (GTR)” (3.1) which leads to the main gallery on the origin of statistical tests (3.2) by way of a look at where the main members of our statistical cast are in 1919: Fisher, Neyman and Pearson. From the GTR episode, we identify the key elements of a statistical test–the steps in E.S. Pearson’s opening description of tests in 3.2. The classical testing notions–type I and II errors, power, consistent tests–are shown to grow out of requiring probative tests. The typical (behavioristic) formulation of N-P tests came later. The severe tester breaks out of the behavioristic prison. A first look at the severity construal of N-P tests is in Exhibit (i). Viewing statistical inference as severe testing shows how to do all N-P tests do (and more) while a member of the Fisherian Tribe (3.3). We consider the frequentist principle of evidence FEV and the divergent interpretations that are called for by Cox’s taxonomy of null hypotheses. The last member of the taxonomy–substantively based null hypotheses–returns us to the opening episode of GTR. Continue reading
Excursion 3 Exhibit (i)
Exhibit (i) N-P Methods as Severe Tests: First Look (Water Plant Accident)
There’s been an accident at a water plant where our ship is docked, and the cooling system had to be repaired. It is meant to ensure that the mean temperature of discharged water stays below the temperature that threatens the ecosystem, perhaps not much beyond 150 degrees Fahrenheit. There were 100 water measurements taken at randomly selected times and the sample mean x computed, each with a known standard deviation σ = 10. When the cooling system is effective, each measurement is like observing X ~ N(150, 102). Because of this variability, we expect different 100-fold water samples to lead to different values of X, but we can deduce its distribution. If each X ~N(μ = 150, 102) then X is also Normal with μ = 150, but the standard deviation of X is only σ/√n = 10/√100 = 1. So X ~ N(μ = 150, 1). Continue reading
The Rothamsted School
I never worked at Rothamsted but during the eight years I was at University College London (1995-2003) I frequently shared a train journey to London from Harpenden (the village in which Rothamsted is situated) with John Nelder, as a result of which we became friends and I acquired an interest in the software package Genstat®.
That in turn got me interested in John Nelder’s approach to analysis of variance, which is a powerful formalisation of ideas present in the work of others associated with Rothamsted. Nelder’s important predecessors in this respect include, at least, RA Fisher (of course) and Frank Yates and others such as David Finney and Frank Anscombe. John died in 2010 and I regard Rosemary Bailey, who has done deep and powerful work on randomisation and the representation of experiments through Hasse diagrams, as being the greatest living proponent of the Rothamsted School. Another key figure is Roger Payne who turned many of John’s ideas into code in Genstat®. Continue reading
Below are my slides from a session on replication at the recent Philosophy of Science Association meetings in Seattle.
Tour guides in your travels jot down Mementos and Keepsakes from each Tour[i] of my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018). Their scribblings, which may at times include details, at other times just a word or two, may be modified through the Tour, and in response to questions from travelers (so please check back). Since these are just mementos, they should not be seen as replacements for the more careful notions given in the journey (i.e., book) itself. Still, you’re apt to flesh out your notes in greater detail, so please share yours (along with errors you’re bound to spot), and we’ll create Meta-Mementos. Continue reading
a road through the jungle
In my talk yesterday at the Philosophy Department at Virginia Tech, I introduced my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Cambridge 2018). I began with my preface (explaining the meaning of my title), and turned to the Statistics Wars, largely from Excursion 1 of the book. After the sum-up at the end, I snuck in an example from the replication crisis in psychology. Here are the slides.
Today is Jerzy Neyman’s Birthday (April 16, 1894 – August 5, 1981). I am posting a brief excerpt and a link to a paper of his that I hadn’t posted before: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). [iii] I’ll explain in later stages of this post & in comments…(so please check back); I don’t want to miss the start of the birthday party in honor of Neyman, and it’s already 8:30 p.m in Berkeley!
Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program. HAPPY BIRTHDAY NEYMAN! Continue reading
Head of Competence Center
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Being a statistician means never having to say you are certain
A recent discussion of randomised controlled trials by Angus Deaton and Nancy Cartwright (D&C) contains much interesting analysis but also, in my opinion, does not escape rehashing some of the invalid criticisms of randomisation with which the literatures seems to be littered. The paper has two major sections. The latter, which deals with generalisation of results, or what is sometime called external validity, I like much more than the former which deals with internal validity. It is the former I propose to discuss.