In recognition of Fisher’s birthday (Feb 17), I reblog his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. My favorite is the reply by E.S. Pearson, but all are chock full of gems for different reasons. They are each very short and are worth your rereading.

# Neyman

## R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

## If you like Neyman’s confidence intervals then you like N-P tests

Neyman, confronted with unfortunate news, would always say “too bad!” At the end of Jerzy Neyman’s birthday week, I cannot help imagining him saying “too bad!” as regards some twists and turns in the statistics wars. First, too bad Neyman-Pearson (N-P) tests aren’t in the ASA Statement (2016) on P-values: “To keep the statement reasonably simple, we did not address alternative hypotheses, error types, or power”. An especially aggrieved “too bad!” would be earned by the fact that those in love with confidence interval estimators don’t appreciate that Neyman developed them (in 1930) as a method with a precise interrelationship with N-P tests. So *if you love CI estimators, then you love N-P tests!*
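To make the “precise interrelationship” concrete, here is a minimal sketch, not Neyman’s own construction. It assumes the textbook case of a Normal mean with known standard deviation, and the function names are hypothetical. It checks that a two-sided level-α test rejects a hypothesized mean exactly when that mean falls outside the 1 − α confidence interval.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for a Normal mean with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def z_test_rejects(xbar, sigma, n, mu0, alpha=0.05):
    """Two-sided level-alpha test of H0: mu = mu0 (known sigma)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(xbar - mu0) / (sigma / sqrt(n)) > z


def duality_holds(xbar, sigma, n, candidates, alpha=0.05):
    """True if 'test rejects mu0' matches 'mu0 lies outside the interval'
    for every candidate value (candidates should avoid the exact endpoints)."""
    lo, hi = z_confidence_interval(xbar, sigma, n, alpha)
    return all(
        z_test_rejects(xbar, sigma, n, mu0, alpha) == (not (lo <= mu0 <= hi))
        for mu0 in candidates
    )


# Illustrative numbers only (not from any study mentioned here).
print(z_confidence_interval(10.3, 2.0, 25))
print(duality_holds(10.3, 2.0, 25, [9.0, 9.5, 10.0, 11.0, 11.5]))
```

The interval is just the set of hypothesized means that the corresponding test would not reject, which is why the two methods stand or fall together.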

## Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

*I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?*

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena”

by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets:

## Neyman vs the ‘Inferential’ Probabilists

**We celebrated Jerzy Neyman’s Birthday (April 16, 1894)** last night in our seminar: here’s a pic of the cake. My entry today is a brief excerpt and a link to a paper of his that we haven’t discussed much on this blog: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. So if you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?).

Note: In this article, “attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

**HAPPY BIRTHDAY WEEK FOR NEYMAN!**

## Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday yesterday)

**My second Jerzy Neyman item, in honor of his birthday, is a little play that I wrote for *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (2018):**

**A local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from their early joint 1933 paper:**

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

## A. Spanos: Jerzy Neyman and his Enduring Legacy

**Today is Jerzy Neyman’s birthday. I’ll post various Neyman items this week in recognition of it, starting with a guest post by Aris Spanos.** **Happy Birthday Neyman!**

*A Statistical Model as a Chance Mechanism*

**Aris Spanos**

**Jerzy Neyman** **(April 16, 1894 – August 5, 1981)**, was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model M_{θ}(**x**) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data **x**_{0}:=(x_{1},x_{2},…,x_{n}) can be viewed as a ‘truly representative sample’ from that ‘population’:

## Deconstructing the Fisher-Neyman conflict wearing fiducial glasses + Excerpt 5.8 from SIST

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. These two posts reflect my working out of these ideas in writing Section 5.8 of *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (SIST, CUP 2018). *Here’s all of Section 5.8 (“Neyman’s Performance and Fisher’s Fiducial Probability”) for your Saturday night reading.*

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74).

## R.A. Fisher: “Statistical methods and Scientific Induction”

In Recognition of Fisher’s birthday (Feb 17), I reblog his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and are worth your rereading.

*“Statistical Methods and Scientific Induction”*

*by Sir Ronald Fisher (1955)*

**SUMMARY**

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

- “Repeated sampling from the same population”,
- Errors of the “second kind”,
- “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

**“Note on an Article by Sir Ronald Fisher”**

**by Jerzy Neyman (1956)**

**Summary**

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.

## Neyman-Pearson Tests: An Episode in Anglo-Polish Collaboration: Excerpt from Excursion 3 (3.2)

**3.2 N-P Tests: An Episode in Anglo-Polish Collaboration***

We proceed by setting up a specific hypothesis to test, H_{0} in Neyman’s and my terminology, the null hypothesis in R. A. Fisher’s . . . in choosing the test, we take into account alternatives to H_{0} which we believe possible or at any rate consider it most important to be on the look out for . . . Three steps in constructing the test may be defined:

Step 1. We must first specify the set of results . . .

Step 2. We then divide this set by a system of ordered boundaries . . . such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined, on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts.

Step 3. We then, if possible, associate with each contour level the chance that, if H_{0} is true, a result will occur in random sampling lying beyond that level . . . In our first papers [in 1928] we suggested that the likelihood ratio criterion, λ, was a very useful one . . . Thus Step 2 preceded Step 3. In later papers [1933–1938] we started with a fixed value for the chance, ε, of Step 3 . . . However, although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order. (Egon Pearson 1947, p. 173)

In addition to Pearson’s 1947 paper, the museum follows his account in “The Neyman–Pearson Story: 1926–34” (Pearson 1970). The subtitle is “Historical Sidelights on an Episode in Anglo-Polish Collaboration”!

We meet Jerzy Neyman at the point he’s sent to have his work sized up by Karl Pearson at University College in 1925/26. Neyman wasn’t that impressed:

## Neyman vs the ‘Inferential’ Probabilists continued (a)

**Today is Jerzy Neyman’s Birthday (April 16, 1894 – August 5, 1981).** I am posting a brief excerpt and a link to a paper of his that I hadn’t posted before: Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock full of ideas and arguments, but the one that interests me at the moment is Neyman’s conception of “his breakthrough”, in relation to a certain concept of “inference”. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. Now Neyman always distinguishes his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). [iii] I’ll explain in later stages of this post & in comments…(so please check back); I don’t want to miss the start of the birthday party in honor of Neyman, and it’s already 8:30 p.m. in Berkeley!

Note: In this article, “attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program. **HAPPY BIRTHDAY NEYMAN!**

## Deconstructing the Fisher-Neyman conflict wearing fiducial glasses (continued)

**[An updated version with corrected links can be found here.]**

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night.

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74).

## R.A. Fisher: “Statistical methods and Scientific Induction”

I continue a week of Fisherian posts begun on his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. They are each very short and are worth your rereading.

*“Statistical Methods and Scientific Induction”*

*by Sir Ronald Fisher (1955)*

**SUMMARY**

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

- “Repeated sampling from the same population”,
- Errors of the “second kind”,
- “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

**“Note on an Article by Sir Ronald Fisher”**

**by Jerzy Neyman (1956)**

**Summary**

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible. (3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values. The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight. (4) The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

**“Statistical Concepts in Their Relation to Reality”**

**by E.S. Pearson (1955)**

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this *Journal* (Fisher 1955, “Statistical Methods and Scientific Induction”), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect. There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. It was really much simpler–or worse. *The original heresy, as we shall see, was a Pearson one!…*

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE

## Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday)

**For my final Jerzy Neyman item, here’s the post I wrote for his birthday last year:**

**A local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from their early joint 1933 paper:**

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

In this early paper, Neyman and Pearson were still groping toward the basic concepts of tests–for example, “power” had yet to be coined. Taken out of context, these quotes have led to knee-jerk (behavioristic) interpretations which neither Neyman nor Pearson would have accepted. What was the real context of those passages? Well, the paper opens, just five paragraphs earlier, with a discussion of a debate between two French probabilists: Joseph Bertrand, author of “Calculus of Probabilities” (1907), and Emile Borel, author of “Le Hasard” (1914)! According to Neyman, what served *“as an inspiration to Egon S. Pearson and myself in our effort to build a frequentist theory of testing hypotheses” (1977, p. 103)* initially grew out of remarks of Borel, whose lectures Neyman had attended in Paris. He returns to the Bertrand-Borel debate in four different papers, and circles back to it often in his talks with his biographer, Constance Reid. His student Erich Lehmann (1993), regarded as the authority on Neyman, wrote an entire paper on the topic: “The Bertrand-Borel Debate and the Origins of the Neyman Pearson Theory”.

## Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

*I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are justified only by dint of long-run error control. Controlling the “frequency of misdirected activities” in the midst of finding something out, or solving a problem of inquiry, on the other hand, are epistemological goals. What do you think?*

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena”

by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

I recommend, especially, the example on home ownership. Here are two snippets:

## A. Spanos: Jerzy Neyman and his Enduring Legacy

**Today is Jerzy Neyman’s birthday. I’ll post various Neyman items this week in honor of it, starting with a guest post by Aris Spanos.** **Happy Birthday Neyman!**

*A Statistical Model as a Chance Mechanism*

**Aris Spanos**

**Jerzy Neyman** **(April 16, 1894 – August 5, 1981)**, was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model M_{θ}(**x**) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data **x**_{0}:=(x_{1},x_{2},…,x_{n}) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori” (p. 314).

## R.A. Fisher: “Statistical methods and Scientific Induction”

I continue a week of Fisherian posts in honor of his birthday (Feb 17). This is his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. They are each very short.

*“Statistical Methods and Scientific Induction”*

*by Sir Ronald Fisher (1955)*

**SUMMARY**

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

- “Repeated sampling from the same population”,
- Errors of the “second kind”,
- “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

The most noteworthy feature is Fisher’s position on Fiducial inference, typically downplayed. I’m placing a summary and link to Neyman’s response below–it’s that interesting.

## Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday)

**In honor of Jerzy Neyman’s birthday today, a local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from their early joint 1933 paper:**

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

## Deconstructing the Fisher-Neyman conflict wearing fiducial glasses (continued)

**[An updated version with corrected links can be found here.]**

This continues my previous post: “Can’t take the fiducial out of Fisher…” in recognition of Fisher’s birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night.

Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74).

Fisher was right that Neyman’s calling the outputs of statistical inferences “actions” merely expressed Neyman’s preferred way of talking. Nothing earth-shaking turns on the choice to dub every inference “an act of making an inference”.[i] The “rationality” or “merit” goes into the rule. Neyman, much like Popper, had a good reason for drawing a bright red line between his use of probability (for corroboration or probativeness) and its use by ‘probabilists’ (who assign probability to hypotheses). Fisher’s Fiducial probability was in danger of blurring this very distinction. Popper said, and Neyman would have agreed, that he had no problem with our using the word induction so long as it was kept clear it meant testing hypotheses severely.

## Erich Lehmann: Neyman-Pearson & Fisher on P-values

**Today is Erich Lehmann’s birthday (20 November 1917 – 12 September 2009).** Lehmann was Neyman’s first student at Berkeley (Ph.D 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them.

I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that). He began by telling me that he was sitting in a very large room at an ASA (American Statistical Association) meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, wood table sat just one book, all alone, shiny red. He said he wondered if it might be of interest to him! So he walked up to it…. It turned out to be my *Error and the Growth of Experimental Knowledge* (1996, Chicago), which he reviewed soon after[0]. (What are the chances?) Some related posts on Lehmann’s letter are here and here.

One of Lehmann’s more philosophical papers is Lehmann (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” We haven’t discussed it before on this blog. Here are some excerpts (blue), and remarks (black).

…A distinction frequently made between the approaches of Fisher and Neyman-Pearson is that in the latter the test is carried out at a fixed level, whereas the principal outcome of the former is the statement of a p value that may or may not be followed by a pronouncement concerning significance of the result [p.1243].

The history of this distinction is curious. Throughout the 19th century, testing was carried out rather informally. It was roughly equivalent to calculating an (approximate) p value and rejecting the hypothesis if this value appeared to be sufficiently small. … Fisher, in his 1925 book and later, greatly reduced the needed tabulations by providing tables not of the distributions themselves but of selected quantiles. … These tables allow the calculation only of ranges for the p values; however, they are exactly suited for determining the critical values at which the statistic under consideration becomes significant at a given level. As Fisher wrote in explaining the use of his [chi square] table (1946, p. 80):

In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed [chi square], but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9, there is certainly no reason to suspect the hypothesis tested. If it is below .02, it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 and consider that higher values of [chi square] indicate a real discrepancy.

Similarly, he also wrote (1935, p. 13) that “it is usual and convenient for experimenters to take 5 percent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard . . .” ….
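Lehmann’s point about quantile tables can be illustrated with a small sketch. The table below is not Fisher’s own; it hard-codes a few standard upper-tail chi-square quantiles for 1 degree of freedom as an assumption, and the function names are hypothetical. With only such a table, one can bracket P within a range, yet one can read off exactly whether a statistic is significant at a tabulated level.

```python
# Standard upper-tail chi-square quantiles for 1 degree of freedom
# (assumed values for this sketch): tail probability P -> critical value.
CHI2_DF1 = [(0.10, 2.706), (0.05, 3.841), (0.02, 5.412), (0.01, 6.635)]


def bracket_p(stat):
    """Return (lower, upper) bounds on P using only the tabulated quantiles."""
    lower, upper = 0.0, 1.0
    for p, critical in CHI2_DF1:
        if stat >= critical:
            upper = p   # statistic reaches this quantile, so P is at most p
        else:
            lower = p   # statistic falls short, so P exceeds p
            break
    return lower, upper


def significant_at(stat, level):
    """Exact yes/no answer at a tabulated level: compare with its critical value."""
    return stat > dict(CHI2_DF1)[level]


print(bracket_p(4.5))             # only a range for P is available
print(significant_at(4.5, 0.05))  # but the verdict at the 5% line is exact
```

For an observed chi-square of 4.5, the table says only that P lies between .02 and .05, which is all one needs to apply a conventional line at .05.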

## Neyman: Distinguishing tests of statistical hypotheses and tests of significance might have been a lapse of someone’s pen

“Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena” by Jerzy Neyman

ABSTRACT. Contrary to ideas suggested by the title of the conference at which the present paper was presented, the author is not aware of a conceptual difference between a “test of a statistical hypothesis” and a “test of significance” and uses these terms interchangeably. A study of any serious substantive problem involves a sequence of incidents at which one is forced to pause and consider what to do next. In an effort to reduce the frequency of misdirected activities one uses statistical tests. The procedure is illustrated on two examples: (i) Le Cam’s (and associates’) study of immunotherapy of cancer and (ii) a socio-economic experiment relating to low-income homeownership problems.

Neyman died on August 5, 1981. Here’s an unusual paper of his, “Tests of Statistical Hypotheses and Their Use in Studies of Natural Phenomena.” I have been reading a fair amount by Neyman this summer in writing about the origins of his philosophy, and have found further corroboration of the position that the behavioristic view attributed to him, while not entirely without substance\*, is largely a fable that has been steadily built up and accepted as gospel. This has justified ignoring Neyman-Pearson statistics (as resting solely on long-run performance and irrelevant to scientific inference) and turning to crude variations of significance tests, that Fisher wouldn’t have countenanced for a moment (so-called NHSTs), lacking alternatives, incapable of learning from negative results, and permitting all sorts of P-value abuses–notably going from a small p-value to claiming evidence for a substantive research hypothesis. The upshot is to reject all of frequentist statistics, even though P-values are a teeny tiny part. \*This represents a change in my perception of Neyman’s philosophy since EGEK (Mayo 1996). I still say that for our uses of method, it doesn’t matter what anybody thought, that “it’s the methods, stupid!” Anyway, I recommend, in this very short paper, the general comments and the example on home ownership. Here are two snippets: