phil/history of stat

E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”

E.S.Pearson on Gate

E.S.Pearson on a Gate,             Mayo sketch

Today is Egon Pearson’s birthday (11 Aug., 1895-12 June, 1980); and here you see my scruffy sketch of him, at the start of my book, “Error and the Growth of Experimental Knowledge” (EGEK 1996). As Erich Lehmann put it in his EGEK review, Pearson is “the hero of Mayo’s story” because I found in his work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of Neyman-Pearson theory of statistics.  “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this:

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data.  We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done.  If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this Journal (Fisher 1955 “Scientific Methods and Scientific Induction” ), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect.  There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”.  There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans.  It was really much simpler–or worse.  The original heresy, as we shall see, was a Pearson one!…
Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot…!

To continue reading, “Statistical Concepts in Their Relation to Reality” click HERE.

See also Aris Spanos: “Egon Pearson’s Neglected Contributions to Statistics“.

Happy Birthday E.S. Pearson!

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , | 4 Comments

PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)

Memory Lane: One Year Ago on error statistics.com

A quick perusal of the “Manual” on Nathan Schachtman’s legal blog shows it to be chock full of revealing points of contemporary legal statistical philosophy.  The following are some excerpts, read the full blog here.   I make two comments at the end.

July 8th, 2012

Nathan Schachtman

How does the new Reference Manual on Scientific Evidence (RMSE3d 2011) treat statistical significance?  Inconsistently and at times incoherently.

Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raises the question of the role statistical significance should play in evaluating a study’s support for causal conclusions:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value, 62 at least in proving general causation. 63”

Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology.  And how can that tautology support the claim that inconclusive studies “therefore ” have some probative value? This is a fairly obvious logical invalid argument, or perhaps a passage badly in need of an editor.

…………

Chapter on Statistics

The RMSE’s chapter on statistics is relatively free of value judgments about significance probability, and, therefore, a great improvement upon Berger’s introduction.  The authors carefully describe significance probability and p-values, and explain:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 241 (3ed 2011).  Although the chapter confuses and conflates Fisher’s interpretation of p-values with Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment is unfortunately fairly standard in introductory textbooks.

Kaye and Freedman, however, do offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome:

“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large data set—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113 ”

Id. at 256 -57.  This qualification is omitted from the overlapping discussion in the chapter on epidemiology, where it is very much needed. Continue reading

Categories: P-values, PhilStatLaw, significance tests | Tags: , , , , | 6 Comments

A.Birnbaum: Statistical Methods in Scientific Inference

Birnbaum: born May 27, 1923

Today is (statistician) Allan Birnbaum’s birthday. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to obtain what he called “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I would heartily endorse!  While known for attempts to argue that the (strong) Likelihood Principle followed from sufficiency and conditionality principles, a few years after publishing this result, he seems to have turned away from it, perhaps discovering gaps in his argument.

NATURE VOL. 225 MARCH 14, 1970 (1033)

LETTERS TO THE EDITOR

Statistical methods in Scientific Inference

 It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties” of the likelihood concept, I must point out that I am not now among the ‘modern exponents” of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref.2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood).  I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.

 If there has been ‘one rock in a shifting scene’ or general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence. This concept is not part of the Neyman-Pearson theory of tests and confidence region estimation, which denies any role to concepts of statistical evidence, as Neyman consistently insists. The confidence concept takes from the Neyman-Pearson approach techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data. (The absence of a comparable property in the likelihood and Bayesian approaches is widely regarded as a decisive inadequacy.) The confidence concept also incorporates important but limited aspects of the likelihood concept: the sufficiency concept, expressed in the general refusal to use randomized tests and confidence limits when they are recommended by the Neyman-Pearson approach; and some applications of the conditionality concept. It is remarkable that this concept, an incompletely formalized synthesis of ingredients borrowed from mutually incompatible theoretical approaches, is evidently useful continuously in much critically informed statistical thinking and practice [emphasis mine].

While inferences of many sorts are evident everywhere in scientific work, the existence of precise, general and accurate schemas of scientific inference remains a problem. Mendelian examples like those of Edwards and my 1969 paper seem particularly appropriate as case-study material for clarifying issues and facilitating effective communication among interested statisticians, scientific workers and philosophers and historians of science.

Allan Birnbaum
New York University
Courant Institute of Mathematical Sciences,
251 Mercer Street,
New York, NY 10012

Birnbaum’s confidence concept, sometimes written (Conf), was his attempt to find in error statistical ideas a concept of statistical evidence–a term that he invented and popularized. In Birnbaum 1977 (24), he states it as follows:

(Conf): A concept of statistical evidence is not plausible unless it finds ‘strong evidence for J as against H with small probability (α) when H is true, and with much larger probability (1 – β) when J is true.

Birnbaum questioned whether Neyman-Pearson methods had “concepts of evidence”  simply because Neyman talked of “inductive behavior” and Wald and others cauched statistical methods in decision-theoretic terms. I have been urging that we consider instead how the tools may actually be used, and not be restricted by the statistical philosophies of founders (not to mention that so many of their statements are tied up with personality disputes, and problems of “anger management”). Recall, as well, E. Pearson’s insistence on an evidential construal of N-P methods, and the fact that Neyman, in practice, spoke of drawing inferences and reaching conclusions (e.g., Neyman’s nursery posts, links in [iii] below). Continue reading

Categories: Likelihood Principle, phil/history of stat, Statistics | Tags: | 3 Comments

Update on Higgs data analysis: statistical flukes (part 1)

physics pic yellow particle burst blue coneI am always impressed at how researchers flout the popular philosophical conception of scientists as being happy as clams when their theories are ‘born out’ by data, while terribly dismayed to find any anomalies that might demand “revolutionary science” (as Kuhn famously called it). Scientists, says Kuhn, are really only trained to do “normal science”—science within a paradigm of hard core theories that are almost never, ever to be questioned.[i] It is rather the opposite, and the reports out last week updating the Higgs data analysis reflect this yen to uncover radical anomalies from which scientists can push the boundaries of knowledge. While it is welcome news that the new data do not invalidate the earlier inference of a Higgs-like particle, many scientists are somewhat dismayed to learn that it appears to be quite in keeping with the Standard Model. In a March 15 article in National Geographic News:

Although a full picture of the Higgs boson has yet to emerge, some physicists have expressed disappointment that the new particle is so far behaving exactly as theory predicts. Continue reading

Categories: significance tests, Statistics | 30 Comments

R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’

17 February 1890–29 July 1962

I find this to be an intriguing discussion–before some of the conflicts with N and P erupted.  Fisher links his tests and sufficiency, to the Neyman and Pearson lemma in terms of power.  It’s as if we may see them as ending up in a similar place while starting from different origins. I quote just the most relevant portions…the full article is linked below.

by R.A. Fisher, F.R.S.

Proceedings of the Royal Society, Series A, 144: 285-307 (1934)

  The property that where a sufficient statistic exists, the likelihood, apart from a factor independent of the parameter to be estimated, is a function only of the parameter and the sufficient statistic, explains the principle result obtained by Neyman and Pearson in discussing the efficacy of tests of significance.  Neyman and Pearson introduce the notion that any chosen test of a hypothesis H0 is more powerful than any other equivalent test, with regard to an alternative hypothesis H1, when it rejects H0 in a set of samples having an assigned aggregate frequency ε when H0 is true, and the greatest possible aggregate frequency when H1 is true.

If any group of samples can be found within the region of rejection whose probability of occurrence on the hypothesis H1 is less than that of any other group of samples outside the region, but is not less on the hypothesis H0, then the test can evidently be made more powerful by substituting the one group for the other.

Consequently, for the most powerful test possible the ratio of the probabilities of occurrence on the hypothesis H0 to that on the hypothesis H1 is less in all samples in the region of rejection than in any sample outside it. For samples involving continuous variation the region of rejection will be bounded by contours for which this ratio is constant. The regions of rejection will then be required in which the likelihood of H0 bears to the likelihood of H1, a ratio less than some fixed value defining the contour. (295)…

It is evident, at once, that such a system is only possible when the class of hypotheses considered involves only a single parameter θ, or, what come to the same thing, when all the parameters entering into the specification of the population are definite functions of one of their number.  In this case, the regions defined by the uniformly most powerful test of significance are those defined by the estimate of maximum likelihood, T.  For the test to be uniformly most powerful, moreover, these regions must be independent of θ showing that the statistic must be of the special type distinguished as sufficient.  Such sufficient statistics have been shown to contain all the information which the sample provides relevant to the value of the appropriate parameter θ . It is inevitable therefore that if such a statistic exists it should uniquely define the contours best suited to discriminate among hypotheses differing only in respect of this parameter; and it is surprising that Neyman and Pearson should lay it down as a preliminary consideration that ‘the tesitng of statistical hypotheses cannot be treated as a problem in estimation.’ When tests are considered only in relation to sets of hypotheses specified by one or more variable parameters, the efficacy of the tests can be treated directly as the problem of estimation of these parameters.  Regard for what has been established in that theory, apart from the light it throws on the results already obtained by their own interesting line of approach, should also aid in treating the difficulties inherent in cases in which no sufficient statistics exists. (296)

Categories: phil/history of stat, Statistics | Tags: , , , | Leave a comment

R. A. Fisher: how an outsider revolutionized statistics

A SPANOSby Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the ultimate outsider when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however did not stop him from publishing his first paper in statistics in 1912 (still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that was eventually to become the likelihood method in his 1921 paper.

After graduating from Cambridge he drifted into a series of jobs, including subsistence farming and teaching high school mathematics and physics, until his temporary appointment as a statistician at Rothamsted Experimental Station in 1919. During the period 1912-1919 his interest in statistics was driven by his passion for eugenics and a realization that his mathematical knowledge of n-dimensional geometry can be put to good use in deriving finite sample distributions for estimators and tests in the spirit of Gosset’s (1908) paper. Encouraged by his early correspondence with Gosset, he derived the finite sampling distribution of the sample correlation coefficient which he published in 1915 in Biometrika; the only statistics journal at the time, edited by Karl Pearson. To put this result in a proper context, Pearson was working on this problem for two decades and published more than a dozen papers with several assistants on approximating the first two moments of the sample correlation coefficient; Fisher derived the relevant distribution, not just the first two moments.

Due to its importance, the 1915 paper provided Fisher’s first skirmish with the  ‘statistical establishment’. Karl Pearson would not accept being overrun by a ‘newcomer’ lightly. So, he prepared a critical paper with four of his assistants that became known as “the cooperative study”, questioning Fisher’s result as stemming from a misuse of Bayes theorem. He proceeded to publish it in Biometrika in 1917 without bothering to let Fisher know before publication. Fisher was furious at K.Pearson’s move and prepared his answer in a highly polemical style which Pearson promptly refused to publish in his journal. Eventually Fisher was able to publish his answer after tempering the style in Metron, a brand new statistics journal. As a result of this skirmish, Fisher pledged never to send another paper to Biometrika, and declared a war against K.Pearson’s perspective on statistics. Fisher, not only questioned his method of moments as giving rise to inefficient estimators, but also his derivation of the degrees of freedom of his chi-square test. Several, highly critical published papers ensued.[i]

Between 1922 and 1930 Fisher did most of his influential work in recasting statistics, including publishing a highly successful textbook in 1925, but the ‘statistical establishment’ kept him ‘in his place’; a statistician at an experimental station. All his attempts to find an academic position, including a position in Social Biology at the London School of Economics (LSE), were unsuccessful (see Box, 1978, p. 202). Being turned down for the LSE position was not unrelated to the fact that the professor of statistics at the LSE was Arthur Bowley (1869-1957); second only to Pearson in statistical high priesthood.[ii]

Coming of age as a statistician during the 1920s in England, was being awarded the Guy medal in gold, silver or bronze, or at least receiving an invitation to present your work to the Royal Statistical Society (RSS). Despite his fundamental contributions to the field, Fisher’s invitation to RSS would not come until 1934. To put that in perspective, Jerzy Neyman, his junior by some distance, was invited six months earlier! Indeed, one can make a strong case that the statistical establishment kept Fisher away for as long as they could get away with it. However, by 1933 they must have felt that they had to invite Fisher after he accepted a professorship at University College, London. The position was created after Karl Pearson retired and the College decided to split his chair into a statistics position that went to Egon Pearson (Pearson’s son) and a Galton professorship in Eugenics that was offered to Fisher. To make it worse, Fisher’s offer came with a humiliating clause that he was forbidden to teach statistics at University College (see Box, 1978, p. 258); the father of modern statistics was explicitly told to keep his views on statistics to himself!

Fisher’s presentation to the Royal Statistical Society, on December 18th, 1934, entitled “The Logic of Inductive Inference”, was an attempt to summarize and explain his published work on recasting the problem of statistical induction since his classic 1922 paper. Bowley was (self?) appointed to move the traditional vote of thanks and open the discussion. After some begrudging thanks for Fisher’s ‘contributions to statistics in general’, he went on to disparage his new approach to statistical inference based on the likelihood function by describing it as abstruse, arbitrary and misleading. His comments were predominantly sarcastic and discourteous, and went as far as to accuse Fisher of plagiarism, by not acknowledging Edgeworth’s priority on the likelihood function idea (see Fisher, 1935, pp. 55-7). The litany of churlish comments continued with the rest of the old guard: Isserlis, Irwin and the philosopher Wolf (1935, pp. 57-64), who was brought in by Bowley to undermine Fisher’s philosophical discussion on induction. Jeffreys complained about Fisher’s criticisms of the Bayesian approach (1935, pp. 70-2).

To Fisher’s support came … Egon Pearson, Neyman and Bartlett. E. Pearson argued that:

“When these ideas [on statistical induction] were fully understood … it would be realized that statistical science owed a very great deal to the stimulus Professor Fisher had provided in many directions.” (Fisher, 1935, pp. 64-5)

Neyman too came to Fisher’s support, praising Fisher’s path-breaking contributions, and explaining Bowley’s reaction to Fisher’s critical review of the traditional view of statistics as an understandable attachment to old ideas (1935, p. 73).

Fisher, in his reply to Bowley and the old guard, was equally contemptuous:

“The acerbity, to use no stronger term, with which the customary vote of thanks has been moved and seconded … does not, I confess, surprise me. From the fact that thirteen years have elapsed between the publication, by the Royal Society, of my first rough outline of the developments, which are the subject of to-day’s discussion, and the occurrence of that discussion itself, it is a fair inference that some at least of the Society’s authorities on matters theoretical viewed these developments with disfavour, and admitted with reluctance. … However true it may be that Professor Bowley is left very much where he was, the quotations show at least that Dr. Neyman and myself have not been left in his company. … For the rest, I find that Professor Bowley is offended with me for “introducing misleading ideas”. He does not, however, find it necessary to demonstrate that any such idea is, in fact, misleading. It must be inferred that my real crime, in the eyes of his academic eminence, must be that of “introducing ideas”. (Fisher, 1935, pp. 76-82)[iii]

In summary, the pioneering work of Fisher and later supplemented by Egon Pearson and Neyman, was largely ignored by the Royal Statistical Society (RSS) establishment until the early 1930s. By 1933 it was difficult to ignore their contributions, published primarily in other journals, and the ‘establishment’ of the RSS decided to display its tolerance to their work by creating ‘the Industrial and Agricultural Research Section’, under the auspices of which both papers by Neyman and Fisher were presented in 1934 and 1935, respectively. [iv]

In 1943, Fisher was offered the Balfour Chair of Genetics at the University of Cambridge. Recognition from the RSS came in 1946 with the Guy medal in gold, and he became its president in 1952-1954, just after he was knighted! Sir Ronald Fisher retired from Cambridge in 1957. The father of modern statistics never held an academic position in statistics!

References

Bowley, A. L. (1902, 1920, 1926, 1937) Elements of Statistics, 2nd, 4th, 5th and 6th editions, Staples Press, London.

Box, J. F. (1978) The Life of a Scientist: R. A. Fisher, Wiley, NY.

Fisher, R. A. (1912), “On an Absolute Criterion for Fitting Frequency Curves,” Messenger of Mathematics, 41, 155-160.

Fisher, R. A. (1915) “Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population,” Biometrika, 10, 507-21.

Fisher, R. A. (1921) “On the ‘probable error’ of a coefficient deduced from a small sample,” Metron 1, 2-32.

Fisher, R. A. (1922) “On the mathematical foundations of theoretical statistics,” Philosophical Transactions of the Royal Society, A 222, 309-68.

Fisher, R. A. (1922a) “On the interpretation of c2 from contingency tables, and the calculation of p, “Journal of the Royal Statistical Society 85, 87–94.

Fisher, R. A. (1922b) “The goodness of fit of regression formulae and the distribution of regression coefficients,”  Journal of the Royal Statistical Society, 85, 597–612.

Fisher, R. A. (1924) “The conditions under which the x2 measures the discrepancy between observation and hypothesis,” Journal of the Royal Statistical Society, 87, 442-450.

Fisher, R. A. (1925) Statistical Methods for Research Workers, Oliver & Boyd, Edinburgh.

Fisher, R. A. (1935) “The logic of inductive inference,” Journal of the Royal Statistical Society 98, 39-54, discussion 55-82.

Fisher, R. A. (1937), “Professor Karl Pearson and the Method of Moments,” Annals of Eugenics, 7, 303-318.

Gossett, W. S. (1908) “The probable error of the mean,” Biometrika, 6, 1-25.

Hald, A. (1998) A History of Mathematical Statistics from 1750 to 1930, Wiley, NY.

Hotelling, H. (1930) “British statistics and statisticians today,” Journal of the American Statistical Association, 25, 186-90.

Neyman, J. (1934) “On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection,” Journal of the Royal Statistical Society, 97, 558-625.

Rao, C. R. (1992), “ R. A. Fisher: The Founder of Modern, Statistical Science, 7, 34-48.

RSS (Royal Statistical Society) (1934) Annals of the Royal Statistical Society 1834-1934, The Royal Statistical Society, London.

Savage, L . J. (1976) “On re-reading R. A. Fisher,” Annals of Statistics, 4, 441-500.

Spanos, A. (2008), “Statistics and Economics,” pp. 1057-1097 in The New Palgrave Dictionary of Economics, Second Edition. Eds. Steven N. Durlauf and Lawrence E. Blume, Palgrave Macmillan.

Tippet, L. H. C. (1931) The Methods of Statistics, Williams & Norgate, London.


[i] Fisher (1937), published a year after Pearson’s death, is particularly acerbic. In Fisher’s mind, Karl Pearson went after a young Indian statistician – totally unfairly – just the way he went after him in 1917.

[ii] Bowley received the Guy Medal in silver from the Royal Statistical Society (RSS) as early as 1895, and became a member of the Council of the RSS in 1898. He was awarded the society’s highest honor, the Guy Medal in gold, in 1935.

[iii] It is important to note that Bowley revised his textbook in statistics for the last time in 1937, and predictably, he missed the whole change of paradigms brought about by Fisher, Neyman and Pearson.

[iv] In their centennial volume published in 1934, the RSS acknowledged the development of ‘mathematical statistics’, referring to Galton, Edgeworth, Karl Pearson, Yule and Bowley as the main pioneers, and listed the most important contributions in this sub-field which appeared in its Journal during the period 1909-33, but the three important papers by Fisher (1922a-b; 1924) are conspicuously absent from that list. The list itself is dominated by contributions in vital, commercial, financial and labour statistics (see RSS, 1934, pp. 208-23). There is a single reference to Egon Pearson.

Categories: phil/history of stat, Statistics | 12 Comments

Fisher and Neyman after anger management?

Photo on 2-15-13 at 11.47 PM

Would you agree if your (senior) colleague urged you to use his/her book rather than your own –even if you thought doing so would change for the positive the entire history of your field? My guess is that the answer is no (see “add on”). For that matter, would you ever try to insist that your (junior) colleague use your book in teaching a course rather than his/her own notes or book?  Again I guess no. But perhaps you’d be more tactful than were Fisher and Neyman. It wasn’t just Fisher (whose birthday is tomorrow) who seemed to need some anger management training, Erich Lehmann (in conversation and in 2011) points to a number of incidences wherein Neyman is the instigator of gratuitous ill-will. Their substantive statistical and philosophical disagreements, I now think, were minuscule in comparison to the huge animosity that developed over many years. Here’s how Neyman describes a vivid recollection he has of the 1935 book episode to Constance Reid (1998, 126). [i]

A couple of months “after Neyman criticized Fisher’s concept of the complex experiment” Neyman vividly recollects  Fisher stopping by his office at University College on his way to a meeting which was to decide on Neyman’s reappointment[ii]:

“And he said to me that he and I are in the same building… . That, as I know, he has published a book—and that’s Statistical Methods for Research Workers—and he is upstairs from me so he knows something about my lectures—that from time to time I mention his ideas, this and that—and that this would be quite appropriate if I were not here in the College but, say, in California—but if I am going to be at University College, this this is not acceptable to him. And then I said, ‘Do you mean that if I am here, I should just lecture using your book?’ And then he gave an affirmative answer. And I said, ‘Sorry, no. I cannot promise that.’ And then he said, ‘Well, if so, then from now on I shall oppose you in all my capacities.’ And then he enumerated—member of the Royal Society and so forth. There were quite a few. Then he left. Banged the door.”

Imagine if Neyman had replied:

“I’d be very pleased to use Statistical Methods for Research Workers in my class, what else?”

Or what if Fisher had said:

 “Of course you’ll want to use your own notes in your class, but I hope you will use a portion of my text when mentioning some of its key ideas.”

Very unlikely [iii].

How would you have handled it?

Ironically, Neyman did something very similar to Erich Lehmann at Berkeley, and blocked his teaching graduate statistics after one attempt that may have veered slightly off Neyman’s path. But Lehmann always emphasized that, unlike Fisher, Neyman never created professional obstacles for him. [iv]

“add on”: From the earlier discussion, I realized a needed qualification:the answer would have to depend on whether your ideas on the subject were substantially different from the colleague’s. For instance if Neyman were being asked by Lindley, it would be very different.

[i] At the meeting that followed this exchange, Fisher tried to shoot down Neyman’s reappointment, but did not succeed (Reid, 125).

[ii]This is Neyman’s narrative to Reid. I’m sure Fisher would relate these same episodes differently. Let me know if you have any historical material to add. I met Lehmann for the first time shortly after he had worked with Reid on her book, and he had lots of stories. I should have written them all down at the time.

[iii] I find it hard to believe, however, that Fisher would have thrown some of Neyman’s wooden models onto the floor:

“ After the Royal Statistical Society meeting of March 28, relations between workers on the two floors of K.P.’s old preserve became openly hostile. One evening, late that spring, Neyman and Pearson returned to their department after dinner to do some work. Entering they were startled to find strewn on the floor the wooden models which Neyman had used to illustrate his talk on the relative advantages of randomized blocks and Latin squares. They were regularly kept in a cupboard in the laboratory. Both Neyman and Pearson always believed that the models were removed by Fisher in a fit anger.” (Reid 124, noted in Lehmann 2011, p. 59. K.P. is, of course, Karl Pearson.)

[iv] I didn’t want to relate this anecdote without a citation, and finally found one in Reid (215-16). Actually I would have anyway, since Lehmann separately told it to Spanos and me.

Lehmann, E. (2011). Fisher, Neyman and the Creation of Classical Statistics, Springer.

Reid, C (1998), Neyman., Springer

Categories: phil/history of stat, Statistics | 21 Comments

Statistics as a Counter to Heavyweights…who wrote this?

questionmark pinkWhen any scientific conclusion is supposed to be [shown or disproved] on experimental evidence [or data], critics who still refuse to accept the conclusion are accustomed to take one of two lines of attack. They may claim that the interpretation of the [data] is faulty, that the results reported are not in fact those which should have been expected had the conclusion drawn been justified, or that they might equally well have arisen had the conclusion drawn been false. Such criticisms of interpretation are usually treated as falling within the domain of statistics. They are often made by professed statisticians against the work of others whom they regard as ignorant of or incompetent in statistical technique; and, since the interpretation of any considerable body of data is likely to involve computations it is natural enough that questions involving the logical implications of the results of the arithmetical processes implied should be relegated to the statistician. At least I make no complaint of this convention. The statistician cannot evade the responsibility for understanding the processes he applies or recommends. My immediate point is that the questions involved can be dissociated from all that is strictly technical in the statistician’s craft…..

The other type of criticism to which experimental results [or data] are exposed is that the experiment itself was ill designed or, of course, badly executed….This type of criticism is usually made by what I might call a heavyweight authority. Prolonged experience, or at least the long possession of a scientific reputation, is almost a pre-requisite for developing successfully this line of attack. Technical details are seldom in evidence. The authoritative assertion “His controls are totally inadequate” must have temporarily discredited many a promising line of work; and such an authoritarian method of judgment must surely continue, human nature being what it is, so long as [general methods for data generation, modeling and analysis] are lacking…

[T]he subject matter [of this work] has been regarded from the point of view of an experimenter [or data analyst], who wishes to carry out his work competently, and having done so wishes to safeguard his results, so far as they are validly established, from ignorant criticism by different sorts of superior persons.

Categories: phil/history of stat, Statistics | 9 Comments

13 well-worn criticisms of significance tests (and how to avoid them)

IMG_12432013 is right around the corner, and here are 13 well-known criticisms of statistical significance tests, and how they are addressed within the error statistical philosophy, as discussed in Mayo, D. G. and Spanos, A. (2011) “Error Statistics“.

  •  (#1) error statistical tools forbid using any background knowledge.
  •  (#2) All statistically significant results are treated the same.
  • (#3) The p-value does not tell us how large a discrepancy is found.
  • (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
  •  (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
  • (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
  • (#7) Error probabilities are misinterpreted as posterior probabilities.
  • (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
  • (#9) Specifying statistical tests is too arbitrary.
  • (#10) We should be doing confidence interval estimation rather than significance tests.
  • (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
  • (#12) All models are false anyway.
  • (#13) Testing assumptions involves illicit data-mining.

You can read how we avoid them in the full paper here.

Mayo, D. G. and Spanos, A. (2011) “Error Statistics” in Philosophy of Statistics , Handbook of Philosophy of Science Volume 7 Philosophy of Statistics, (General editors: Dov M. Gabbay, Paul Thagard and John Woods; Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster.) Elsevier: 1-46.

Categories: Error Statistics, significance tests, Statistics | Tags: | Leave a comment

“Bad statistics”: crime or free speech?

wavy-capital3Hunting for “nominally” significant differences, trying different subgroups and multiple endpoints, can result in a much higher probability of erroneously inferring evidence of a risk or benefit than the nominal p-value, even in randomized controlled trials. This was an issue that arose in looking at RCTs in development economics (an area introduced to me by Nancy Cartwright), as at our symposium at the Philosophy of Science Association last month[i][ii]. Reporting the results of hunting and dredging in just the same way as if the relevant claims were predesignated can lead to misleading reports of actual significance levels.[iii]

Still, even if reporting spurious statistical results is considered “bad statistics,” is it criminal behavior? I noticed this issue in Nathan Schachtman’s blog over the past couple of days. The case concerns a biotech company, InterMune, and its previous CEO, Dr. Harkonen. Here’s an excerpt from Schachtman’s discussion (part 1). Continue reading

Categories: PhilStatLaw, significance tests, spurious p values, Statistics | 27 Comments

Bad news bears: ‘Bayesian bear’ rejoinder- reblog

To my dismay, I’ve been sent, once again, that silly, snarky, adolescent, clip of those naughty “what the p-value” bears (see Aug 5 post),, who cannot seem to get a proper understanding of significance tests into their little bear brains. So apparently some people haven’t  seen my rejoinder which, as I said then, practically wrote itself. So since it’s Saturday night here at the Elbar Room, let’s listen in to a reblog of my rejoinder (replacing p-value bears with hypothetical Bayesian bears)–but you can’t get it without first watching the Aug 5 post, since I’m mimicking them.  [My idea for the rejoinder was never polished up for actually making a clip.  In fact the original post had 16 comments where several reader improvements were suggested. Maybe someone will want to follow through*.] I just noticed a funny cartoon on Bayesian intervals on Normal Deviate’s post from Nov. 9.

This continues yesterday’s post: I checked out the the” xtranormal” http://www.xtranormal.com/ website. Turns out there are other figures aside from the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. Anyway, before taking the plunge, here is my first attempt, just off the top of my head. Please send corrections and additions.

Bear #1: Do you have the results of the study?

Bear #2:Yes. The good news is there is a .996 probability of a positive difference in the main comparison.

Bear #1: Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.

Bear #2: Not really, that would be an incorrect interpretation. Continue reading

Categories: Comedy, Metablog, significance tests, Statistics | Tags: , , | 42 Comments

Letter from George (Barnard)

George Barnard sent me this note on hearing of my Lakatos Prize. He was to have been at my Lakatos dinner at the LSE (March 17, 1999)—winners are permitted to invite ~2-3 guests—but he called me at the LSE at the last minute to say he was too ill to come to London.  Instead we had a long talk on the phone the next day, which I can discuss at some point.

Categories: phil/history of stat | Tags: , | Leave a comment

Blog at WordPress.com.