PhilStat/Law: Nathan Schachtman: Acknowledging Multiple Comparisons in Statistical Analysis: Courts Can and Must


The following is from Nathan Schachtman’s legal blog, with various comments and added emphases (by me, in this color). He will try to reply to comments/queries.

“Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses”

Nathan Schachtman, Esq., PC * October 14th, 2014

In excluding the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist at the Université de Montréal, the Zoloft MDL trial court discussed several methodological shortcomings and failures, including Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons.[i] The Zoloft MDL court was not the first court to recognize the problem of over-interpreting the putative statistical significance of results that were one among many statistical tests in a single study. The court was, however, among a fairly small group of judges who have shown the needed statistical acumen in looking beyond the reported p-value or confidence interval to the actual methods used in a study[1].


A complete and fair evaluation of the evidence in situations such as occurred in the Zoloft birth defects epidemiology required more than the presentation of the size of the random error, or the width of the 95 percent confidence interval. When the sample estimate arises from a study with multiple testing, presenting the sample estimate with the confidence interval, or p-value, can be highly misleading if the p-value is used for hypothesis testing. The fact of multiple testing will inflate the false-positive error rate. Dr. Bérard ignored the context of the studies she relied upon. What was noteworthy is that Bérard encountered a federal judge who adhered to the assigned task of evaluating methodology and its relationship with conclusions.

*   *   *   *   *   *   *

There is no unique solution to the problem of multiple comparisons. Some researchers use Bonferroni or other quantitative adjustments to p-values or confidence intervals, whereas others reject adjustments in favor of qualitative assessments of the data in the full context of the study and its methods. See, e.g., Kenneth J. Rothman, “No Adjustments Are Needed For Multiple Comparisons,” 1 Epidemiology 43 (1990) (arguing that adjustments mechanize and trivialize the problem of interpreting multiple comparisons). Two things are clear from Professor Rothman’s analysis. First, for someone intent upon strict statistical significance testing, the presence of multiple comparisons means that the rejection of the null hypothesis cannot be done without further consideration of the nature and extent of both the disclosed and undisclosed statistical testing. Rothman, of course, has inveighed against strict significance testing under any circumstance, but the multiple testing would only compound the problem.

Second, although failure to adjust p-values or intervals quantitatively may be acceptable, failure to acknowledge the multiple testing is poor statistical practice. The practice is, alas, too prevalent for anyone to say that ignoring multiple testing is fraudulent, and the Zoloft MDL court certainly did not condemn Dr. Bérard as a fraudfeasor[2]. [emphasis mine]

I’m perplexed by this mixture of stances. If you don’t mention the multiple testing for which it is acceptable not to adjust, then you’re guilty of poor statistical practice; but it’s “too prevalent for anyone to say that ignoring multiple testing is fraudulent”. This appears to claim it’s poor statistical practice if you fail to mention that your results grew out of multiple testing, but “ignoring multiple testing” (which could mean failing to adjust or, more likely, failing to mention it) is not fraudulent. Perhaps it’s a questionable research practice (QRP). It’s back to “50 shades of grey between QRPs and fraud.”
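To fix ideas on the quantitative-adjustment route that is Rothman’s target, here is a minimal Python sketch (my illustration, not Schachtman’s or Rothman’s; the five p-values are invented):

    # Bonferroni: with k tests, multiply each nominal p-value by k (cap at 1)
    # and compare the adjusted value to the overall alpha.
    alpha = 0.05
    pvals = [0.003, 0.02, 0.04, 0.30, 0.51]   # hypothetical nominal p-values
    k = len(pvals)
    for p in pvals:
        adj = min(1.0, p * k)
        verdict = "reject" if adj <= alpha else "retain"
        print(f"nominal p = {p:.3f} -> adjusted p = {adj:.3f} ({verdict})")

Note how the nominally significant 0.02 and 0.04 no longer pass muster once the five-fold search is accounted for: precisely the mechanization Rothman finds too crude a substitute for qualitative assessment.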

  […read his full blogpost here]

Previous cases have also acknowledged the multiple testing problem. In litigation claims for compensation for brain tumors for cell phone use, plaintiffs’ expert witness relied upon subgroup analysis, which added to the number of tests conducted within the epidemiologic study at issue. Newman v. Motorola, Inc., 218 F. Supp. 2d 769, 779 (D. Md. 2002), aff’d, 78 Fed. App’x 292 (4th Cir. 2003). The trial court explained:

“[Plaintiff’s expert] puts undue emphasis on the positive findings for isolated subgroups of tumors. As Dr. Stampfer explained, it is not good scientific methodology to highlight certain elevated subgroups as significant findings without having earlier enunciated a hypothesis to look for or explain particular patterns, such as dose-response effect. In addition, when there is a high number of subgroup comparisons, at least some will show a statistical significance by chance alone.”

I’m going to require, as part of its meaning, that a statistically significant difference not be one due to “chance variability” alone. Then to avoid self-contradiction, this last sentence might be put as follows: “when there is a high number of subgroup comparisons, at least some will show purported or nominal or unaudited statistical significance by chance alone. [Which term do readers prefer?] If one hunts down one’s hypothesized comparison in the data, then the actual p-value will not equal, and will generally be greater than, the nominal or unaudited p-value.”
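To make the nominal/actual contrast concrete: if the reported comparison was the best of k independent looks at the data, the actual p-value attaching to a nominal p_min is 1 − (1 − p_min)^k. A quick check in Python (my own illustration, with invented numbers):

    import random

    random.seed(1)
    k = 16          # comparisons hunted through
    p_min = 0.03    # hypothetical best nominal p-value found by hunting

    # Actual p-value: the chance the smallest of k null p-values is this small.
    print(f"actual p   : {1 - (1 - p_min) ** k:.3f}")   # ~0.386, not 0.03

    # Verify by simulation: under the null, p-values are uniform on (0, 1).
    trials = 100_000
    hits = sum(min(random.random() for _ in range(k)) <= p_min
               for _ in range(trials))
    print(f"simulated  : {hits / trials:.3f}")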

So, I will insert “nominal” where needed below (in red).

Texas Sharpshooter fallacy

Id. And shortly after the Supreme Court decided Daubert, the Tenth Circuit faced the reality of data dredging in litigation, and its effect on the meaning of “significance”:

“Even if the elevated levels of lung cancer for men had been [nominally] statistically significant a court might well take account of the statistical “Texas Sharpshooter” fallacy in which a person shoots bullets at the side of a barn, then, after the fact, finds a cluster of holes and draws a circle around it to show how accurate his aim was. With eight kinds of cancer for each sex there would be sixteen potential categories here around which to “draw a circle” to show a [nominally] statistically significant level of cancer. With independent variables one would expect one statistically significant reading in every twenty categories at a 95% confidence level purely by random chance.”

The Texas sharpshooter fallacy is one of my all time favorites. One purports to be testing the accuracy of his aim, when in fact that is not the process that gave rise to the impressive-looking (nominal) cluster of hits. The results do not warrant inferences about his ability to accurately hit a target, since that hasn’t been well-probed. Continue reading
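The court’s arithmetic, incidentally, checks out and can be pushed one step further: sixteen independent categories tested at the 0.05 level give not just “one in every twenty” expected hits but a better-than-even chance of at least one circle to draw on the barn wall (a two-line check, assuming independence as the court does):

    # Sixteen independent tests, each with a 5% false-positive rate.
    print(f"expected nominal hits: {16 * 0.05:.1f}")        # 0.8
    print(f"P(at least one)      : {1 - 0.95 ** 16:.2f}")   # ~0.56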

Categories: P-values, PhilStat Law, Statistics | 12 Comments

Gelman recognizes his error-statistical (Bayesian) foundations


From Gelman’s blog:

“In one of life’s horrible ironies, I wrote a paper “Why we (usually) don’t have to worry about multiple comparisons” but now I spend lots of time worrying about multiple comparisons”


Exhibit A: [2012] Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5, 189-211. (Andrew Gelman, Jennifer Hill, and Masanao Yajima)

Exhibit B: The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time, in press. (Andrew Gelman and Eric Loken) (Shortened version is here.)

 

The “forking paths” paper, in my reading, basically argues that mere hypothetical possibilities about what you would or might have done had the data been different (in order to secure a desired interpretation) suffice to alter the characteristics of the analysis you actually did. That’s an error statistical argument–maybe even stronger than what some error statisticians would say. What’s really being condemned are overly flexible ways to move from statistical results to substantive claims. The p-values are illicit when taken to provide evidence for those claims because an actual p-value requires Prob(P < p; H0) = p (and the actual p-value has become much greater by design). The criticism makes perfect sense if you’re scrutinizing inferences according to how well or severely tested they are. Actual error probabilities are accordingly altered or unable to be calculated. However, if one is going to scrutinize inferences according to severity, then the same problematic flexibility would apply to Bayesian analyses, whether or not they have a way to pick up on it. (It’s problematic if they don’t.) I don’t see the magic by which a concern for multiple testing disappears in Bayesian analysis (e.g., in the first paper) except by assuming some prior takes care of it.
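Here is a minimal simulation of the forking-paths point (my sketch, not Gelman and Loken’s): only one nominal test is ever reported per dataset, but which test gets reported is chosen with an eye on the data, say the full sample or whichever sex subgroup looks best:

    import random
    import statistics
    from math import erf, sqrt

    def z_p(sample):
        """Two-sided nominal p-value for H0: mean = 0, known sigma = 1."""
        z = statistics.mean(sample) * sqrt(len(sample))
        return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    random.seed(2)
    trials, n, rejections = 20_000, 40, 0
    for _ in range(trials):
        data = [random.gauss(0, 1) for _ in range(n)]   # H0 is true
        men, women = data[:n // 2], data[n // 2:]
        p = min(z_p(data), z_p(men), z_p(women))        # forking path
        rejections += p <= 0.05
    print(f"actual rejection rate: {rejections / trials:.3f}")  # well above 0.05

The three analyses are correlated, so the inflation is milder than with sixteen independent tests, but Prob(P < p; H0) = p plainly fails for the reported p-value.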

Categories: Error Statistics, Gelman | 15 Comments

BREAKING THE (Royall) LAW! (of likelihood) (C)


With this post, I finally get back to the promised sequel to “Breaking the Law! (of likelihood) (A) and (B)” from a few weeks ago. You might wish to read that one first.* A relevant paper by Royall is here.

Richard Royall is a statistician[1] who has had a deep impact on recent philosophy of statistics by giving a neat proposal that appears to settle disagreements about statistical philosophy! He distinguishes three questions:

  • What should I believe?
  • How should I act?
  • Is this data evidence of some claim? (or How should I interpret this body of observations as evidence?)

It all sounds quite sensible–at first–and, impressively, many statisticians and philosophers of different persuasions have bought into it. At least they appear willing to go this far with him on the 3 questions.

How is each question to be answered? According to Royall’s writings, what to believe is captured by Bayesian posteriors; how to act, by a behavioristic, N-P long-run performance. And what method answers the evidential question? A comparative likelihood approach. You may want to reject all of them (as I do),[2] but just focus on the last.

Remember: with likelihoods, the data x are fixed; the hypotheses vary. A great many critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with “the law”. But I fail to see why we should obey it.

To begin with, a report of comparative likelihoods isn’t very useful: H might be less likely than H’, given x, but so what? What do I do with that information? It doesn’t tell me I have evidence against or for either.[3] Recall, as well, Hacking’s points here about the variability in the meanings of a likelihood ratio across problems. Continue reading
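A toy example of the “so what” point (mine, not Royall’s): 6 heads in 10 tosses, comparing a few values of θ:

    from math import comb

    def lik(theta, x=6, n=10):
        """Binomial likelihood of theta given x heads in n tosses."""
        return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

    print(f"L(0.6)/L(0.5) = {lik(0.6) / lik(0.5):.2f}")   # ~1.22
    print(f"L(0.6)/L(0.9) = {lik(0.6) / lik(0.9):.2f}")   # ~22.5

θ = 0.6 edges out θ = 0.5 and trounces θ = 0.9; and θ = x/n = 0.6 always fits best, however the data arose. Nothing in the comparison reports how often such ratios would occur were either hypothesis false, which is exactly the error-statistical information I want.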

Categories: law of likelihood, Richard Royall, Statistics | 41 Comments

A (Jan 14, 2014) interview with Sir David Cox by “Statistics Views”

Sir David Cox

The original Statistics Views interview is here:

“I would like to think of myself as a scientist, who happens largely to specialise in the use of statistics”– An interview with Sir David Cox


  • Author: Statistics Views
  • Date: 24 Jan 2014
  • Copyright: Image appears courtesy of Sir David Cox

Sir David Cox is arguably one of the world’s leading living statisticians. He has made pioneering and important contributions to numerous areas of statistics and applied probability over the years, of which perhaps the best known is the proportional hazards model, which is widely used in the analysis of survival data. The Cox point process was named after him.

Sir David studied mathematics at St John’s College, Cambridge and obtained his PhD from the University of Leeds in 1949. He was employed from 1944 to 1946 at the Royal Aircraft Establishment, from 1946 to 1950 at the Wool Industries Research Association in Leeds, and from 1950 to 1955 worked at the Statistical Laboratory at the University of Cambridge. From 1956 to 1966 he was Reader and then Professor of Statistics at Birkbeck College, London. In 1966, he took up the Chair position in Statistics at Imperial College London, where he later became Head of the Department of Mathematics for a period. In 1988 he became Warden of Nuffield College and was a member of the Department of Statistics at Oxford University. He formally retired from these positions in 1994 but continues to work in Oxford.

Sir David has received numerous awards and honours over the years. He has been awarded the Guy Medals in Silver (1961) and Gold (1973) by the Royal Statistical Society. He was elected Fellow of the Royal Society of London in 1973, was knighted in 1985 and became an Honorary Fellow of the British Academy in 2000. He is a Foreign Associate of the US National Academy of Sciences and a foreign member of the Royal Danish Academy of Sciences and Letters. In 1990 he won the Kettering Prize and Gold Medal for Cancer Research for “the development of the Proportional Hazard Regression Model”, and in 2010 he was awarded the Copley Medal by the Royal Society.

He has supervised and collaborated with many students over the years, many of whom are now successful in statistics in their own right, such as David Hinkley and Past President of the Royal Statistical Society, Valerie Isham. Sir David has served as President of the Bernoulli Society, the Royal Statistical Society, and the International Statistical Institute.

This year, Sir David is to turn 90*. Here Statistics Views talks to Sir David about his prestigious career in statistics, working with the late Professor Lindley, his thoughts on Jeffreys and Fisher, being President of the Royal Statistical Society during the Thatcher Years, Big Data and the best time of day to think of statistical methods.

1. With an educational background in mathematics at St John’s College, Cambridge and the University of Leeds, when and how did you first become aware of statistics as a discipline?

I was studying at Cambridge during the Second World War and after two years, one was sent either into the Forces or into some kind of military research establishment. There were very few statisticians then, although it was realised there was a need for statisticians. It was assumed that anybody who was doing reasonably well at mathematics could pick up statistics in a week or so! So, aged 20, I went to the Royal Aircraft Establishment in Farnborough, which is enormous and still there to this day if in a different form, and I worked in the Department of Structural and Mechanical Engineering, doing statistical work. So statistics was forced upon me, so to speak, as was the case for many mathematicians at the time because, aside from UCL, there had been very little teaching of statistics in British universities before the Second World War. Afterwards, it all started to expand.

2. From 1944 to 1946 you worked at the Royal Aircraft Establishment and then from 1946 to 1950 at the Wool Industries Research Association in Leeds. Did statistics have any role to play in your first roles out of university?

Totally. In Leeds, it was largely statistics but also to some extent, applied mathematics because there were all sorts of problems connected with the wool and textile industry in terms of the physics, chemistry and biology of the wool and some of these problems were mathematical but the great majority had a statistical component to them. That experience was not totally uncommon at the time and many who became academic statisticians had, in fact, spent several years working in a research institute first.

3. From 1950 to 1955, you worked at the Statistical Laboratory at Cambridge and would have been there at the same time as Fisher and Jeffreys. The late Professor Dennis Lindley, who was also there at that time, told me that the best people working on statistics were not in the statistics department at that time. What are your memories when you look back on that time and what do you feel were your main achievements?

Lindley was exactly right about Jeffreys and Fisher. They were two great scientists outside statistics – Jeffreys founded modern geophysics and Fisher was a major figure in genetics. Dennis was a contemporary and very impressive and effective. We were colleagues for five years and our children even played together.

The first lectures on statistics I attended as a student consisted of a short course by Harold Jeffreys, who had at the time a massive reputation as virtually the inventor of modern geophysics. His Theory of Probability, published first as a monograph in physics, was and remains of great importance but, amongst other things, his nervousness limited the appeal of his lectures, to put it gently. I met him personally a couple of times – he was friendly but uncommunicative. When I was later at the Statistical Laboratory in Cambridge, relations between the Director, Dr Wishart, and R.A. Fisher had been at a very low ebb for 20 years and contact between the Lab and Fisher was minimal. I heard him speak on three or four occasions, interesting if often rambunctious occasions. To some, Fisher showed great generosity but not to the Statistics Lab, which was sad in view of the towering importance of his work.

“To some, Fisher showed great generosity but not to the Statistics Lab, which was sad in view of the towering importance of his work.”

Continue reading

Categories: Sir David Cox | 3 Comments

Diederik Stapel hired to teach “social philosophy” because students got tired of success stories… or something (rejected post)

Oh My*.

(“But I can succeed as a social philosopher”)

The following is from Retraction Watch. UPDATE: OCT 10, 2014**

Diederik Stapel, the Dutch social psychologist and admitted data fabricator — and owner of 54 retraction notices — is now teaching at a college in the town of Tilburg [i].

According to Omroep Brabant, Stapel was offered the job as a kind of adjunct at Fontys Academy for Creative Industries to teach social philosophy. The site quotes a Nick Welman explaining the rationale for hiring Stapel (per Google Translate):

“It came about because students one after another success story were told from the entertainment industry, the industry which we educate them .”

The students wanted something different.

“They wanted to also focus on careers that have failed. On people who have fallen into a black hole, acquainted with the dark side of fame and success.”

Last month, organizers of a drama festival in The Netherlands cancelled a play co-written by Stapel.

I really think Dean Bon puts the rationale most clearly of all.

…A letter from the school’s dean, Pieter Bon, adds:

We like to be entertained and the length of our lives increases. We seek new ways in which to improve our health and we constantly look for new ways to fill our free time. Fashion and looks are important to us; we prefer sustainable products and we like to play games using smart gadgets. This is why Fontys Academy for Creative Industries exists. We train people to create beautiful concepts, exciting concepts, touching concepts, concepts to improve our quality of life. We train them for an industry in which creativity is of the highest value to a product or service. We educate young people who feel at home in the (digital) world of entertainment and lifestyle, and understand that creativity can also mean business. Creativity can be marketed, it’s as simple as that.

We’re sure Prof. Stapel would agree.

[i] Fontys describes itself thusly: Fontys Academy for Creative Industries (Fontys ACI) in Tilburg has 2500 students working towards a bachelor of Business Administration (International Event, Music & Entertainment Studies and Digital Publishing Studies), a bachelor of Communication (International Event, Music & Entertainment Studies) or a bachelor of Lifestyle (International Lifestyle Studies). Fontys ACI hosts a staff of approximately one hundred (teachers plus support staff) as well as about fifty regular visiting lecturers.

 *I wonder if “social philosophy” is being construed as “extreme postmodernist social epistemology”?  

I guess the students are keen to watch that Fictionfactory Peephole.

**Turns out to have been short-lived. Also admits to sockpuppeting at Retraction Watch. Frankly I thought it was more fun to guess who “Paul” was, but they have rules. http://retractionwatch.com/2014/10/08/diederik-stapel-loses-teaching-post-admits-he-was-sockpuppeting-on-retraction-watch/#comments

[ii] One of my April Fool’s Day posts is turning from part fiction to fact.

Categories: Rejected Posts, Statistics | 9 Comments

Oy Faye! What are the odds of not conflating simple conditional probability and likelihood with Bayesian success stories?


Faye Flam

Congratulations to Faye Flam for finally getting her article published in the Science Times section of the New York Times, “The odds, continually updated,” after months of reworking and editing, interviewing and reinterviewing. I’m grateful, too, that one remark from me remained. Seriously I am. A few comments: The Monty Hall example is simple probability, not statistics, and finding that fisherman who floated on his boots at best used likelihoods. I might note, too, that critiquing that ultra-silly example about ovulation and voting–a study so bad they actually had to pull it at CNN due to reader complaints[i]–scarcely required more than noticing the researchers didn’t even know the women were ovulating[ii]. Experimental design is an old area of statistics developed by frequentists; on the other hand, these ovulation researchers really believe their theory, so the posterior checks out.
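For readers who want to see why Monty Hall is “simple probability”: the whole puzzle is settled by enumeration (or, as here, brute simulation), with no statistical inference in sight. A sketch:

    import random

    random.seed(3)
    trials, stay, switch = 100_000, 0, 0
    for _ in range(trials):
        car, pick = random.randrange(3), random.randrange(3)
        # Host opens a door hiding no car and not picked
        # (his tie-break choice is immaterial to the win rates).
        opened = next(d for d in range(3) if d not in (pick, car))
        switched = next(d for d in range(3) if d not in (pick, opened))
        stay += pick == car
        switch += switched == car
    print(f"stay: {stay / trials:.3f}, switch: {switch / trials:.3f}")  # ~1/3 vs ~2/3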

The article says that Bayesian methods can “crosscheck work done with the more traditional or ‘classical’ approach.” Yes, but on traditional frequentist grounds. What many would like to know is how to cross-check Bayesian methods—how do I test your beliefs? Anyway, I should stop kvetching and thank Faye and the NYT for doing the article at all[iii]. Here are some excerpts:

Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

Continue reading

Categories: Bayesian/frequentist, Statistics | 47 Comments

Letter from George (Barnard)

Memory Lane: (2 yrs) Sept. 30, 2012. George Barnard sent me this note on hearing of my Lakatos Prize. He was to have been at my Lakatos dinner at the LSE (March 17, 1999)—winners are permitted to invite ~2-3 guests*—but he called me at the LSE at the last minute to say he was too ill to come to London. Instead we had a long talk on the phone the next day, which I can discuss at some point. The main topics were likelihoods and error probabilities. (I have 3 other Barnard letters, with real content, that I might dig out some time.)

*My others were Donald Gillies and Hasok Chang

Categories: Barnard, phil/history of stat | Tags: , | Leave a comment

Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)

So I hear that Diederik Stapel is the co-author of a book Fictionfactory (in Dutch, with a novelist, Dautzenberg)[i], and of what they call their “Fictionfactory peepshow”, only it’s been disinvited at the last minute from a Dutch festival on “truth and reality” (due to have run 9/26/14), and all because of Stapel’s involvement. Here’s an excerpt from an article in last week’s Retraction Watch (article is here):*

Here’s a case of art imitating science.

The organizers of a Dutch drama festival have put a halt to a play about the disgraced social psychologist Diederik Stapel, prompting protests from the authors of the skit — one of whom is Stapel himself.

According to an article in NRC Handelsblad:

The Amsterdam Discovery Festival on science and art has canceled at the last minute, the play written by Anton Dautzenberg and former professor Diederik Stapel. Co-sponsor, The Royal Netherlands Academy of Arts and Sciences (KNAW), doesn’t want Stapel, who committed science fraud, to perform at a festival that’s associated with the KNAW.

FICTION FACTORY

The management of the festival, planned for September 26th at the Tolhuistuin in Amsterdam, contacted Stapel and Dautzenberg 4 months ago with the request to organize a performance of their book and lecture project ‘The Fictionfactory”. Especially for this festival they [Stapel and Dautzenberg] created a ‘Fictionfactory-peepshow’.

“Last Friday I received a call [from the management of the festival] that our performance has been canceled at the last minute because the KNAW will withdraw their subsidy if Stapel is on the festival program”, says Dautzenberg. “This looks like censorship, and by an institution that also wants to represents arts and experiments”.

Well this is curious, as things with Stapel always are. What’s the “Fictionfactory Peepshow”? If you go to Stapel’s homepage, it’s all in Dutch, but Google translation isn’t too bad, and I have a pretty good description of the basic idea. So since it’s Saturday night, let’s take a peek, or peep (at what it might have been)…

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we are at the “Truth and Reality” Festival: first stop (after some cotton candy): the Stapel Fictionfactory Peepshow! It’s all dark, I can’t see a thing. What? It says I have to put some coins in a slot if I want to turn it on (but that they also take credit cards). So I’m to look down in this tiny window. The curtains are opening!…I see a stage with two funky looking guys–one of them is Stapel. They’re reading, or reciting from some magazine with big letters: “Fact, Fiction and the Frictions we Hide”.

Stapel and Dautzenberg

STAPEL: Welkom. You can ask us any questions! In response, you will always be given an option: ‘Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?’

“Well I’ve brought some data with me from a study in social psychology. My question is this: “Is there a statistically significant effect here?”

STAPEL: Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?

“Fiction please”.

STAPEL: I can massage your data, manipulate your numbers, reveal the taboos normally kept under wraps. For a few more coins I will let you see the secrets behind unreplicable results, and for a large bill will manufacture for you a sexy statistical story to turn on the editors.

(Then after the dirty business is all done [ii].)

STAPEL: Do you have more questions for me?

“Will it be published (fiction please)?”

STAPEL: “yes”

“will anyone find out about this (fiction please)?”

STAPEL: “No, I mean yes, I mean no.”

 

“I’d like to change to hearing the truth now. I have three questions”.

STAPEL: No problem, we take credit cards. Dank u. What are your questions?’

“Will Uri Simonsohn be able to fraudbust my results using the kind of tests he used on others? and if so, how long will it take him? (truth, please)?

STAPEL: “Yes.But not for at least 6 months to one year.”

“Here’s my final question. Are these data really statistically significant and at what level?” (truth please)

Nothing. Blank screen suddenly! With an acrid-smelling puff of smoke, ew. But I’d already given the credit card! (Tricked by the master trickster.)

 

What if he either always lies or always tells the truth? Then what would you ask him if you want to know the truth about your data? (Liar’s paradox variant)

Feel free to share your queries/comments.

* I thank Caitlin Parker for sending me the article

[i] Diederik Stapel was found guilty of science fraud in psychology in 2011; he made up data out of whole cloth and retracted over 50 papers. http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all&_r=0

Bookjacket: [image]

[ii] Perhaps they then ask you how much you’ll pay for a bar of soap (because you’d sullied yourself). Why let potential priming data go to waste?  Oh wait, he doesn’t use real data…. Perhaps the peepshow was supposed to be a kind of novel introduction to research ethics.

 

Some previous posts on Stapel:

 

Categories: Comedy, junk science, rejected post, Statistics | 5 Comments

G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 Sept 1915-30 July, 2002

Today is George Barnard’s birthday. In honor of this, I have typed in an exchange between Barnard, Savage (and others) on an important issue that we’d never gotten around to discussing explicitly (on likelihood vs probability). Please share your thoughts.

The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962)[i]

 ♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠

BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.

SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague–an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.

Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …

On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.

Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.

BARNARD: Professor Savage says in effect, ‘add at the bottom of list H1, H2,…”something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’. We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage. Continue reading
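Barnard’s complaint is easy to dramatize numerically (a sketch of mine, not from the Forum): give the catch-all a small prior and watch the posterior on H1 swing with the unknowable likelihood of the data given ‘something else’:

    # Two specified hypotheses plus a catch-all with prior 0.05.
    priors = {"H1": 0.60, "H2": 0.35, "else": 0.05}
    lik = {"H1": 0.30, "H2": 0.10}        # well-defined models (hypothetical)
    for lik_else in (0.0, 0.3, 0.9):      # sheer guesses for the catch-all
        joint = {h: priors[h] * l for h, l in {**lik, "else": lik_else}.items()}
        post_h1 = joint["H1"] / sum(joint.values())
        print(f"P(x|else) = {lik_else:.1f} -> P(H1|x) = {post_h1:.3f}")
    # The posterior on H1 swings from ~0.84 down to ~0.69.

Normalization is restored in name only: the posteriors on the specified hypotheses remain hostage to a likelihood no one can compute, which is just Barnard’s point about the penny.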

Categories: Barnard, highly probable vs highly probed, phil/history of stat, Statistics | 26 Comments

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize—only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.)  Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece also features George Barnard, and since Monday (9/23) is Barnard’s birthday, I’m digging it out of “rejected posts” to reblog it. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue (as I recall it from the time[i]):

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s.

Mayo:  That’s interesting.  I did quote Pearson, at the end of his response to Fisher, as saying that inductive behavior was “Neyman’s field, not mine”.  I didn’t know your role in his laying the blame on Neyman!

Fade to black. The lights go up on Fisher, stage left, flashing back some 30 years earlier . . . ….

Fisher: Now, acceptance procedures are of great importance in the modern world.  When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. . . . I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means.  But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful . . . . [Advocates of behavioristic statistics are like]

Russians [who] are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. . . .

In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. (Fisher 1955, 69-70)

Fade to black.  The lights go up on Egon Pearson stage right (who looks like he does in my sketch [frontispiece] from EGEK 1996, a bit like a young C. S. Peirce):

Pearson: There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. . . . Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot . . . . To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data.  (Pearson 1955, 204)

Fade to black. The spotlight returns to Barnard and Mayo, but brighter. It looks as if it’s gotten hotter.  Barnard wipes his brow with a white handkerchief.  Mayo drinks her Diet Coke.

Barnard (ever so slightly angry): You have made one blunder in your paper. Fisher would never have made that remark about Russia.

There is a tense silence.

Mayo: But—it was a quote.

End of Act 1.

Given this was pre-internet, we couldn’t go to the source then and there, so we agreed to search for the paper in the library. Well, you get the idea. Maybe I could call the piece “Stat on a Hot Tin Roof.”

If you go see it, don’t say I didn’t warn you.

I’ve gotten various new speculations over the years as to why he had this reaction to the mention of Russia (check discussions in earlier posts with this play). Feel free to share yours. Some new (to me) information on Barnard is in George Box’s recent autobiography.


[i] We had also discussed this many years later, in 1999.

 

Categories: Barnard, phil/history of stat, rejected post, Statistics | Tags: , , , , | 3 Comments

Uncle Sam wants YOU to help with scientific reproducibility!

You still have a few days to respond to the call of your country to solve problems of scientific reproducibility!

The following passages come from Retraction Watch, with my own recommendations at the end.

“White House takes notice of reproducibility in science, and wants your opinion”

The White House’s Office of Science and Technology Policy (OSTP) is taking a look at innovation and scientific research, and issues of reproducibility have made it onto its radar.

Here’s the description of the project from the Federal Register:

The Office of Science and Technology Policy and the National Economic Council request public comments to provide input into an upcoming update of the Strategy for American Innovation, which helps to guide the Administration’s efforts to promote lasting economic growth and competitiveness through policies that support transformative American innovation in products, processes, and services and spur new fundamental discoveries that in the long run lead to growing economic prosperity and rising living standards.

I wonder what Steven Pinker would say about some of the above verbiage?

And here’s what’s catching the eye of people interested in scientific reproducibility:

(11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?

The OSTP is the same office that, in 2013, took what Nature called “a long-awaited leap forward for open access” when it said “that publications from taxpayer-funded research should be made free to read after a year’s delay.” That OSTP memo came after more than 65,000 people “signed a We the People petition asking for expanded public access to the results of taxpayer-funded research.”

Have ideas on improving reproducibility? Emails to innovationstrategy@ostp.gov are preferred, according to the notice, which also explains how to fax or mail comments. The deadline is September 23.

Off the top of my head, how about:

Promote the use of methodologies that:

  • control and assess the capabilities of methods to avoid mistaken inferences from data;
  • require demonstrated self-criticism all the way from data collection through modelling and interpretation (statistical and substantive);
  • describe what is especially shaky or poorly probed thus far (and spell out how subsequent studies are most likely to locate those flaws)[i]

Institute penalties for QRPs and fraud?

Please offer your suggestions in the comments, or directly to Uncle Sam.

[i] It may require a certain courage on the part of researchers, journalists, and referees.

Categories: Announcement, reproducibility | 18 Comments

A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)


Time for a break with a “Rejected Post”[i]

There’s one crucial point that Prosecutor Nel overlooked and failed to employ in the Oscar Pistorius trial–or so it appears. In fact I haven’t heard anyone mention it—so maybe it’s not as critical as I think it is. Before revealing (what I regard as) an important missing piece, I ask readers and legal beagles out there for their informal take.

Here are some items from the announced verdict (which do not directly give away the missing piece, but may be enough to deduce it). (A general article is here.)

Oscar Pistorius ‘not guilty’ of girlfriend’s murder, rules judge Thokozile Masipa

AP | September 11, 2014, 17.09 pm IST

Before the break, Judge Masipa ruled out “dolus eventualis”[ii], saying Mr Pistorius could not have foreseen he would kill the person behind the toilet door.

“How could the accused have reasonably foreseen the shot he fired would have killed the deceased? Clearly he did not subjectively foresee this, that he would have killed the person behind the door, let alone the deceased,” said Judge Masipa.

The judge said the defence argues it is highly improbable the accused could have made this up so quickly and consistently, even in his bail application. [really?]

….Evidence shows that at time he fired shots at toilet door, Mr Pistorius believed the deceased was in the bedroom, the judge says. This belief was communicated to a number of people shortly after the incident, she added.

The judge said there is “nothing in the evidence to suggest that Mr Pistorius’ belief was not genuinely entertained”. She cites reasons including the bathroom window being open, and the toilet door being shut.

… He… [said] he genuinely, though erroneously, believed that his life and that of the deceased was in danger,” the judge said….

The starting point is “whether accused had intention to kill person behind toilet door,” the judge said. Continue reading

Categories: rejected post | 8 Comments

“The Supernal Powers Withhold Their Hands And Let Me Alone” : C.S. Peirce

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Memory Lane* in Honor of C.S. Peirce’s Birthday:
(Part 3) of “Peircean Induction and the Error-Correcting Thesis”

Deborah G. Mayo
Transactions of the Charles S. Peirce Society 41(2) 2005: 299-319

(9/10) Peircean Induction and the Error-Correcting Thesis (Part I)

(9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis

8. Random sampling and the uniformity of nature

We are now in a position to address the final move in warranting Peirce’s [self-correcting thesis] SCT. The severity or trustworthiness assessment, on which the error correcting capacity depends, requires an appropriate link (qualitative or quantitative) between the data and the data generating phenomenon, e.g., a reliable calibration of a scale in a qualitative case, or a probabilistic connection between the data and the population in a quantitative case. Establishing such a link, however, is regarded as assuming observed regularities will persist, or making some “uniformity of nature” assumption—the bugbear of attempts to justify induction.

But Peirce contrasts his position with those favored by followers of Mill, and “almost all logicians” of his day, who “commonly teach that the inductive conclusion approximates to the truth because of the uniformity of nature” (2.775). Inductive inference, as Peirce conceives it (i.e., severe testing) does not use the uniformity of nature as a premise. Rather, the justification is sought in the manner of obtaining data. Justifying induction is a matter of showing that there exist methods with good error probabilities. For this it suffices that randomness be met only approximately, that inductive methods check their own assumptions, and that they can often detect and correct departures from randomness.

… It has been objected that the sampling cannot be random in this sense. But this is an idea which flies far away from the plain facts. Thirty throws of a die constitute an approximately random sample of all the throws of that die; and that the randomness should be approximate is all that is required. (1.94)

Peirce backs up his defense with robustness arguments. For example, in an (attempted) Binomial induction, Peirce asks, “what will be the effect upon inductive inference of an imperfection in the strictly random character of the sampling” (2.728). What if, for example, a certain proportion of the population had twice the probability of being selected? He shows that “an imperfection of that kind in the random character of the sampling will only weaken the inductive conclusion, and render the concluded ratio less determinate, but will not necessarily destroy the force of the argument completely” (2.728). This is particularly so if the sample mean is near 0 or 1. In other words, violating experimental assumptions may be shown to weaken the trustworthiness or severity of the proceeding, but this may only mean we learn a little less.
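Peirce’s robustness claim is easy to probe numerically (my sketch, not his 2.728 calculation): suppose one subgroup of the population is twice as likely to be selected, and see how far the concluded ratio drifts:

    import random

    random.seed(4)
    # Group A is 30% of the population with the trait at rate 0.8; group B,
    # 70% at rate 0.4. True overall rate: 0.3*0.8 + 0.7*0.4 = 0.52.
    # Imperfect randomness: group A members are twice as likely to be drawn.
    def draw():
        in_a = random.random() < (2 * 0.3) / (2 * 0.3 + 0.7)   # ~0.46, not 0.3
        return random.random() < (0.8 if in_a else 0.4)

    n = 5_000
    est = sum(draw() for _ in range(n)) / n
    print(f"true rate 0.52; biased-sample estimate ~ {est:.3f}")   # ~0.58

The concluded ratio is indeed “less determinate” (pulled toward group A’s rate), but the force of the induction is weakened rather than destroyed; and comparing samples drawn by different methods, as Peirce recommends next, would expose the bias.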

Yet a further safeguard is at hand:

Nor must we lose sight of the constant tendency of the inductive process to correct itself. This is of its essence. This is the marvel of it. …even though doubts may be entertained whether one selection of instances is a random one, yet a different selection, made by a different method, will be likely to vary from the normal in a different way, and if the ratios derived from such different selections are nearly equal, they may be presumed to be near the truth. (2.729)

Here, the marvel is an inductive method’s ability to correct the attempt at random sampling. Still, Peirce cautions, we should not depend so much on the self-correcting virtue that we relax our efforts to get a random and independent sample. But if our effort is not successful, and neither is our method robust, we will probably discover it. “This consideration makes it extremely advantageous in all ampliative reasoning to fortify one method of investigation by another” (ibid.). Continue reading

Categories: C.S. Peirce, Error Statistics, phil/history of stat | 11 Comments

Statistical Science: The Likelihood Principle issue is out…!

Abbreviated Table of Contents:

Here are some items for your Saturday-Sunday reading.

Link to complete discussion: 

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Links to individual papers:

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ) have different probability models f1(.), f2(.), then even though f1(x; θ) = cf2(y; θ) for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
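The textbook illustration of an SLP violation (the familiar coin example, not drawn from the paper itself): 9 heads and 3 tails give the likelihood θ^9(1 − θ)^3, up to a constant, whether n = 12 tosses was fixed in advance (binomial) or tossing continued until the 3rd tail (negative binomial); yet the one-sided p-values for H0: θ = 0.5 differ, because the sampling distributions differ. A check using scipy (assuming it is installed):

    from scipy.stats import binom, nbinom

    # Data: 9 heads, 3 tails. Test H0: theta = 0.5 against theta > 0.5.
    # E1: n = 12 tosses fixed in advance -> P(at least 9 heads).
    print(f"binomial p      = {binom.sf(8, 12, 0.5):.4f}")    # ~0.0730
    # E2: toss until the 3rd tail -> P(at least 9 heads before the 3rd tail);
    # scipy's nbinom counts "failures" (heads) before the r-th "success" (tail).
    print(f"neg. binomial p = {nbinom.sf(8, 3, 0.5):.4f}")    # ~0.0327

Same likelihood function, different inferences: exactly the relevance of the sampling distribution that the SLP denies.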

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper has been accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

 UPhils and responses

 

 

Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics | 40 Comments

All She Wrote (so far): Error Statistics Philosophy Contents-3 years on

 


Error Statistics Philosophy: Blog Contents
By: D. G. Mayo[i]

Each month, I will mark (in red) 3 relevant posts (from that month 3 yrs ago) for readers wanting to catch-up or review central themes and discussions.

September 2011

October 2011

November 2011

December 2011

January 2012

February 2012

March 2012

April 2012

May 2012

June 2012

July 2012

August 2012

September 2012

October 2012

November 2012

December 2012

January 2013

  • (1/2) Severity as a ‘Metastatistical’ Assessment
  • (1/4) Severity Calculator
  • (1/6) Guest post: Bad Pharma? (S. Senn)
  • (1/9) RCTs, skeptics, and evidence-based policy
  • (1/10) James M. Buchanan
  • (1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
  • (1/12) Error Statistics Blog: Table of Contents
  • (1/15) Ontology & Methodology: Second call for Abstracts, Papers
  • (1/18) New Kvetch/PhilStock
  • (1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
  • (1/22) New PhilStock
  • (1/23) P-values as posterior odds?
  • (1/26) Coming up: December U-Phil Contributions….
  • (1/27) U-Phil: S. Fletcher & N.Jinn
  • (1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013

  • (2/2) U-Phil: Ton o’ Bricks
  • (2/4) January Palindrome Winner
  • (2/6) Mark Chang (now) gets it right about circularity
  • (2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
  • (2/9) New kvetch: Filly Fury
  • (2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
  • (2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
  • (2/13) Statistics as a Counter to Heavyweights…who wrote this?
  • (2/16) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
  • (2/23) Stephen Senn: Also Smith and Jones
  • (2/26) PhilStock: DO < $70
  • (2/26) Statistically speaking…

March 2013

  • (3/1) capitalizing on chance
  • (3/4) Big Data or Pig Data?
  • (3/7) Stephen Senn: Casting Stones
  • (3/10) Blog Contents 2013 (Jan & Feb)
  • (3/11) S. Stanley Young: Scientific Integrity and Transparency
  • (3/13) Risk-Based Security: Knives and Axes
  • (3/15) Normal Deviate: Double Misunderstandings About p-values
  • (3/17) Update on Higgs data analysis: statistical flukes (1)
  • (3/21) Telling the public why the Higgs particle matters
  • (3/23) Is NASA suspending public education and outreach?
  • (3/27) Higgs analysis and statistical flukes (part 2)
  • (3/31) possible progress on the comedy hour circuit?

April 2013

  • (4/1) Flawed Science and Stapel: Priming for a Backlash?
  • (4/4) Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics
  • (4/6) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
  • (4/10) Statistical flukes (3): triggering the switch to throw out 99.99% of the data
  • (4/11) O & M Conference (upcoming) and a bit more on triggering from a participant…..
  • (4/14) Does statistics have an ontology? Does it need one? (draft 2)
  • (4/19) Stephen Senn: When relevance is irrelevant
  • (4/22) Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey
  • (4/23) PhilStock: Applectomy? (rejected post)
  • (4/25) Blog Contents 2013 (March)
  • (4/27) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)
  • (4/29) What should philosophers of science do? (falsification, Higgs, statistics, Marilyn)

May 2013

  • (5/3) Schedule for Ontology & Methodology, 2013
  • (5/6) Professorships in Scandal?
  • (5/9) If it’s called the “The High Quality Research Act,” then ….
  • (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
  • (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
  • (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
  • (5/19) Mayo: Meanderings on the Onto-Methodology Conference
  • (5/22) Mayo’s slides from the Onto-Meth conference
  • (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
  • (5/26) Schachtman: High, Higher, Highest Quality Research Act
  • (5/27) A.Birnbaum: Statistical Methods in Scientific Inference
  • (5/29) K. Staley: review of Error & Inference

June 2013

  • (6/1) Winner of May Palindrome Contest
  • (6/1) Some statistical dirty laundry
  • (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers (Reblog 5/17/12):
  • (6/6) PhilStock: Topsy-Turvy Game
  • (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
  • (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”
  • (6/11) Mayo: comment on the repressed memory research
  • (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
  • (6/19) PhilStock: The Great Taper Caper
  • (6/19) Stanley Young: better p-values through randomization in microarrays
  • (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • (6/26) Why I am not a “dualist” in the sense of Sander Greenland
  • (6/29) Palindrome “contest” contest
  • (6/30) Blog Contents: mid-year

July 2013

  • (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
  • (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
  • (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
  • (7/11) Is Particle Physics Bad Science? (memory lane)
  • (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
  • (7/14) Stephen Senn: Indefinite irrelevance
  • (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
  • (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
  • (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
  • (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
  • (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk

August 2013

  • (8/1) Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert
  • (8/5) At the JSM: 2013 International Year of Statistics
  • (8/6) What did Nate Silver just say? Blogging the JSM
  • (8/9) 11th bullet, multiple choice question, and last thoughts on the JSM
  • (8/11) E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”
  • (8/13) Blogging E.S. Pearson’s Statistical Philosophy
  • (8/15) A. Spanos: Egon Pearson’s Neglected Contributions to Statistics
  • (8/17) Gandenberger: How to Do Philosophy That Matters (guest post)
  • (8/21) Blog contents: July, 2013
  • (8/22) PhilStock: Flash Freeze
  • (8/22) A critical look at “critical thinking”: deduction and induction
  • (8/28) Is being lonely unnatural for slim particles? A statistical argument
  • (8/31) Overheard at the comedy hour at the Bayesian retreat-2 years on

September 2013

  • (9/2) Is Bayesian Inference a Religion?
  • (9/3) Gelman’s response to my comment on Jaynes
  • (9/5) Stephen Senn: Open Season (guest post)
  • (9/7) First blog: “Did you hear the one about the frequentist…”? and “Frequentists in Exile”
  • (9/10) Peircean Induction and the Error-Correcting Thesis (Part I)
  • (9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis
  • (9/12) (Part 3) Peircean Induction and the Error-Correcting Thesis
  • (9/14) “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (guest post)
  • (9/18) PhilStock: Bad news is good news on Wall St.
  • (9/18) How to hire a fraudster chauffeur
  • (9/22) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/23) Barnard’s Birthday: background, likelihood principle, intentions
  • (9/24) Gelman est effectivement une erreur statistician
  • (9/26) Blog Contents: August 2013
  • (9/29) Highly probable vs highly probed: Bayesian/ error statistical differences

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
  • (10/31) WHIPPING BOYS AND WITCH HUNTERS

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27) “The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing”
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

January 2014

  • (1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
  • (1/3) Error Statistics Philosophy: 2013
  • (1/4) Your 2014 wishing well. …
  • (1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
  • (1/11) Two Severities? (PhilSci and PhilStat)
  • (1/14) Statistical Science meets Philosophy of Science: blog beginnings
  • (1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog (ii)
  • (1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]
  • (1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
  • (1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics
  • (1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?
  • (1/27) Winner of the January 2014 palindrome contest (rejected post)
  • (1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics
  • (1/31) Phil 6334: Day #2 Slides

February 2014

  • (2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
  • (2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
  • (2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
  • (2/9) Phil6334: Day #3: Feb 6, 2014
  • (2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
  • (2/12) Phil6334: Popper self-test
  • (2/13) Phil 6334 Statistical Snow Sculpture
  • (2/14) January Blog Table of Contents
  • (2/15) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
  • (2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
  • (2/21) STEPHEN SENN: Fisher’s alternative to the alternative
  • (2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
  • (2/24) Phil6334: February 20, 2014 (Spanos): Day #5
  • (2/26) Winner of the February 2014 palindrome contest (rejected post)
  • (2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

March 2014

  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under you clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)
  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business
  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

April 2014

  • (4/1) Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)

May 2014

  • (5/1) Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle
  • (5/3) You can only become coherent by ‘converting’ non-Bayesianly
  • (5/6) Winner of April Palindrome contest: Lori Wike
  • (5/7) A. Spanos: Talking back to the critics using error statistics (Phil6334)
  • (5/10) Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)
  • (5/15) Scientism and Statisticism: a conference* (i)
  • (5/17) Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”
  • (5/20) The Science Wars & the Statistics Wars: More from the Scientism workshop
  • (5/25) Blog Table of Contents: March and April 2014
  • (5/27) Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976
  • (5/31) What have we learned from the Anil Potti training and test data frameworks? Part 1 (draft 2)

June 2014

  • (6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)
  • (6/9) “The medical press must become irrelevant to publication of clinical trials.”
  • (6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”
  • (6/14) “Statistical Science and Philosophy of Science: where should they meet?”
  • (6/21) Big Bayes Stories? (draft ii)
  • (6/25) Blog Contents: May 2014
  • (6/28) Sir David Hendry Gets Lifetime Achievement Award
  • (6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued:”P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

August 2014

  • (08/03) Blogging Boston JSM2014?
  • (08/05) Neyman, Power, and Severity
  • (08/06) What did Nate Silver just say? Blogging the JSM 2013
  • (08/09) Winner of July Palindrome: Manan Shah
  • (08/09) Blog Contents: June and July 2014
  • (08/11) Egon Pearson’s Heresy
  • (08/17) Are P Values Error Probabilities? Or, “It’s the methods, stupid!” (2nd install)
  • (08/23) Has Philosophical Superficiality Harmed Science?
  • (08/29) BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

[i] Table of Contents compiled by N. Jinn & J. Miller*

*I thank Jean Miller for her assiduous work on the blog, and all contributors and readers for helping “frequentists in exile” to feel (and truly become) less exiled–wherever they may be!

Categories: blog contents, Metablog, Statistics | Leave a comment

3 in blog years: Sept 3 is 3rd anniversary of errorstatistics.com

Where did you hear this?  “Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.” Remember this and this? And this philosophical treatise on “moving blog day”? Oy, did I really write all this stuff?

http://errorstatistics.blogspot.com/2011/09/overheard-at-comedy-hour-at-bayesian_03.html

cake baked by blog staff for 3 year anniversary of errorstatistics.com

I still see this as my rag-tag amateur blog. I never learned html and don’t have time to now. But the blog enterprise was more jocund and easy-going then–just an experiment, really, and a place to discuss our RMM papers. (And, of course, a home for error statistical philosophers-in-exile).

A blog table of contents for all three years will appear tomorrow.

Anyway, 2 representatives from Elba flew into NYC and  baked this cake in my never-used Chef’s oven (based on the cover/table of contents of EGEK 1996). We’ll be celebrating at A Different Place tonight[i]–so if you’re in the neighborhood, stop by after 8pm for an Elba Grease (on me).

Do you want a free signed copy of EGEK? Say why in 25 words or less (to error@vt.edu), and the Fund for E.R.R.O.R.* will send them to the top 3 submissions (by 9/10/14).**

Acknowledgments: I want to thank the many commentators for their frequent insights and for keeping things interesting and lively. Among the regulars, and semi-regulars (but with impact) off the top of my head, and in no order: Senn, Yanofsky, Byrd, Gelman, Schachtman, Kepler, McKinney, S. Young, Matloff, O’Rourke, Gandenberger, Wasserman, E. Berk, Spanos, Glymour, Rohde, Greenland, Omaclaren, someone named Mark, assorted guests, original guests, and anons, and mysterious visitors, related twitterers (who would rather tweet from afar). I’m sure I’ve left some people out. Thanks to students and participants in the spring 2014 seminar with Aris Spanos (slides and lecture notes are still up).

I’m especially grateful to my regular guest bloggers: Stephen Senn and Aris Spanos, and to those who were subjected to deconstructions and to U-Phils in years past. (I may return to that some time.) Other guest posters for 2014 will be acknowledged in the year round up.

I thank blog compilers, Jean Miller and Nicole Jinn, and give special thanks for the tireless efforts of Jean Miller who has slogged through html, or whatever it is, when necessary, has scanned and put up dozens of articles to make them easy for readers to access, taken slow ferries back and forth to the island of Elba, and fixed gazillions of glitches on a daily basis. Last, but not least, to the palindromists who have been winning lots of books recently (1 day left for August submissions).

*Experimental Reasoning, Reliability, Objectivity and Rationality.

** Accompany submissions with an e-mail address and regular address. All submissions remain private. Elba judges’ decisions are final. Void in any places where prohibited by laws, be they laws of likelihood or Napoleonic laws-in-exile. But seriously, we’re giving away 3 books.

[i]email for directions.

Categories: Announcement, Statistics | 12 Comments

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)


1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*

Or,

x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)
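
For concreteness, here is a toy likelihood-ratio computation (a minimal Python sketch assuming scipy; the data and the p = 0.9 rival are purely illustrative, not from any of the texts cited):

```python
from scipy.stats import binom

# Toy data: x = 9 heads in n = 10 independent tosses.
n, heads = 10, 9

lik_H0 = binom.pmf(heads, n, 0.5)  # Pr(x; H0: p = 0.5), ~ 0.0098
lik_H1 = binom.pmf(heads, n, 0.9)  # Pr(x; H1: p = 0.9), ~ 0.3874

# LR ~ 39.7 > 1, so on the assumed "law" x counts as evidence for H1 over H0.
print(f"LR(H1 over H0) = {lik_H1 / lik_H0:.1f}")
```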

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972, p. 129). H0: “the coin is fair” gets a small likelihood, (.5)^k, given k tosses of a coin, while H1: “the probability of heads is 1 on just those tosses that yielded a head” renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1 even if H0 is true?
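
As a concrete check, here is a minimal simulation of Barnard’s coin example (a sketch assuming numpy; k and the trial count are arbitrary choices). No matter how the k tosses come out under a fair coin, the after-the-fact rival attains likelihood 1, so the LR favors it with probability one even though H0 is true:

```python
import numpy as np

rng = np.random.default_rng(0)

k = 20           # tosses per experiment
trials = 10_000  # experiments simulated with H0 true (a fair coin)

favors_rival = 0
for _ in range(trials):
    tosses = rng.integers(0, 2, size=k)  # data generated under H0
    lik_H0 = 0.5 ** k                    # Pr(sequence; H0: p = 1/2)
    # Barnard-style rival, tailored after the fact: "Pr(heads) = 1 on exactly
    # the tosses that landed heads, 0 elsewhere." Every observed outcome then
    # had probability 1, so the whole sequence gets likelihood 1.
    lik_H1 = 1.0
    if lik_H1 / lik_H0 > 1:
        favors_rival += 1

print(f"LR favored the tailored rival in {favors_rival} of {trials} trials "
      f"(LR = 2^{k} every time), even though H0 is true.")
```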

Continue reading

Categories: highly probable vs highly probed, law of likelihood, Likelihood Principle, Statistics | 72 Comments

Has Philosophical Superficiality Harmed Science?


I have been asked what I thought of some criticisms of the scientific relevance of philosophy of science, as discussed in the following snippet from a recent Scientific American blog. My title elicits the appropriate degree of ambiguity, I think. 

Quantum Gravity Expert Says “Philosophical Superficiality” Has Harmed Physics

By John Horgan | August 21, 2014

“I interviewed Rovelli by phone in the early 1990s when I was writing a story for Scientific American about loop quantum gravity, a quantum-mechanical version of gravity proposed by Rovelli, Lee Smolin and Abhay Ashtekar[i]

Horgan: What’s your opinion of the recent philosophy-bashing by Stephen Hawking, Lawrence Krauss and Neil deGrasse Tyson?

Rovelli: Seriously: I think they are stupid in this.   I have admiration for them in other things, but here they have gone really wrong.  Look: Einstein, Heisenberg, Newton, Bohr…. and many many others of the greatest scientists of all times, much greater than the names you mention, of course, read philosophy, learned from philosophy, and could have never done the great science they did without the input they got from philosophy, as they claimed repeatedly. You see: the scientists that talk philosophy down are simply superficial: they have a philosophy (usually some ill-digested mixture of Popper and Kuhn) and think that this is the “true” philosophy, and do not realize that this has limitations.

Here is an example: theoretical physics has not done great in the last decades. Why? Well, one of the reasons, I think, is that it got trapped in a wrong philosophy: the idea that you can make progress by guessing new theory and disregarding the qualitative content of previous theories.  This is the physics of the “why not?”  Why not studying this theory, or the other? Why not another dimension, another field, another universe?  Science has never advanced in this manner in the past.  Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.  Quite remarkably, the best piece of physics done by the three people you mention is Hawking’s black-hole radiation, which is exactly this.  But most of current theoretical physics is not of this sort.  Why?  Largely because of the philosophical superficiality of the current bunch of scientists.”

I find it intriguing that Rovelli suggests that “Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.” I think this is an interesting and subtle claim with which I agree. Continue reading

Categories: StatSci meets PhilSci, strong likelihood principle | 33 Comments

Are P Values Error Probabilities? or, “It’s the methods, stupid!” (2nd install)


Despite the fact that Fisherians and Neyman-Pearsonians alike regard observed significance levels, or P values, as error probabilities, we occasionally hear allegations (typically from those who are neither Fisherian nor N-P theorists) that P values are actually not error probabilities. The denials tend to go hand in hand with allegations that P values exaggerate evidence against a null hypothesis—a problem whose cure invariably invokes measures that are at odds with both Fisherian and N-P tests. The Berger and Sellke (1987) article from a recent post is a good example of this. When leading figures put forward a statement that looks to be straightforwardly statistical, others tend to simply repeat it without inquiring whether the allegation actually mixes in issues of interpretation and statistical philosophy. So I wanted to go back and look at their arguments. I will post this in installments.

1. Some assertions from Fisher, N-P, and Bayesian camps

Here are some assertions from Fisherian, Neyman-Pearsonian and Bayesian camps: (I make no attempt at uniformity in writing the “P-value”, but retain the quotes as written.)

a) From the Fisherian camp (Cox and Hinkley):

“For given observations y we calculate t = t_obs = t(y), say, and the level of significance p_obs by

p_obs = Pr(T > t_obs; H0).

…. Hence p_obs is the probability that we would mistakenly declare there to be evidence against H0, were we to regard the data under analysis as being just decisive against H0.” (Cox and Hinkley 1974, 66).

Thus p_obs would be the Type I error probability associated with the test.
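
A small check of this error-probability reading (a sketch of my own, not from Cox and Hinkley; it assumes numpy/scipy, and z_obs = 1.9 is a made-up observation): the observed significance level should match the long-run rate at which the rule “declare evidence against H0 whenever T > t_obs” errs when H0 is true.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

z_obs = 1.9  # hypothetical observed value of the test statistic T

# Observed significance level: p_obs = Pr(T > t_obs; H0), T ~ N(0,1) under H0.
p_obs = norm.sf(z_obs)

# Error-probability reading: treating any T > t_obs as "just decisive"
# against H0, how often do we err when H0 is in fact true?
t_under_H0 = rng.standard_normal(200_000)
error_rate = np.mean(t_under_H0 > z_obs)

print(f"p_obs = {p_obs:.4f}, simulated Type I error rate = {error_rate:.4f}")
```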

b) From the Neyman-Pearson (N-P) camp (Lehmann and Romano):

“[I]t is good practice to determine not only whether the hypothesis is accepted or rejected at the given significance level, but also to determine the smallest significance level…at which the hypothesis would be rejected for the given observation. This number, the so-called p-value gives an idea of how strongly the data contradict the hypothesis. It also enables others to reach a verdict based on the significance level of their choice.” (Lehmann and Romano 2005, 63-4) 
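
Their “smallest significance level at which the hypothesis would be rejected” can be made concrete with the same hypothetical z_obs (again a sketch, assuming scipy):

```python
from scipy.stats import norm

z_obs = 1.9  # same hypothetical observation as in the sketch above

# One-sided test: reject H0 at level alpha iff z_obs >= the upper-alpha cutoff.
for alpha in (0.10, 0.05, 0.04, 0.03, 0.02, 0.01):
    verdict = "reject" if z_obs >= norm.isf(alpha) else "accept"
    print(f"alpha = {alpha:.2f}: {verdict}")

# The smallest level at which H0 would be rejected is the p-value itself:
print(f"p-value = Pr(Z >= z_obs; H0) = {norm.sf(z_obs):.4f}")  # ~ 0.0287
```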

Very similar quotations are easily found, and are regarded as uncontroversial—even by Bayesians whose contributions stood at the foot of Berger and Sellke’s argument that P values exaggerate the evidence against the null. Continue reading

Categories: frequentist/Bayesian, J. Berger, P-values, Statistics | 32 Comments

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Today is Egon Pearson’s birthday: 11 August 1895-12 June, 1980.
E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. If they had been, I would not be on about providing an inferential philosophy all these years.[i] Nevertheless, “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this: Continue reading

Categories: phil/history of stat, Philosophy of Statistics, Statistics | Tags: , | 2 Comments
