Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)

So I hear that Diederik Stapel is the co-author of a book, Fictionfactory (in Dutch, with a novelist, Dautzenberg)[i], and of what they call their “Fictionfactory peepshow”, only it’s been disinvited at the last minute from a Dutch festival on “truth and reality” (due to have run 9/26/14), and all because of Stapel’s involvement. Here’s an excerpt from an article in last week’s Retraction Watch (article is here):*

Here’s a case of art imitating science.

The organizers of a Dutch drama festival have put a halt to a play about the disgraced social psychologist Diederik Stapel, prompting protests from the authors of the skit — one of whom is Stapel himself.

According to an article in NRC Handelsblad:

The Amsterdam Discovery Festival on science and art has canceled at the last minute, the play written by Anton Dautzenberg and former professor Diederik Stapel. Co-sponsor, The Royal Netherlands Academy of Arts and Sciences (KNAW), doesn’t want Stapel, who committed science fraud, to perform at a festival that’s associated with the KNAW.


The management of the festival, planned for September 26th at the Tolhuistuin in Amsterdam, contacted Stapel and Dautzenberg 4 months ago with the request to organize a performance of their book and lecture project ‘The Fictionfactory”. Especially for this festival they [Stapel and Dautzenberg] created a ‘Fictionfactory-peepshow’.

“Last Friday I received a call [from the management of the festival] that our performance has been canceled at the last minute because the KNAW will withdraw their subsidy if Stapel is on the festival program”, says Dautzenberg. “This looks like censorship, and by an institution that also wants to represents arts and experiments”.

Well this is curious, as things with Stapel always are. What’s the “Fictionfactory Peepshow”? If you go to Stapel’s homepage, it’s all in Dutch, but Google translation isn’t too bad, and I have a pretty good description of the basic idea. So since it’s Saturday night, let’s take a peek, or peep (at what it might have been)…


Here we are at the “Truth and Reality” Festival: first stop (after some cotton candy): the Stapel Fictionfactory Peepshow! It’s all dark, I can’t see a thing. What? It says I have to put some coins in a slot if I want to turn it on (but that they also take credit cards). So I’m to look down in this tiny window. The curtains are opening!…I see a stage with two funky looking guys, one of them Stapel. They’re reading, or reciting from some magazine with big letters: “Fact, Fiction and the Frictions we Hide”.

Stapel and Dautzenberg


STAPEL: Welkom. You can ask us any questions! In response, you will always be given an option: ‘Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?’

“Well, I’ve brought some data with me from a study in social psychology. My question is this: Is there a statistically significant effect here?”

STAPEL: Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?

“Fiction please”.

STAPEL: I can massage your data, manipulate your numbers, reveal the taboos normally kept under wraps. For a few more coins I will let you see the secrets behind unreplicable results, and for a large bill will manufacture for you a sexy statistical story to turn on the editors.

(Then after the dirty business is all done [ii].)

STAPEL: Do you have more questions for me?

“Will it be published (fiction please)?”

STAPEL: “yes”

“Will anyone find out about this (fiction please)?”

STAPEL: “No, I mean yes, I mean no.”


“I’d like to change to hearing the truth now. I have three questions”.

STAPEL: No problem, we take credit cards. Dank u. What are your questions?

“Will Uri Simonsohn be able to fraudbust my results using the kind of tests he used on others? And if so, how long will it take him? (truth, please)”

STAPEL: “Yes. But not for at least 6 months to one year.”

“Here’s my final question. Are these data really statistically significant and at what level?” (truth please)

Nothing. Blank screen suddenly! With an acrid smelling puff of smoke, ew. But I’d already given the credit card! (Tricked by the master trickster).


What if he either always lies or always tells the truth? Then what would you ask him if you want to know the truth about your data? (Liar’s paradox variant)

Feel free to share your queries/comments.

* I thank Caitlin Parker for sending me the article

[i] Diederik Stapel was found guilty of science fraud in psychology in 2011; he made up data out of whole cloth and retracted over 50 papers.



[ii] Perhaps they then ask you how much you’ll pay for a bar of soap (because you’d sullied yourself). Why let potential priming data go to waste?  Oh wait, he doesn’t use real data…. Perhaps the peepshow was supposed to be a kind of novel introduction to research ethics.


Some previous posts on Stapel:


Categories: Comedy, junk science, rejected post, Statistics | 5 Comments

G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 Sept 1915-30 July, 2002

Today is George Barnard’s birthday. In honor of this, I have typed in an exchange between Barnard, Savage (and others) on an important issue that we’d never gotten around to discussing explicitly (on likelihood vs probability). Please share your thoughts.

The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962)[i]


BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.

SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague, an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.

Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …

On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.

Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.
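Savage’s melting-point calculation can be made concrete with a rough sketch. Everything numerical here is an assumption for illustration only (the grid of candidate melting points, the flat conditional prior, the normal measurement error with a made-up `sigma`, and the thermometer reading); none of it comes from the exchange. The point it shows is the one Savage makes: the posterior conditional on a melting point existing uses only the conditional prior and the likelihood of the reading, with no term for ‘no melting point’.

```python
import math

def conditional_posterior(grid, prior, reading, sigma):
    """Posterior over candidate melting points, given that one exists.

    Assumes (hypothetically) a normal measurement error with sd `sigma`.
    """
    # Likelihood of the observed reading as a function of each candidate value.
    like = [math.exp(-0.5 * ((reading - m) / sigma) ** 2) for m in grid]
    unnorm = [p * l for p, l in zip(prior, like)]
    total = sum(unnorm)
    # Normalizing here is legitimate: the grid is exhaustive *given* existence.
    return [u / total for u in unnorm]

# Hypothetical numbers, chosen only for illustration.
grid = [880.0 + 5.0 * i for i in range(9)]   # candidate melting points
prior = [1.0 / len(grid)] * len(grid)        # flat conditional prior
posterior = conditional_posterior(grid, prior, reading=900.0, sigma=5.0)
best = grid[posterior.index(max(posterior))]
print(best, round(sum(posterior), 6))
```

Nothing in `conditional_posterior` refers to the probability that californium sublimates or behaves like glass; those alternatives would enter only if one wanted an unconditional posterior.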

BARNARD: Professor Savage says in effect, ‘add at the bottom of list H1, H2,…”something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’. We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.
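Barnard’s penny can be turned into a small numerical sketch. The two named hypotheses (heads probability 0.5 and 0.7), the data (7 heads in 10 tosses), and the 5% prior on ‘something else’ are all hypothetical choices made for illustration. Since the probability of the data under ‘something else’ is unknown, the code simply tries its two extremes, 0 and 1.

```python
from math import comb

def binom_lik(p, heads, n):
    """Probability of `heads` heads in `n` tosses when P(heads) = p."""
    return comb(n, heads) * p**heads * (1 - p) ** (n - heads)

heads, n = 7, 10                      # hypothetical data
lik_h1 = binom_lik(0.5, heads, n)     # H1: fair penny
lik_h2 = binom_lik(0.7, heads, n)     # H2: biased penny

# The likelihood ratio of H2 to H1 needs no catch-all at all.
likelihood_ratio = lik_h2 / lik_h1

def posterior_h1(prior_else, lik_else):
    """Posterior of H1 when a catch-all is tacked on.

    `lik_else` stands in for P(data | something else), which is the
    quantity Barnard says we do not know.
    """
    prior_each = (1 - prior_else) / 2
    num = prior_each * lik_h1
    den = num + prior_each * lik_h2 + prior_else * lik_else
    return num / den

post_if_else_never = posterior_h1(prior_else=0.05, lik_else=0.0)
post_if_else_always = posterior_h1(prior_else=0.05, lik_else=1.0)
print(likelihood_ratio, post_if_else_never, post_if_else_always)
```

Under these assumptions the posterior probability of H1 moves a great deal depending on the value guessed for `lik_else`, even with a small prior on the catch-all, while `likelihood_ratio` is untouched. Whether that settles anything between Barnard and Savage is, of course, the question at issue.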

Categories: Barnard, highly probable vs highly probed, phil/history of stat, Statistics | 26 Comments

Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”

Memory lane: Did you ever consider how some of the colorful exchanges among better-known names in statistical foundations could be the basis for high literary drama in the form of one-act plays (even if appreciated by only 3-7 people in the world)? (Think of the expressionist exchange between Bohr and Heisenberg in Michael Frayn’s play Copenhagen, except here there would be no attempt at all to popularize: only published quotes and closely remembered conversations would be included, with no attempt to create a “story line”.) Somehow I didn’t think so. But rereading some of Savage’s high-flown praise of Birnbaum’s “breakthrough” argument (for the Likelihood Principle) today, I was swept into a “(statistical) theater of the absurd” mindset.

The first one came to me in autumn 2008 while I was giving a series of seminars on philosophy of statistics at the LSE. Modeled on a disappointing (to me) performance of The Woman in Black, “A Funny Thing Happened at the [1959] Savage Forum” relates Savage’s horror at George Barnard’s announcement of having rejected the Likelihood Principle!

The current piece also features George Barnard and since Monday (9/23) is Barnard’s birthday, I’m digging it out of “rejected posts” to reblog it. It recalls our first meeting in London in 1986. I’d sent him a draft of my paper “Why Pearson Rejected the Neyman-Pearson Theory of Statistics” (later adapted as chapter 11 of EGEK) to see whether I’d gotten Pearson right. He’d traveled quite a ways, from Colchester, I think. It was June and hot, and we were up on some kind of a semi-enclosed rooftop. Barnard was sitting across from me looking rather bemused.

The curtain opens with Barnard and Mayo on the roof, lit by a spot mid-stage. He’s drinking (hot) tea; she, a Diet Coke. The dialogue (is what I recall from the time[i]):

 Barnard: I read your paper. I think it is quite good.  Did you know that it was I who told Fisher that Neyman-Pearson statistics had turned his significance tests into little more than acceptance procedures?

Mayo:  Thank you so much for reading my paper.  I recall a reference to you in Pearson’s response to Fisher, but I didn’t know the full extent.

Barnard: I was the one who told Fisher that Neyman was largely to blame. He shouldn’t be too hard on Egon.  His statistical philosophy, you are aware, was different from Neyman’s.

Mayo:  That’s interesting.  I did quote Pearson, at the end of his response to Fisher, as saying that inductive behavior was “Neyman’s field, not mine”.  I didn’t know your role in his laying the blame on Neyman!

Fade to black. The lights go up on Fisher, stage left, flashing back some 30 years earlier . . . ….

Fisher: Now, acceptance procedures are of great importance in the modern world.  When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. . . . I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means.  But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful . . . . [Advocates of behavioristic statistics are like]

Russians [who] are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. . . .

In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. (Fisher 1955, 69-70)

Fade to black.  The lights go up on Egon Pearson stage right (who looks like he does in my sketch [frontispiece] from EGEK 1996, a bit like a young C. S. Peirce):

Pearson: There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. . . . Indeed, to dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot . . . . To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data.  (Pearson 1955, 204)

Fade to black. The spotlight returns to Barnard and Mayo, but brighter. It looks as if it’s gotten hotter.  Barnard wipes his brow with a white handkerchief.  Mayo drinks her Diet Coke.

Barnard (ever so slightly angry): You have made one blunder in your paper. Fisher would never have made that remark about Russia.

There is a tense silence.

Mayo: But—it was a quote.

End of Act 1.

Given this was pre-internet, we couldn’t go to the source then and there, so we agreed to search for the paper in the library. Well, you get the idea. Maybe I could call the piece “Stat on a Hot Tin Roof.”

If you go see it, don’t say I didn’t warn you.

I’ve gotten various new speculations over the years as to why he had this reaction to the mention of Russia (check discussions in earlier posts with this play). Feel free to share yours. Some new (to me) information on Barnard is in George Box’s recent autobiography.

[i] We had also discussed this many years later, in 1999.


Categories: Statistics, phil/history of stat, rejected post, Barnard | 3 Comments

Uncle Sam wants YOU to help with scientific reproducibility!

You still have a few days to respond to the call of your country to solve problems of scientific reproducibility!

The following passages come from Retraction Watch, with my own recommendations at the end.

“White House takes notice of reproducibility in science, and wants your opinion”

The White House’s Office of Science and Technology Policy (OSTP) is taking a look at innovation and scientific research, and issues of reproducibility have made it onto its radar.

Here’s the description of the project from the Federal Register:

The Office of Science and Technology Policy and the National Economic Council request public comments to provide input into an upcoming update of the Strategy for American Innovation, which helps to guide the Administration’s efforts to promote lasting economic growth and competitiveness through policies that support transformative American innovation in products, processes, and services and spur new fundamental discoveries that in the long run lead to growing economic prosperity and rising living standards.

I wonder what Steven Pinker would say about some of the above verbiage?

And here’s what’s catching the eye of people interested in scientific reproducibility:

(11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?

The OSTP is the same office that, in 2013, took what Nature called “a long-awaited leap forward for open access” when it said “that publications from taxpayer-funded research should be made free to read after a year’s delay.” That OSTP memo came after more than 65,000 people “signed a We the People petition asking for expanded public access to the results of taxpayer-funded research.”

Have ideas on improving reproducibility? Emails to are preferred, according to the notice, which also explains how to fax or mail comments. The deadline is September 23.

Off the top of my head, how about:

Promote the use of methodologies that:

  • control and assess the capabilities of methods to avoid mistaken inferences from data;
  • require demonstrated self-criticism all the way from data collection through modelling and interpretation (statistical and substantive);
  • describe what is especially shaky or poorly probed thus far (and spell out how subsequent studies are most likely to locate those flaws)[i]

Institute penalties for QRPs and fraud?

Please offer your suggestions in the comments, or directly to Uncle Sam.

 [i]It may require a certain courage on the part of researchers, journalists, referees.

Categories: Announcement, reproducibility | 18 Comments

A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)


Time for a break with a “Rejected Post”[i]

There’s one crucial point that Prosecutor Nel overlooked and failed to employ in the Oscar Pistorius trial, or so it appears. In fact I haven’t heard anyone mention it, so maybe it’s not as critical as I think it is. Before revealing (what I regard as) an important missing piece, I ask readers and legal beagles out there for their informal take.

Here are some items from the announced verdict (which do not directly give away the missing piece, but may be enough to deduce it). (A general article is here.)

Oscar Pistorius ‘not guilty’ of girlfriend’s murder, rules judge Thokozile Masipa

AP | September 11, 2014, 17.09 pm IST

Before the break, Judge Masipa ruled out “dolus eventualis”[ii], saying Mr Pistorius could not have foreseen he would kill the person behind the toilet door.

“How could the accused have reasonably foreseen the shot he fired would have killed the deceased? Clearly he did not subjectively foresee this, that he would have killed the person behind the door, let alone the deceased,” said Judge Masipa.

The judge said the defence argues it is highly improbable the accused could have made this up so quickly and consistently, even in his bail application, [really?]

….Evidence shows that at time he fired shots at toilet door, Mr Pistorius believed the deceased was in the bedroom, the judge says. This belief was communicated to a number of people shortly after the incident, she added.

The judge said there is “nothing in the evidence to suggest that Mr Pistorius’ belief was not genuinely entertained”. She cites reasons including the bathroom window being open, and the toilet door being shut.

… He… [said] he genuinely, though erroneously, believed that his life and that of the deceased was in danger,” the judge said….

The starting point is “whether accused had intention to kill person behind toilet door,” the judge said.

Categories: rejected post | 8 Comments

“The Supernal Powers Withhold Their Hands And Let Me Alone” : C.S. Peirce

C. S. Peirce: 10 Sept, 1839-19 April, 1914


Memory Lane* in Honor of C.S. Peirce’s Birthday:
(Part 3) of “Peircean Induction and the Error-Correcting Thesis”

Deborah G. Mayo
Transactions of the Charles S. Peirce Society 41(2) 2005: 299-319

(9/10) Peircean Induction and the Error-Correcting Thesis (Part I)

(9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis

8. Random sampling and the uniformity of nature

We are now at the point to address the final move in warranting Peirce’s [self-correcting thesis] SCT. The severity or trustworthiness assessment, on which the error correcting capacity depends, requires an appropriate link (qualitative or quantitative) between the data and the data generating phenomenon, e.g., a reliable calibration of a scale in a qualitative case, or a probabilistic connection between the data and the population in a quantitative case. Establishing such a link, however, is regarded as assuming observed regularities will persist, or making some “uniformity of nature” assumption—the bugbear of attempts to justify induction.

But Peirce contrasts his position with those favored by followers of Mill, and “almost all logicians” of his day, who “commonly teach that the inductive conclusion approximates to the truth because of the uniformity of nature” (2.775). Inductive inference, as Peirce conceives it (i.e., severe testing) does not use the uniformity of nature as a premise. Rather, the justification is sought in the manner of obtaining data. Justifying induction is a matter of showing that there exist methods with good error probabilities. For this it suffices that randomness be met only approximately, that inductive methods check their own assumptions, and that they can often detect and correct departures from randomness.

… It has been objected that the sampling cannot be random in this sense. But this is an idea which flies far away from the plain facts. Thirty throws of a die constitute an approximately random sample of all the throws of that die; and that the randomness should be approximate is all that is required. (1.94)

Peirce backs up his defense with robustness arguments. For example, in an (attempted) Binomial induction, Peirce asks, “what will be the effect upon inductive inference of an imperfection in the strictly random character of the sampling” (2.728). What if, for example, a certain proportion of the population had twice the probability of being selected? He shows that “an imperfection of that kind in the random character of the sampling will only weaken the inductive conclusion, and render the concluded ratio less determinate, but will not necessarily destroy the force of the argument completely” (2.728). This is particularly so if the sample mean is near 0 or 1. In other words, violating experimental assumptions may be shown to weaken the trustworthiness or severity of the proceeding, but this may only mean we learn a little less.
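Peirce’s question about a subgroup that is twice as likely to be selected can be put in a few lines of arithmetic. This is a modern sketch rather than Peirce’s own calculation: the split of the population into a ‘favoured’ fraction `g` and the rest, and the trait rates in each part, are hypothetical quantities introduced only to show how far the expected sampled ratio can drift from the true one.

```python
def true_ratio(g, rate_favoured, rate_other):
    """Population proportion having the trait."""
    return g * rate_favoured + (1 - g) * rate_other

def sampled_ratio(g, rate_favoured, rate_other, weight=2.0):
    """Expected sample proportion when the favoured fraction `g` of the
    population is `weight` times as likely to be selected as the rest."""
    num = weight * g * rate_favoured + (1 - g) * rate_other
    den = weight * g + (1 - g)
    return num / den

# Hypothetical numbers: 20% of the population is favoured.
g = 0.2
bias_different_rates = sampled_ratio(g, 0.6, 0.4) - true_ratio(g, 0.6, 0.4)
bias_equal_rates = sampled_ratio(g, 0.5, 0.5) - true_ratio(g, 0.5, 0.5)
print(round(bias_different_rates, 4), round(bias_equal_rates, 4))
```

Under these assumptions the imperfection shifts the expected ratio by an amount that can never exceed the difference between the two groups’ trait rates, and it vanishes when the favoured group resembles the rest. That is at least in the spirit of Peirce’s claim that the flaw weakens the conclusion without destroying it.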

Yet a further safeguard is at hand:

Nor must we lose sight of the constant tendency of the inductive process to correct itself. This is of its essence. This is the marvel of it. …even though doubts may be entertained whether one selection of instances is a random one, yet a different selection, made by a different method, will be likely to vary from the normal in a different way, and if the ratios derived from such different selections are nearly equal, they may be presumed to be near the truth. (2.729)

Here, the marvel is an inductive method’s ability to correct the attempt at random sampling. Still, Peirce cautions, we should not depend so much on the self-correcting virtue that we relax our efforts to get a random and independent sample. But if our effort is not successful, and neither is our method robust, we will probably discover it. “This consideration makes it extremely advantageous in all ampliative reasoning to fortify one method of investigation by another” (ibid.).

Categories: C.S. Peirce, Error Statistics, phil/history of stat | 11 Comments

Statistical Science: The Likelihood Principle issue is out…!

Abbreviated Table of Contents:

Here are some items for your Saturday-Sunday reading.

Link to complete discussion: 

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266.

Links to individual papers:

Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical Science 29 (2014), no. 2, 227-239.

Dawid, A. P. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 240-241.

Evans, Michael. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 242-246.

Martin, Ryan; Liu, Chuanhai. Discussion: Foundations of Statistical Inference, Revisited. Statistical Science 29 (2014), no. 2, 247-251.

Fraser, D. A. S. Discussion: On Arguments Concerning Statistical Principles. Statistical Science 29 (2014), no. 2, 252-253.

Hannig, Jan. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 254-258.

Bjørnstad, Jan F. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 259-260.

Mayo, Deborah G. Rejoinder: “On the Birnbaum Argument for the Strong Likelihood Principle”. Statistical Science 29 (2014), no. 2, 261-266.

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes x and y from experiments E1 and E2 (both with unknown parameter θ), have different probability models f1( . ), f2( . ), then even though f1(x; θ) = c f2(y; θ) for all θ, outcomes x and y may have different implications for an inference about θ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox [Ann. Math. Statist. 29 (1958) 357–372] proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which Ei produced the measurement, the assessment should be in terms of the properties of Ei. The surprising upshot of Allan Birnbaum’s [J. Amer. Statist. Assoc. 57 (1962) 269–306] argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP), entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP].

Key words: Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality
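For readers who want to see an SLP violation in numbers, here is a sketch of the standard textbook pair of experiments; it is an illustration chosen for this note, not an example taken from the abstract. E1 tosses a coin a fixed 12 times and observes 9 heads; E2 tosses until the 3rd tail and happens to observe 9 heads first. The function names and the one-sided test of θ = 0.5 against θ > 0.5 are assumptions of the sketch.

```python
from math import comb

def binom_pmf(k, n, theta):
    """E1: probability of k heads in a fixed number n of tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def negbinom_pmf(k, r, theta):
    """E2: probability that k heads occur before the r-th tail."""
    return comb(k + r - 1, k) * theta**k * (1 - theta) ** r

heads, tails = 9, 3
n = heads + tails

# The two likelihoods are proportional: the ratio is the same constant c
# whatever the value of theta.
ratios = [binom_pmf(heads, n, t) / negbinom_pmf(heads, tails, t)
          for t in (0.2, 0.5, 0.8)]

# One-sided p-values for theta = 0.5 against theta > 0.5 differ, because
# each is computed from its own sampling distribution.
p_binom = sum(binom_pmf(k, n, 0.5) for k in range(heads, n + 1))
p_negbinom = 1 - sum(negbinom_pmf(k, tails, 0.5) for k in range(heads))
print(ratios, round(p_binom, 4), round(p_negbinom, 4))
```

Here f1(x; θ) = c f2(y; θ) with c = 4 for every θ, yet the p-values fall on opposite sides of 0.05, which is the sense in which error-statistical assessments violate the SLP.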

Regular readers of this blog know that the topic of the “Strong Likelihood Principle (SLP)” has come up quite frequently. Numerous informal discussions of earlier attempts to clarify where Birnbaum’s argument for the SLP goes wrong may be found on this blog. [SEE PARTIAL LIST BELOW.[i]] These mostly stem from my initial paper Mayo (2010) [ii]. I’m grateful for the feedback.

In the months since this paper has been accepted for publication, I’ve been asked, from time to time, to reflect informally on the overall journey: (1) Why was/is the Birnbaum argument so convincing for so long? (Are there points being overlooked, even now?) (2) What would Birnbaum have thought? (3) What is the likely upshot for the future of statistical foundations (if any)?

I’ll try to share some responses over the next week. (Naturally, additional questions are welcome.)

[i] A quick take on the argument may be found in the appendix to: “A Statistical Scientist Meets a Philosopher of Science: A conversation between David Cox and Deborah Mayo (as recorded, June 2011)”

 UPhils and responses



Categories: Birnbaum, Birnbaum Brakes, frequentist/Bayesian, Likelihood Principle, phil/history of stat, Statistics | 40 Comments

All She Wrote (so far): Error Statistics Philosophy Contents-3 years on




Error Statistics Philosophy: Blog Contents
By: D. G. Mayo[i]

Each month, I will mark (in red) 3 relevant posts (from that month 3 yrs ago) for readers wanting to catch-up or review central themes and discussions.

September 2011

October 2011

November 2011

December 2011

January 2012

February 2012

March 2012

April 2012

May 2012

June 2012

July 2012

August 2012

September 2012

October 2012

November 2012

December 2012

January 2013

  • (1/2) Severity as a ‘Metastatistical’ Assessment
  • (1/4) Severity Calculator
  • (1/6) Guest post: Bad Pharma? (S. Senn)
  • (1/9) RCTs, skeptics, and evidence-based policy
  • (1/10) James M. Buchanan
  • (1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
  • (1/12) Error Statistics Blog: Table of Contents
  • (1/15) Ontology & Methodology: Second call for Abstracts, Papers
  • (1/18) New Kvetch/PhilStock
  • (1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST
  • (1/22) New PhilStock
  • (1/23) P-values as posterior odds?
  • (1/26) Coming up: December U-Phil Contributions….
  • (1/27) U-Phil: S. Fletcher & N.Jinn
  • (1/30) U-Phil: J. A. Miller: Blogging the SLP

February 2013

  • (2/2) U-Phil: Ton o’ Bricks
  • (2/4) January Palindrome Winner
  • (2/6) Mark Chang (now) gets it right about circularity
  • (2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
  • (2/9) New kvetch: Filly Fury
  • (2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
  • (2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
  • (2/13) Statistics as a Counter to Heavyweights…who wrote this?
  • (2/16) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
  • (2/23) Stephen Senn: Also Smith and Jones
  • (2/26) PhilStock: DO < $70
  • (2/26) Statistically speaking…

March 2013

  • (3/1) capitalizing on chance
  • (3/4) Big Data or Pig Data?
  • (3/7) Stephen Senn: Casting Stones
  • (3/10) Blog Contents 2013 (Jan & Feb)
  • (3/11) S. Stanley Young: Scientific Integrity and Transparency
  • (3/13) Risk-Based Security: Knives and Axes
  • (3/15) Normal Deviate: Double Misunderstandings About p-values
  • (3/17) Update on Higgs data analysis: statistical flukes (1)
  • (3/21) Telling the public why the Higgs particle matters
  • (3/23) Is NASA suspending public education and outreach?
  • (3/27) Higgs analysis and statistical flukes (part 2)
  • (3/31) possible progress on the comedy hour circuit?

April 2013

  • (4/1) Flawed Science and Stapel: Priming for a Backlash?
  • (4/4) Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics
  • (4/6) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
  • (4/10) Statistical flukes (3): triggering the switch to throw out 99.99% of the data
  • (4/11) O & M Conference (upcoming) and a bit more on triggering from a participant…..
  • (4/14) Does statistics have an ontology? Does it need one? (draft 2)
  • (4/19) Stephen Senn: When relevance is irrelevant
  • (4/22) Majority say no to inflight cell phone use, knives, toy bats, bow and arrows, according to survey
  • (4/23) PhilStock: Applectomy? (rejected post)
  • (4/25) Blog Contents 2013 (March)
  • (4/27) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)
  • (4/29) What should philosophers of science do? (falsification, Higgs, statistics, Marilyn)

May 2013

  • (5/3) Schedule for Ontology & Methodology, 2013
  • (5/6) Professorships in Scandal?
  • (5/9) If it’s called the “The High Quality Research Act,” then ….
  • (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
  • (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
  • (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
  • (5/19) Mayo: Meanderings on the Onto-Methodology Conference
  • (5/22) Mayo’s slides from the Onto-Meth conference
  • (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
  • (5/26) Schachtman: High, Higher, Highest Quality Research Act
  • (5/27) A.Birnbaum: Statistical Methods in Scientific Inference
  • (5/29) K. Staley: review of Error & Inference

June 2013

  • (6/1) Winner of May Palindrome Contest
  • (6/1) Some statistical dirty laundry
  • (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers (Reblog 5/17/12):
  • (6/6) PhilStock: Topsy-Turvy Game
  • (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
  • (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?”
  • (6/11) Mayo: comment on the repressed memory research
  • (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
  • (6/19) PhilStock: The Great Taper Caper
  • (6/19) Stanley Young: better p-values through randomization in microarrays
  • (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri
  • (6/26) Why I am not a “dualist” in the sense of Sander Greenland
  • (6/29) Palindrome “contest” contest
  • (6/30) Blog Contents: mid-year

July 2013

  • (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
  • (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
  • (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
  • (7/11) Is Particle Physics Bad Science? (memory lane)
  • (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
  • (7/14) Stephen Senn: Indefinite irrelevance
  • (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
  • (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
  • (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
  • (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
  • (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk

August 2013

  • (8/1) Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert
  • (8/5) At the JSM: 2013 International Year of Statistics
  • (8/6) What did Nate Silver just say? Blogging the JSM
  • (8/9) 11th bullet, multiple choice question, and last thoughts on the JSM
  • (8/11) E.S. Pearson: “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”
  • (8/13) Blogging E.S. Pearson’s Statistical Philosophy
  • (8/15) A. Spanos: Egon Pearson’s Neglected Contributions to Statistics
  • (8/17) Gandenberger: How to Do Philosophy That Matters (guest post)
  • (8/21) Blog contents: July, 2013
  • (8/22) PhilStock: Flash Freeze
  • (8/22) A critical look at “critical thinking”: deduction and induction
  • (8/28) Is being lonely unnatural for slim particles? A statistical argument
  • (8/31) Overheard at the comedy hour at the Bayesian retreat-2 years on

September 2013

  • (9/2) Is Bayesian Inference a Religion?
  • (9/3) Gelman’s response to my comment on Jaynes
  • (9/5) Stephen Senn: Open Season (guest post)
  • (9/7) First blog: “Did you hear the one about the frequentist…”? and “Frequentists in Exile”
  • (9/10) Peircean Induction and the Error-Correcting Thesis (Part I)
  • (9/10) (Part 2) Peircean Induction and the Error-Correcting Thesis
  • (9/12) (Part 3) Peircean Induction and the Error-Correcting Thesis
  • (9/14) “When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (guest post)
  • (9/18) PhilStock: Bad news is good news on Wall St.
  • (9/18) How to hire a fraudster chauffeur
  • (9/22) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/23) Barnard’s Birthday: background, likelihood principle, intentions
  • (9/24) Gelman est effectivement une erreur statistician
  • (9/26) Blog Contents: August 2013
  • (9/29) Highly probable vs highly probed: Bayesian/ error statistical differences

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27) “The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing”
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

January 2014

  • (1/2) Winner of the December 2013 Palindrome Book Contest (Rejected Post)
  • (1/3) Error Statistics Philosophy: 2013
  • (1/4) Your 2014 wishing well. …
  • (1/7) “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)
  • (1/11) Two Severities? (PhilSci and PhilStat)
  • (1/14) Statistical Science meets Philosophy of Science: blog beginnings
  • (1/16) Objective/subjective, dirty hands and all that: Gelman/Wasserman blogolog (ii)
  • (1/18) Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]
  • (1/22) Phil6334: “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos (Virginia Tech) UPDATE: JAN 21
  • (1/24) Phil 6334: Slides from Day #1: Four Waves in Philosophy of Statistics
  • (1/25) U-Phil (Phil 6334) How should “prior information” enter in statistical inference?
  • (1/27) Winner of the January 2014 palindrome contest (rejected post)
  • (1/29) BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics
  • (1/31) Phil 6334: Day #2 Slides

February 2014

  • (2/1) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)
  • (2/3) PhilStock: Bad news is bad news on Wall St. (rejected post)
  • (2/5) “Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)
  • (2/9) Phil6334: Day #3: Feb 6, 2014
  • (2/10) Is it true that all epistemic principles can only be defended circularly? A Popperian puzzle
  • (2/12) Phil6334: Popper self-test
  • (2/13) Phil 6334 Statistical Snow Sculpture
  • (2/14) January Blog Table of Contents
  • (2/15) Fisher and Neyman after anger management?
  • (2/17) R. A. Fisher: how an outsider revolutionized statistics
  • (2/18) Aris Spanos: The Enduring Legacy of R. A. Fisher
  • (2/20) R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’
  • (2/21) STEPHEN SENN: Fisher’s alternative to the alternative
  • (2/22) Sir Harold Jeffreys’ (tail-area) one-liner: Sat night comedy [draft ii]
  • (2/24) Phil6334: February 20, 2014 (Spanos): Day #5
  • (2/26) Winner of the February 2014 palindrome contest (rejected post)
  • (2/26) Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

March 2014

  • (3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)
  • (3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6
  • (3/3) Capitalizing on Chance (ii)
  • (3/4) Power, power everywhere–(it) may not be what you think! [illustration]
  • (3/8) Msc kvetch: You are fully dressed (even under your clothes)?
  • (3/8) Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
  • (3/11) Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power
  • (3/12) Get empowered to detect power howlers
  • (3/15) New SEV calculator (guest app: Durvasula)
  • (3/17) Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)
  • (3/19) Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity
  • (3/22) Fallacies of statistics & statistics journalism, and how to avoid them: Summary & Slides Day #8 (Phil 6334)
  • (3/25) The Unexpected Way Philosophy Majors Are Changing The World Of Business
  • (3/26) Phil6334:Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)
  • (3/28) Severe osteometric probing of skeletal remains: John Byrd
  • (3/29) Winner of the March 2014 palindrome contest (rejected post)
  • (3/30) Phil6334: March 26, philosophy of misspecification testing (Day #9 slides)

April 2014

  • (4/1) Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
  • (4/3) Self-referential blogpost (conditionally accepted*)
  • (4/5) Who is allowed to cheat? I.J. Good and that after dinner comedy hour. . ..
  • (4/6) Phil6334: Duhem’s Problem, highly probable vs highly probed; Day #9 Slides
  • (4/8) “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)
  • (4/12) “Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)
  • (4/14) Phil6334: Notes on Bayesian Inference: Day #11 Slides
  • (4/16) A. Spanos: Jerzy Neyman and his Enduring Legacy
  • (4/17) Duality: Confidence intervals and the severity of tests
  • (4/19) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)
  • (4/21) Phil 6334: Foundations of statistics and its consequences: Day#12
  • (4/23) Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”
  • (4/26) Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day #13)
  • (4/30) Able Stats Elba: 3 Palindrome nominees for April! (rejected post)

May 2014

  • (5/1) Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle
  • (5/3) You can only become coherent by ‘converting’ non-Bayesianly
  • (5/6) Winner of April Palindrome contest: Lori Wike
  • (5/7) A. Spanos: Talking back to the critics using error statistics (Phil6334)
  • (5/10) Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)
  • (5/15) Scientism and Statisticism: a conference* (i)
  • (5/17) Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”
  • (5/20) The Science Wars & the Statistics Wars: More from the Scientism workshop
  • (5/25) Blog Table of Contents: March and April 2014
  • (5/27) Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976
  • (5/31) What have we learned from the Anil Potti training and test data frameworks? Part 1 (draft 2)

June 2014

  • (6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)
  • (6/9) “The medical press must become irrelevant to publication of clinical trials.”
  • (6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”
  • (6/14) “Statistical Science and Philosophy of Science: where should they meet?”
  • (6/21) Big Bayes Stories? (draft ii)
  • (6/25) Blog Contents: May 2014
  • (6/28) Sir David Hendry Gets Lifetime Achievement Award
  • (6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued:”P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

August 2014

  • (08/03) Blogging Boston JSM2014?
  • (08/05) Neyman, Power, and Severity
  • (08/06) What did Nate Silver just say? Blogging the JSM 2013
  • (08/09) Winner of July Palindrome: Manan Shah
  • (08/09) Blog Contents: June and July 2014
  • (08/11) Egon Pearson’s Heresy
  • (08/17) Are P Values Error Probabilities? Or, “It’s the methods, stupid!” (2nd install)
  • (08/23) Has Philosophical Superficiality Harmed Science?
  • (08/29) BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)

[i](Table of Contents compiled by N. Jinn & J. Miller)*

*I thank Jean Miller for her assiduous work on the blog, and all contributors and readers for helping “frequentists in exile” to feel (and truly become) less exiled–wherever they may be!

Categories: blog contents, Metablog, Statistics | Leave a comment

3 in blog years: Sept 3 is 3rd anniversary of

Where did you hear this?  “Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.” Remember this and this? And this philosophical treatise on “moving blog day”? Oy, did I really write all this stuff?

cake baked by blog staff for 3 year anniversary of

I still see this as my rag-tag amateur blog. I never learned html and don’t have time to now. But the blog enterprise was more jocund and easy-going then–just an experiment, really, and a place to discuss our RMM papers. (And, of course, a home for error statistical philosophers-in-exile).

A blog table of contents for all three years will appear tomorrow.

Anyway, 2 representatives from Elba flew into NYC and  baked this cake in my never-used Chef’s oven (based on the cover/table of contents of EGEK 1996). We’ll be celebrating at A Different Place tonight[i]–so if you’re in the neighborhood, stop by after 8pm for an Elba Grease (on me).

Do you want a free signed copy of EGEK? Say why in 25 words or less (to, and the Fund for E.R.R.O.R.* will send them to the top 3 submissions (by 9/10/14).**

Acknowledgments: I want to thank the many commentators for their frequent insights and for keeping things interesting and lively. Among the regulars, and semi-regulars (but with impact) off the top of my head, and in no order: Senn, Yanofsky, Byrd, Gelman, Schachtman, Kepler, McKinney, S. Young, Matloff, O’Rourke, Gandenberger, Wasserman, E. Berk, Spanos, Glymour, Rohde, Greenland, Omaclaren, someone named Mark, assorted guests, original guests, and anons, and mysterious visitors, related twitterers (who would rather tweet from afar). I’m sure I’ve left some people out. Thanks to students and participants in the spring 2014 seminar with Aris Spanos (slides and lecture notes are still up).

I’m especially grateful to my regular guest bloggers: Stephen Senn and Aris Spanos, and to those who were subjected to deconstructions and to U-Phils in years past. (I may return to that some time.) Other guest posters for 2014 will be acknowledged in the year round up.

I thank blog compilers, Jean Miller and Nicole Jinn, and give special thanks for the tireless efforts of Jean Miller who has slogged through html, or whatever it is, when necessary, has scanned and put up dozens of articles to make them easy for readers to access, taken slow ferries back and forth to the island of Elba, and fixed gazillions of glitches on a daily basis. Last, but not least, thanks to the palindromists who have been winning lots of books recently (1 day left for August submissions).

*Experimental Reasoning, Reliability, Objectivity and Rationality.

** Accompany submissions with an e-mail address and regular address. All submissions remain private. Elba judges’ decisions are final. Void in any places where prohibited by laws, be they laws of likelihood or Napoleonic laws-in-exile. But seriously, we’re giving away 3 books.

[i]email for directions.

Categories: Announcement, Statistics | 12 Comments

BREAKING THE LAW! (of likelihood): to keep their fit measures in line (A), (B 2nd)



1. An Assumed Law of Statistical Evidence (law of likelihood)

Nearly all critical discussions of frequentist error statistical inference (significance tests, confidence intervals, p-values, power, etc.) start with the following general assumption about the nature of inductive evidence or support:

Data x are better evidence for hypothesis H1 than for H0 if x are more probable under H1 than under H0.

Ian Hacking (1965) called this the logic of support: x supports hypothesis H1 more than H0 if H1 is more likely, given x, than is H0:

Pr(x; H1) > Pr(x; H0).

[With likelihoods, the data x are fixed, the hypotheses vary.]*


x is evidence for H1 over H0 if the likelihood ratio LR (H1 over H0 ) is greater than 1.

It is given in other ways besides, but it’s the same general idea. (Some will take the LR as actually quantifying the support, others leave it qualitative.)
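To make the comparative claim concrete, here is a minimal sketch for a coin-tossing case. The data (7 heads in 10 tosses) and the two point hypotheses about the heads probability (0.5 for H0, 0.7 for H1) are illustrative assumptions, not taken from the discussion above, and the function names are hypothetical.

```python
from math import comb


def binomial_likelihood(heads, tosses, p):
    """Pr(x; H): probability of `heads` in `tosses` when Pr(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


def likelihood_ratio(heads, tosses, p1, p0):
    """LR of H1 (p = p1) over H0 (p = p0); the data are fixed, the hypotheses vary."""
    return binomial_likelihood(heads, tosses, p1) / binomial_likelihood(heads, tosses, p0)


# Illustrative data (an assumption): 7 heads in 10 tosses.
lr = likelihood_ratio(heads=7, tosses=10, p1=0.7, p0=0.5)
print(f"LR(H1 over H0) = {lr:.3f}")
print("x favors H1 over H0" if lr > 1 else "x does not favor H1 over H0")
```

On this account the verdict depends only on whether the ratio exceeds 1; nothing in the calculation records how H1 came to be chosen.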

In terms of rejection:

“An hypothesis should be rejected if and only if there is some rival hypothesis much better supported [i.e., much more likely] than it is.” (Hacking 1965, 89)

2. Barnard (British Journal for the Philosophy of Science)

But this “law” will immediately be seen to fail on our minimal severity requirement. Hunting for an impressive fit, or trying and trying again, it’s easy to find a rival hypothesis H1 much better “supported” than H0 even when H0 is true. Or, as Barnard (1972) puts it, “there always is such a rival hypothesis, viz. that things just had to turn out the way they actually did” (1972 p. 129). H0: the coin is fair, gets a small likelihood (.5)^k given k tosses of a coin, while H1: the probability of heads is 1 just on those tosses that yield a head, renders the sequence of k outcomes maximally likely. This is an example of Barnard’s “things just had to turn out as they did”. Or, to use an example with P-values: a statistically significant difference, being improbable under the null H0, will afford high likelihood to any number of explanations that fit the data well.
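The coin example can be sketched in a few lines. The particular sequence of tosses below is an arbitrary assumption, and the function names are hypothetical; any sequence of k tosses gives the same pattern, with the tailored rival beating the fair-coin hypothesis by a factor of 2^k.

```python
def fair_coin_likelihood(outcomes):
    """Likelihood of the exact sequence under H0: Pr(heads) = 0.5 on every toss."""
    return 0.5 ** len(outcomes)


def tailored_likelihood(outcomes, observed):
    """Likelihood under the rival built from `observed`: Pr(heads) is 1 on
    exactly those tosses where `observed` shows a head, and 0 elsewhere."""
    return 1.0 if list(outcomes) == list(observed) else 0.0


observed = "HTTHHTHTTH"  # arbitrary illustrative sequence of k tosses
k = len(observed)
l0 = fair_coin_likelihood(observed)
l1 = tailored_likelihood(observed, observed)  # H1 is constructed after seeing x
print(f"k = {k}: L(H0) = {l0}, L(H1) = {l1}, LR = {l1 / l0}")
```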

3. Breaking the law (of likelihood) by going to the “second,” error statistical level:

How does it fail our severity requirement? First look at what the frequentist error statistician must always do to critique an inference: she must consider the capability of the inference method that purports to provide evidence for a claim. She goes to a higher level or metalevel, as it were. In this case, the likelihood ratio plays the role of the needed statistic d(X). To put it informally, she asks:

What’s the probability the method would yield an LR disfavoring H0 compared to some alternative H1  even if H0 is true?
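One way to put numbers on this question is an exact calculation for coin tossing. This is only a sketch under stated assumptions: k = 10 tosses, "disfavoring H0" read as LR > 1, and two hypothetical ways of choosing H1 (a heads probability of 0.7 fixed in advance, versus the heads probability that best fits the observed count). The best-fitting rule is a milder stand-in for the hunting described above, not Barnard's fully tailored rival.

```python
from math import comb


def prob_lr_disfavors_h0(k, alternative, threshold=1.0):
    """Pr(LR > threshold; H0) for k tosses when H0 (fair coin) is true.

    `alternative(h, k)` returns the heads probability H1 asserts once h heads
    are seen (a constant if H1 is fixed before looking at the data).
    """
    total = 0.0
    for h in range(k + 1):
        p1 = alternative(h, k)
        lik_h1 = p1**h * (1 - p1) ** (k - h)
        lik_h0 = 0.5**k
        if lik_h1 / lik_h0 > threshold:
            total += comb(k, h) * 0.5**k  # Pr(h heads; H0)
    return total


def prespecified(h, k):
    return 0.7  # H1 chosen before seeing the data (illustrative value)


def best_fitting(h, k):
    return h / k  # H1 chosen to fit the data as well as possible


k = 10
print("prespecified H1:", prob_lr_disfavors_h0(k, prespecified))
print("best-fitting H1:", prob_lr_disfavors_h0(k, best_fitting))
```

The same likelihood ratio statistic is used in both calls; what changes is the capability of the overall method to produce an LR against H0 when H0 is true.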


Categories: highly probable vs highly probed, law of likelihood, Likelihood Principle, Statistics | 72 Comments

Has Philosophical Superficiality Harmed Science?



I have been asked what I thought of some criticisms of the scientific relevance of philosophy of science, as discussed in the following snippet from a recent Scientific American blog. My title elicits the appropriate degree of ambiguity, I think. 

Quantum Gravity Expert Says “Philosophical Superficiality” Has Harmed Physics

By John Horgan | August 21, 2014 |  14

“I interviewed Rovelli by phone in the early 1990s when I was writing a story for Scientific American about loop quantum gravity, a quantum-mechanical version of gravity proposed by Rovelli, Lee Smolin and Abhay Ashtekar[i]

Horgan: What’s your opinion of the recent philosophy-bashing by Stephen Hawking, Lawrence Krauss and Neil deGrasse Tyson?

Rovelli: Seriously: I think they are stupid in this.   I have admiration for them in other things, but here they have gone really wrong.  Look: Einstein, Heisenberg, Newton, Bohr…. and many many others of the greatest scientists of all times, much greater than the names you mention, of course, read philosophy, learned from philosophy, and could have never done the great science they did without the input they got from philosophy, as they claimed repeatedly. You see: the scientists that talk philosophy down are simply superficial: they have a philosophy (usually some ill-digested mixture of Popper and Kuhn) and think that this is the “true” philosophy, and do not realize that this has limitations.

Here is an example: theoretical physics has not done great in the last decades. Why? Well, one of the reasons, I think, is that it got trapped in a wrong philosophy: the idea that you can make progress by guessing new theory and disregarding the qualitative content of previous theories.  This is the physics of the “why not?”  Why not studying this theory, or the other? Why not another dimension, another field, another universe?  Science has never advanced in this manner in the past.  Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.  Quite remarkably, the best piece of physics done by the three people you mention is Hawking’s black-hole radiation, which is exactly this.  But most of current theoretical physics is not of this sort.  Why?  Largely because of the philosophical superficiality of the current bunch of scientists.”

I find it intriguing that Rovelli suggests that “Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.” I think this is an interesting and subtle claim with which I agree.

Categories: StatSci meets PhilSci, strong likelihood principle | 33 Comments

Are P Values Error Probabilities? or, “It’s the methods, stupid!” (2nd install)



Despite the fact that Fisherians and Neyman-Pearsonians alike regard observed significance levels, or P values, as error probabilities, we occasionally hear allegations (typically from those who are neither Fisherian nor N-P theorists) that P values are actually not error probabilities. The denials tend to go hand in hand with allegations that P values exaggerate evidence against a null hypothesis—a problem whose cure invariably invokes measures that are at odds with both Fisherian and N-P tests. The Berger and Sellke (1987) article from a recent post is a good example of this. When leading figures put forward a statement that looks to be straightforwardly statistical, others tend to simply repeat it without inquiring whether the allegation actually mixes in issues of interpretation and statistical philosophy. So I wanted to go back and look at their arguments. I will post this in installments.

1. Some assertions from Fisher, N-P, and Bayesian camps

Here are some assertions from Fisherian, Neyman-Pearsonian and Bayesian camps: (I make no attempt at uniformity in writing the “P-value”, but retain the quotes as written.)

a) From the Fisherian camp (Cox and Hinkley):

“For given observations y we calculate t = t_obs = t(y), say, and the level of significance p_obs by

p_obs = Pr(T > t_obs; H0).

….Hence p_obs is the probability that we would mistakenly declare there to be evidence against H0, were we to regard the data under analysis as being just decisive against H0.” (Cox and Hinkley 1974, 66).

Thus p_obs would be the Type I error probability associated with the test.
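As a small worked illustration of the observed significance level defined above, the sketch below assumes the test statistic T is standard normal under H0. That distributional assumption and the function name are mine, not Cox and Hinkley's.

```python
from math import erfc, sqrt


def p_obs(t_obs):
    """Pr(T > t_obs; H0), assuming T is standard normal under H0."""
    return 0.5 * erfc(t_obs / sqrt(2))


for t in (0.0, 1.0, 1.96, 3.0):
    print(f"t_obs = {t:4.2f}  ->  p_obs = {p_obs(t):.4f}")
```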

b) From the Neyman-Pearson (N-P) camp (Lehmann and Romano):

“[I]t is good practice to determine not only whether the hypothesis is accepted or rejected at the given significance level, but also to determine the smallest significance level…at which the hypothesis would be rejected for the given observation. This number, the so-called p-value gives an idea of how strongly the data contradict the hypothesis. It also enables others to reach a verdict based on the significance level of their choice.” (Lehmann and Romano 2005, 63-4) 
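The reading of the p-value as the smallest significance level at which the hypothesis would be rejected can be checked numerically. The sketch below assumes a one-sided test with a standard normal statistic under the hypothesis; the observed value 1.7 and all function names are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejects(z_obs, alpha):
    """One-sided level-alpha test: reject when z_obs reaches the critical value."""
    return z_obs >= STD_NORMAL.inv_cdf(1 - alpha)


def smallest_rejecting_level(z_obs, tol=1e-10):
    """Bisect on alpha to find the smallest level at which the test rejects."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejects(z_obs, mid):
            hi = mid  # still rejects, so try a stricter level
        else:
            lo = mid
    return hi


def p_value(z_obs):
    """Upper-tail probability of z_obs under the hypothesis."""
    return 1 - STD_NORMAL.cdf(z_obs)


z_obs = 1.7  # illustrative observed statistic
print(smallest_rejecting_level(z_obs), p_value(z_obs))
```

Anyone with a preferred significance level can then compare it with this single number to reach a verdict.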

Very similar quotations are easily found, and are regarded as uncontroversial, even by Bayesians whose contributions stood at the foot of Berger and Sellke’s argument that P values exaggerate the evidence against the null.

Categories: frequentist/Bayesian, J. Berger, P-values, Statistics | 32 Comments

Egon Pearson’s Heresy

E.S. Pearson: 11 Aug 1895-12 June 1980.

Today is Egon Pearson’s birthday: 11 August 1895-12 June, 1980.
E. Pearson rejected some of the familiar tenets that have come to be associated with Neyman and Pearson (N-P) statistical tests, notably the idea that the essential justification for tests resides in a long-run control of rates of erroneous interpretations–what he termed the “behavioral” rationale of tests. In an unpublished letter E. Pearson wrote to Birnbaum (1974), he talks about N-P theory admitting of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

(Nowadays, some people concentrate to an absurd extent on “science-wise error rates in dichotomous screening”.)

When Erich Lehmann, in his review of my “Error and the Growth of Experimental Knowledge” (EGEK 1996), called Pearson “the hero of Mayo’s story,” it was because I found in E.S.P.’s work, if only in brief discussions, hints, and examples, the key elements for an “inferential” or “evidential” interpretation of N-P statistics. Granted, these “evidential” attitudes and practices have never been explicitly codified to guide the interpretation of N-P tests. If they had been, I would not be on about providing an inferential philosophy all these years.[i] Nevertheless, “Pearson and Pearson” statistics (both Egon, not Karl) would have looked very different from Neyman and Pearson statistics, I suspect. One of the few sources of E.S. Pearson’s statistical philosophy is his (1955) “Statistical Concepts in Their Relation to Reality”. It begins like this:

Categories: phil/history of stat, Philosophy of Statistics, Statistics | 2 Comments

Blog Contents: June and July 2014



Blog Contents: June and July 2014*

(6/5) Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)

(6/9) “The medical press must become irrelevant to publication of clinical trials.”

(6/11) A. Spanos: “Recurring controversies about P values and confidence intervals revisited”

(6/14) “Statistical Science and Philosophy of Science: where should they meet?”

(6/21) Big Bayes Stories? (draft ii)

(6/25) Blog Contents: May 2014

(6/28) Sir David Hendry Gets Lifetime Achievement Award

(6/30) Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

Categories: blog contents | Leave a comment

Winner of July Palindrome: Manan Shah


Manan Shah

Winner of July 2014 Contest:

Manan Shah


Trap May Elba, Dr. of Fanatic. I fed naan, deli-oiled naan, deficit an affordable yam part.

The requirements: 

In addition to using Elba, a candidate for a winning palindrome must have used fanatic. An optional second word was: part. An acceptable palindrome with both words would best an acceptable palindrome with just fanatic.
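For anyone who wants to check an entry against these requirements, here is a minimal sketch. It assumes the usual convention that only letters count (case, spaces and punctuation are ignored); the function names are hypothetical and this is not the judges' actual procedure.

```python
def letters_only(text):
    """Lower-cased letters of `text`, ignoring spaces and punctuation."""
    return [ch.lower() for ch in text if ch.isalpha()]


def is_palindrome(text):
    letters = letters_only(text)
    return letters == letters[::-1]


def uses_word(text, word):
    """True if `word` appears as a whole word in `text`, ignoring case."""
    words = "".join(ch.lower() if ch.isalpha() else " " for ch in text).split()
    return word.lower() in words


entry = ("Trap May Elba, Dr. of Fanatic. I fed naan, deli-oiled naan, "
         "deficit an affordable yam part.")

print("palindrome:", is_palindrome(entry))
for required in ("Elba", "fanatic", "part"):
    print(f"uses {required!r}:", uses_word(entry, required))
```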


Manan Shah is a mathematician and owner of Think. Plan. Do. LLC. He also maintains the “Math Misery?” blog. He holds a PhD in Mathematics from Florida State University.


Categories: Palindrome, Rejected Posts | Leave a comment

What did Nate Silver just say? Blogging the JSM 2013

Memory Lane: August 6, 2013. My initial post on JSM13 (8/5/13) was here.

Nate Silver gave his ASA Presidential talk to a packed audience (with questions tweeted[i]). Here are some quick thoughts—based on scribbled notes (from last night). Silver gave a list of 10 points that went something like this (turns out there were 11):

1. statistics are not just numbers

2. context is needed to interpret data

3. correlation is not causation

4. averages are the most useful tool

5. human intuitions about numbers tend to be flawed and biased

6. people misunderstand probability

7. we should be explicit about our biases and (in this sense) should be Bayesian?

8. complexity is not the same as not understanding

9. being in the in crowd gets in the way of objectivity

10. making predictions improves accountability

Categories: Statistics, StatSci meets PhilSci | 3 Comments

Neyman, Power, and Severity

April 16, 1894 – August 5, 1981


Jerzy Neyman: April 16, 1894-August 5, 1981. This reblogs posts under “The Will to Understand Power” & “Neyman’s Nursery” here & here.

Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he’d worked for at time with Birnbaum in NYC.

Some 15 years ago, Giere decided to quit philosophy of statistics (while remaining in philosophy of science): I think it had to do with a certain form of statistical exile (in philosophy). He asked me if I wanted his papers, a mass of work on statistics and statistical foundations gathered over many years. Could I make a home for them? I said yes. Then came his caveat: there would be a lot of them.

As it happened, we were building a new house at the time, Thebes, and I designed a special room on the top floor that could house a dozen or so file cabinets. (I painted it pale rose, with white lacquered book shelves up to the ceiling.) Then, for more than 9 months (same as my son!), I waited . . . Several boxes finally arrived, containing hundreds of files, each meticulously labeled with titles and dates. More than that, the labels were hand-typed! I thought, If Ron knew what a slob I was, he likely would not have entrusted me with these treasures. (Perhaps he knew of no one else who would actually want them!)

Categories: Neyman, phil/history of stat, power, Statistics | 4 Comments

Blogging Boston JSM2014?



I’m not there. (Several people have asked, I guess because I blogged JSM13.) If you hear of talks (or anecdotes) of interest to error, please comment here (or on Twitter: @learnfromerror).

Categories: Announcement | 7 Comments

Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest posts)

Roger L. Berger

School Director & Professor
School of Mathematical & Natural Science
Arizona State University

Comment on S. Senn’s post: “Blood Simple? The complicated and controversial world of bioequivalence”(*)

First, I do agree with Senn’s statement that “the FDA requires conventional placebo-controlled trials of a new treatment to be tested at the 5% level two-sided but since they would never accept a treatment that was worse than placebo the regulator’s risk is 2.5% not 5%.” The FDA procedure essentially defines a one-sided test with Type I error probability (size) of .025. Why it is not just called this, I do not know. And if the regulators believe .025 is the appropriate Type I error probability, then perhaps it should be used in other situations, e.g., bioequivalence testing, as well.

Senn refers to a paper by Hsu and me (Berger and Hsu (1996)), and then attempts to characterize what we said. Unfortunately, I believe he has mischaracterized.

Categories: bioequivalence, frequentist/Bayesian, PhilPharma, Statistics | 22 Comments

S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)

Stephen Senn
Head, Methodology and Statistics Group
Competence Center for Methodology and Statistics (CCMS)

Responder despondency: myths of personalized medicine

The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine, has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely interesting examples stretching back decades. Given what the report shows can be achieved on occasion, given the enthusiasm of the FDA and its commissioner, given the amazing progress in genetics emerging from the labs, a golden future of personalized medicine surely awaits us. It would be churlish to spoil the party by sounding a note of caution, but I have never shirked being churlish and that is exactly what I am going to do.

Categories: evidence-based policy, Statistics, Stephen Senn | 49 Comments
