BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science


57th Annual Program

Download the 57th Annual Program

The Alfred I. Taub forum:


Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

  • Reputation, Variation, &, Control: Historical Perspectives
    Jutta Schickore History and Philosophy of Science & Medicine, Indiana University, Bloomington.
  • Crisis in Science: Time for Reform?
    Arturo Casadevall Molecular Microbiology & Immunology, Johns Hopkins
  • Severe Testing: The Key to Error Correction
    Deborah Mayo Philosophy, Virginia Tech
  • Replicate That…. Maintaining a Healthy Failure Rate in Science
    Stuart Firestein Biological Sciences, Columbia



Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

Midnight With Birnbaum (Happy New Year 2016)

 Just as in the past 5 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, given that my Birnbaum article has been out since 2014… The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2017? Anyway, it’s 6 hrs later here, so I’m about to leave for that spot in the road… If I’m picked up, I’ll add an update at the end.

You know how in that (not-so) recent Woody Allen movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf?  He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2011 2012, 2013, 2014, 2015, 2016) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14 & 15) updates at the end.  



ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept. Continue reading

Categories: Birnbaum Brakes, Statistics, strong likelihood principle | Tags: , , , | 21 Comments

Szucs & Ioannidis Revive the Limb-Sawing Fallacy




When logical fallacies of statistics go uncorrected, they are repeated again and again…and again. And so it is with the limb-sawing fallacy I first posted in one of my “Overheard at the Comedy Hour” posts.* It now resides as a comic criticism of significance tests in a paper by Szucs and Ioannidis (posted this week),  Here’s their version:

“[P]aradoxically, when we achieve our goal and successfully reject Hwe will actually be left in complete existential vacuum because during the rejection of HNHST ‘saws off its own limb’ (Jaynes, 2003; p. 524): If we manage to reject H0then it follows that pr(data or more extreme data|H0) is useless because H0 is not true” (p.15).

Here’s Jaynes (p. 524):

“Suppose we decide that the effect exists; that is, we reject [null hypothesis] H0. Surely, we must also reject probabilities conditional on H0, but then what was the logical justification for the decision? Orthodox logic saws off its own limb.’ 

Ha! Ha! By this reasoning, no hypothetical testing or falsification could ever occur. As soon as H is falsified, the grounds for falsifying disappear! If H: all swans are white, then if I see a black swan, H is falsified. But according to this criticism, we can no longer assume the deduced prediction from H! What? Continue reading

Categories: Error Statistics, P-values, reforming the reformers, Statistics | 14 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: December 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. In this post, that makes 12/27-12/28 count as one.

December 2013

  • (12/3) Stephen Senn: Dawid’s Selection Paradox (guest post)
  • (12/7) FDA’s New Pharmacovigilance
  • (12/9) Why ecologists might want to read more philosophy of science (UPDATED)
  • (12/11) Blog Contents for Oct and Nov 2013
  • (12/14) The error statistician has a complex, messy, subtle, ingenious piece-meal approach
  • (12/15) Surprising Facts about Surprising Facts
  • (12/19) A. Spanos lecture on “Frequentist Hypothesis Testing
  • (12/24) U-Phil: Deconstructions [of J. Berger]: Irony & Bad Faith 3
  • (12/25) “Bad Arguments” (a book by Ali Almossawi)
  • (12/26) Mascots of Bayesneon statistics (rejected post)
  • (12/27) Deconstructing Larry Wasserman
  • (12/28) More on deconstructing Larry Wasserman (Aris Spanos)
  • (12/28) Wasserman on Wasserman: Update! December 28, 2013
  • (12/31) Midnight With Birnbaum (Happy New Year)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.







Categories: 3-year memory lane, Bayesian/frequentist, Error Statistics, Statistics | 1 Comment

S. Senn: “Placebos: it’s not only the patients that are fooled” (Guest Post)

Stephen Senn

Stephen Senn

Placebos: it’s not only the patients that are fooled

Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

In my opinion a great deal of ink is wasted to little purpose in discussing placebos in clinical trials. Many commentators simply do not understand the nature and purpose of placebos. To start with the latter, their only purpose is to permit blinding of treatments and, to continue to the former, this implies that their nature is that they are specific to the treatment studied.

Consider an example. Suppose that Pannostrum Pharmaceuticals wishes to prove that its new treatment for migraine, Paineaze® (which is in the form of a small red circular pill) is superior to the market-leader offered by Allexir Laboratories, Kalmer® (which is a large purple lozenge). Pannostrum decides to do a head-to head comparison and of course, therefore will require placebos. Every patient will have to take a red pill and a purple lozenge. In the Paineaze arm what is red will be Paineaze and what is purple ‘placebo to Kalmer’. In the Kalmer arm what is red will be ‘placebo to Paineaze’ and what is purple will be Kalmer.


Continue reading

Categories: PhilPharma, PhilStat/Med, Statistics, Stephen Senn | 6 Comments

“Tests of Statistical Significance Made Sound”: excerpts from B. Haig



I came across a paper, “Tests of Statistical Significance Made Sound,” by Brian Haig, a psychology professor at the University of Canterbury, New Zealand. It hits most of the high notes regarding statistical significance tests, their history & philosophy and, refreshingly, is in the error statistical spirit! I’m pasting excerpts from his discussion of “The Error-Statistical Perspective”starting on p.7.[1]

The Error-Statistical Perspective

An important part of scientific research involves processes of detecting, correcting, and controlling for error, and mathematical statistics is one branch of methodology that helps scientists do this. In recognition of this fact, the philosopher of statistics and science, Deborah Mayo (e.g., Mayo, 1996), in collaboration with the econometrician, Aris Spanos (e.g., Mayo & Spanos, 2010, 2011), has systematically developed, and argued in favor of, an error-statistical philosophy for understanding experimental reasoning in science. Importantly, this philosophy permits, indeed encourages, the local use of ToSS, among other methods, to manage error. Continue reading

Categories: Bayesian/frequentist, Error Statistics, fallacy of rejection, P-values, Statistics | 12 Comments

Gelman at the PSA: “Confirmationist and Falsificationist Paradigms in Statistical Practice”: Comments & Queries

screen-shot-2016-10-26-at-10-23-07-pmTo resume sharing some notes I scribbled down on the contributions to our Philosophy of Science Association symposium on Philosophy of Statistics (Nov. 4, 2016), I’m up to Gelman. Comments on Gigerenzer and Glymour are here and here. Gelman didn’t use slides but gave a very thoughtful, extemporaneous presentation on his conception of “falsificationist Bayesianism”, its relation to current foundational issues, as well as to error statistical testing. My comments follow his abstract.

Confirmationist and Falsificationist Paradigms in Statistical Practice



Andrew Gelman

There is a divide in statistics between classical frequentist and Bayesian methods. Classical hypothesis testing is generally taken to follow a falsificationist, Popperian philosophy in which research hypotheses are put to the test and rejected when data do not accord with predictions. Bayesian inference is generally taken to follow a confirmationist philosophy in which data are used to update the probabilities of different hypotheses. We disagree with this conventional Bayesian-frequentist contrast: We argue that classical null hypothesis significance testing is actually used in a confirmationist sense and in fact does not do what it purports to do; and we argue that Bayesian inference cannot in general supply reasonable probabilities of models being true. The standard research paradigm in social psychology (and elsewhere) seems to be that the researcher has a favorite hypothesis A. But, rather than trying to set up hypothesis A for falsification, the researcher picks a null hypothesis B to falsify, which is then taken as evidence in favor of A. Research projects are framed as quests for confirmation of a theory, and once confirmation is achieved, there is a tendency to declare victory and not think too hard about issues of reliability and validity of measurements. Continue reading

Categories: Bayesian/frequentist, Gelman, Shalizi, Statistics | 148 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: November 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a group count as one. Here I’m counting 11/9, 11/13, and 11/16 as one

November 2013

  • (11/2) Oxford Gaol: Statistical Bogeymen
  • (11/4) Forthcoming paper on the strong likelihood principle
  • (11/9) Null Effects and Replication (cartoon pic)
  • (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
  • (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
  • (11/16) PhilStock: No-pain bull
  • (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
  • (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
  • (11/20) Erich Lehmann: Statistician and Poet
  • (11/23) Probability that it is a statistical fluke [i]
  • (11/27)The probability that it be a statistical fluke” [iia]
  • (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.







Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

Gigerenzer at the PSA: “How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual”: Comments and Queries (ii)



Gerd Gigerenzer, Andrew Gelman, Clark Glymour and I took part in a very interesting symposium on Philosophy of Statistics at the Philosophy of Science Association last Friday. I jotted down lots of notes, but I’ll limit myself to brief reflections and queries on a small portion of each presentation in turn, starting with Gigerenzer’s “Surrogate Science: How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual.” His complete slides are below my comments. I may write this in stages, this being (i).



  1. Good scientific practice–bold theories, double-blind experiments, minimizing measurement error, replication, etc.–became reduced in the social science to a surrogate: statistical significance.

I agree that “good scientific practice” isn’t some great big mystery, and that “bold theories, double-blind experiments, minimizing measurement error, replication, etc.” are central and interconnected keys to finding things out in error prone inquiry. Do the social sciences really teach that inquiry can be reduced to cookbook statistics? Or is it simply that, in some fields, carrying out surrogate science suffices to be a “success”? Continue reading

Categories: Fisher, frequentist/Bayesian, Gigerenzer, Gigerenzer, P-values, spurious p values, Statistics | 11 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: October 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a pair count as one.

October 2013

  • (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
  • (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
  • (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
  • (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?”(10/5 and 10/12 are a pair)
  • (10/19) Blog Contents: September 2013
  • (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
  • (10/25) Bayesian confirmation theory: example from last post…(10/19 and 10/25 are a pair)
  • (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
  • (10/31) WHIPPING BOYS AND WITCH HUNTERS (interesting to see how things have changed and stayed the same over the past few years, share comments)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016-very convenient.







Categories: 3-year memory lane, Error Statistics, Statistics | 22 Comments

For Statistical Transparency: Reveal Multiplicity and/or Just Falsify the Test (Remark on Gelman and Colleagues)



Gelman and Loken (2014) recognize that even without explicit cherry picking there is often enough leeway in the “forking paths” between data and inference so that by artful choices you may be led to one inference, even though it also could have gone another way. In good sciences, measurement procedures should interlink with well-corroborated theories and offer a triangulation of checks– often missing in the types of experiments Gelman and Loken are on about. Stating a hypothesis in advance, far from protecting from the verification biases, can be the engine that enables data to be “constructed”to reach the desired end [1].

[E]ven in settings where a single analysis has been carried out on the given data, the issue of multiple comparisons emerges because different choices about combining variables, inclusion and exclusion of cases…..and many other steps in the analysis could well have occurred with different data (Gelman and Loken 2014, p. 464).

An idea growing out of this recognition is to imagine the results of applying the same statistical procedure, but with different choices at key discretionary junctures–giving rise to a multiverse analysis, rather than a single data set (Steegen, Tuerlinckx, Gelman, and Vanpaemel 2016). One lists the different choices thought to be plausible at each stage of data processing. The multiverse displays “which constellation of choices corresponds to which statistical results” (p. 797). The result of this exercise can, at times, mimic the delineation of possibilities in multiple testing and multiple modeling strategies. Continue reading

Categories: Bayesian/frequentist, Error Statistics, Gelman, P-values, preregistration, reproducibility, Statistics | 9 Comments

A new front in the statistics wars? Peaceful negotiation in the face of so-called ‘methodological terrorism’

images-30I haven’t been blogging that much lately, as I’m tethered to the task of finishing revisions on a book (on the philosophy of statistical inference!) But I noticed two interesting blogposts, one by Jeff Leek, another by Andrew Gelman, and even a related petition on Twitter, reflecting a newish front in the statistics wars: When it comes to improving scientific integrity, do we need more carrots or more sticks? 

Leek’s post, from yesterday, called “Statistical Vitriol” (29 Sep 2016), calls for de-escalation of the consequences of statistical mistakes:

Over the last few months there has been a lot of vitriol around statistical ideas. First there were data parasites and then there were methodological terrorists. These epithets came from established scientists who have relatively little statistical training. There was the predictable backlash to these folks from their counterparties, typically statisticians or statistically trained folks who care about open source.
Continue reading

Categories: Anil Potti, fraud, Gelman, pseudoscience, Statistics | 15 Comments

G.A. Barnard’s 101st Birthday: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 Sept 1915-30 July, 2002

Today is George Barnard’s 101st birthday. In honor of this, I reblog an exchange between Barnard, Savage (and others) on likelihood vs probability. The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962).[i] Six other posts on Barnard are linked below: 2 are guest posts (Senn, Spanos); the other 4 include a play (pertaining to our first meeting), and a letter he wrote to me. 


BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important. Continue reading

Categories: Barnard, highly probable vs highly probed, phil/history of stat, Statistics | 14 Comments

Peircean Induction and the Error-Correcting Thesis (Part I)

C. S. Peirce: 10 Sept, 1839-19 April, 1914

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.

Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | 18 Comments

All She Wrote (so far): Error Statistics Philosophy: 5 years on

metablog old fashion typewriter

D.G. Mayo with her  blogging typewriter

Error Statistics Philosophy: Blog Contents (5 years) [i]
By: D. G. Mayo

Dear Reader: It’s hard to believe I’ve been blogging for five years (since Sept. 3, 2011)! A big celebration is taking place at the Elbar Room this evening. If you’re in the neighborhood, stop by for some Elba Grease.

Amazingly, this old typewriter not only still works; one of the whiz kids on Elba managed to bluetooth it to go directly from my typewriter onto the blog (I never got used to computer keyboards.) I still must travel to London to get replacement ribbons for this klunker.

Please peruse the offerings below, and take advantage of some of the super contributions and discussions by guest posters and readers! I don’t know how much longer I’ll continue blogging, but at least until the publication of my book on statistical inference. After that I plan to run conferences, workshops, and ashrams on PhilStat and PhilSci, and will invite readers to take part! Keep reading and commenting. Sincerely, D. Mayo



September 2011

Continue reading

Categories: blog contents, Metablog, Statistics | 11 Comments

TragiComedy hour: P-values vs posterior probabilities vs diagnostic error rates

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

Critic: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah… “So funny, I forgot to laugh! Or, I’m crying and laughing at the same time!) Continue reading

Categories: Bayesian/frequentist, Comedy, significance tests, Statistics | 9 Comments

History of statistics sleuths out there? “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”–No wait, it was apples, probably

E.S.Pearson on Gate

E.S.Pearson on a Gate, Mayo sketch

Here you see my scruffy sketch of Egon drawn 20 years ago for the frontispiece of my book, “Error and the Growth of Experimental Knowledge” (EGEK 1996). The caption is

“I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot… –E.S Pearson, “Statistical Concepts in Their Relation to Reality”.

He is responding to Fisher to “dispel the picture of the Russian technological bogey”. [i]

So, as I said in my last post, just to make a short story long, I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed, and I discovered a funny little error about this quote. Only maybe 3 or 4 people alive would care, but maybe someone out there knows the real truth.

OK, so I’d been rereading Constance Reid’s great biography of Neyman, and in one place she interviews Egon about the sources of inspiration for their work. Here’s what Egon tells her: Continue reading

Categories: E.S. Pearson, phil/history of stat, Statistics | 1 Comment

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy

egon pearson

E.S. Pearson (11 Aug, 1895-12 June, 1980)

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed. I recently discovered a little anecdote that calls for a correction in something I’ve been saying for years. While it’s little more than a point of trivia, it’s in relation to Pearson’s (1955) response to Fisher (1955)–the last entry in this post.  I’ll wait until tomorrow or the next day to share it, to give you a chance to read the background. 


Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. 

Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Continue reading

Categories: 4 years ago!, highly probable vs highly probed, phil/history of stat, Statistics | Tags: | Leave a comment

If you think it’s a scandal to be without statistical falsification, you will need statistical tests (ii)

Screen Shot 2016-08-09 at 2.55.33 PM


1. PhilSci and StatSci. I’m always glad to come across statistical practitioners who wax philosophical, particularly when Karl Popper is cited. Best of all is when they get the philosophy somewhere close to correct. So, I came across an article by Burnham and Anderson (2014) in Ecology:

While the exact definition of the so-called ‘scientific method’ might be controversial, nearly everyone agrees that the concept of ‘falsifiability’ is a central tenant [sic] of empirical science (Popper 1959). It is critical to understand that historical statistical approaches (i.e., P values) leave no way to ‘test’ the alternative hypothesis. The alternative hypothesis is never tested, hence cannot be rejected or falsified!… Surely this fact alone makes the use of significance tests and P values bogus. Lacking a valid methodology to reject/falsify the alternative science hypotheses seems almost a scandal.” (Burnham and Anderson p. 629)

Well I am (almost) scandalized by this easily falsifiable allegation! I can’t think of a single “alternative”, whether in a “pure” Fisherian or a Neyman-Pearson hypothesis test (whether explicit or implicit) that’s not falsifiable; nor do the authors provide any. I grant that understanding testability and falsifiability is far more complex than the kind of popularized accounts we hear about; granted as well, theirs is just a short paper.[1] But then why make bold declarations on the topic of the “scientific method and statistical science,” on falsifiability and testability? Continue reading

Categories: P-values, Severity, statistical tests, Statistics, StatSci meets PhilSci | 22 Comments

S. Senn: “Painful dichotomies” (Guest Post)


Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Painful dichotomies

The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)

The bare facts they established are the following:

The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.

and the false conclusion they immediately asserted is the following

This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.

To understand the fallacy, look at the accompanying graph. Continue reading

Categories: junk science, PhilStat/Med, Statistics, Stephen Senn | 27 Comments

Blog at