G.A. Barnard’s 101st Birthday: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 Sept 1915-30 July, 2002

Today is George Barnard’s 101st birthday. In honor of this, I reblog an exchange between Barnard, Savage (and others) on likelihood vs probability. The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962).[i] Six other posts on Barnard are linked below: 2 are guest posts (Senn, Spanos); the other 4 include a play (pertaining to our first meeting), and a letter he wrote to me. 


BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important. Continue reading

Categories: Barnard, highly probable vs highly probed, phil/history of stat, Statistics | 14 Comments

The Myth of ‘The Myth of Objectivity” (i)

images-28Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inference and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)

I. The myth of objectivity.
Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective,” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity-subjectivity distinction really toothless as many will have you believe? I say no.

Cavalier attitudes toward objectivity are in tension with widely endorsed movements to promote replication, reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science–if they are not mere lip-service–are rooted in the supposition that we can more objectively scrutinize results,even if it’s only to point out those that are poorly tested. The fact that the term “objectivity” is used equivocally should not be taken as grounds to oust it, but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t. Continue reading

Categories: Background knowledge | Tags: | 6 Comments

Peircean Induction and the Error-Correcting Thesis (Part I)

C. S. Peirce: 10 Sept, 1839-19 April, 1914

C. S. Peirce: 10 Sept, 1839-19 April, 1914

Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.

Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319

Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:

Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)

Continue reading

Categories: Bayesian/frequentist, C.S. Peirce, Error Statistics, Statistics | 18 Comments

All She Wrote (so far): Error Statistics Philosophy: 5 years on

metablog old fashion typewriter

D.G. Mayo with her  blogging typewriter

Error Statistics Philosophy: Blog Contents (5 years) [i]
By: D. G. Mayo

Dear Reader: It’s hard to believe I’ve been blogging for five years (since Sept. 3, 2011)! A big celebration is taking place at the Elbar Room this evening. If you’re in the neighborhood, stop by for some Elba Grease.

Amazingly, this old typewriter not only still works; one of the whiz kids on Elba managed to bluetooth it to go directly from my typewriter onto the blog (I never got used to computer keyboards.) I still must travel to London to get replacement ribbons for this klunker.

Please peruse the offerings below, and take advantage of some of the super contributions and discussions by guest posters and readers! I don’t know how much longer I’ll continue blogging, but at least until the publication of my book on statistical inference. After that I plan to run conferences, workshops, and ashrams on PhilStat and PhilSci, and will invite readers to take part! Keep reading and commenting. Sincerely, D. Mayo



September 2011

Continue reading

Categories: blog contents, Metablog, Statistics | 11 Comments

TragiComedy hour: P-values vs posterior probabilities vs diagnostic error rates

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

Critic: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah… “So funny, I forgot to laugh! Or, I’m crying and laughing at the same time!) Continue reading

Categories: Bayesian/frequentist, Comedy, significance tests, Statistics | 9 Comments

Larry Laudan: “‘Not Guilty’: The Misleading Verdict” (Guest Post)

Larry Laudan


Prof. Larry Laudan
Lecturer in Law and Philosophy
University of Texas at Austin

“‘Not Guilty’: The Misleading Verdict and How It Fails to Serve either Society or the Innocent Defendant”

Most legal systems in the developed world share in common a two-tier verdict system: ‘guilty’ and ‘not guilty’.  Typically, the standard for a judgment of guilty is set very high while the standard for a not-guilty verdict (if we can call it that) is quite low. That means any level of apparent guilt less than about 90% confidence that the defendant committed the crime leads to an acquittal (90% being the usual gloss on proof beyond a reasonable doubt, although few legal systems venture a definition of BARD that precise). According to conventional wisdom, the major reason for setting the standard as high as we do is the desire, even the moral necessity, to shield the innocent from false conviction. Continue reading

Categories: L. Laudan, PhilStat Law | Tags: | 22 Comments

History of statistics sleuths out there? “Ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot”–No wait, it was apples, probably

E.S.Pearson on Gate

E.S.Pearson on a Gate, Mayo sketch

Here you see my scruffy sketch of Egon drawn 20 years ago for the frontispiece of my book, “Error and the Growth of Experimental Knowledge” (EGEK 1996). The caption is

“I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot… –E.S Pearson, “Statistical Concepts in Their Relation to Reality”.

He is responding to Fisher to “dispel the picture of the Russian technological bogey”. [i]

So, as I said in my last post, just to make a short story long, I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed, and I discovered a funny little error about this quote. Only maybe 3 or 4 people alive would care, but maybe someone out there knows the real truth.

OK, so I’d been rereading Constance Reid’s great biography of Neyman, and in one place she interviews Egon about the sources of inspiration for their work. Here’s what Egon tells her: Continue reading

Categories: E.S. Pearson, phil/history of stat, Statistics | 1 Comment

Performance or Probativeness? E.S. Pearson’s Statistical Philosophy

egon pearson

E.S. Pearson (11 Aug, 1895-12 June, 1980)

This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed. I recently discovered a little anecdote that calls for a correction in something I’ve been saying for years. While it’s little more than a point of trivia, it’s in relation to Pearson’s (1955) response to Fisher (1955)–the last entry in this post.  I’ll wait until tomorrow or the next day to share it, to give you a chance to read the background. 


Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties are of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson. 

Cases of Type A and Type B

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Continue reading

Categories: 4 years ago!, highly probable vs highly probed, phil/history of stat, Statistics | Tags: | Leave a comment

If you think it’s a scandal to be without statistical falsification, you will need statistical tests (ii)

Screen Shot 2016-08-09 at 2.55.33 PM


1. PhilSci and StatSci. I’m always glad to come across statistical practitioners who wax philosophical, particularly when Karl Popper is cited. Best of all is when they get the philosophy somewhere close to correct. So, I came across an article by Burnham and Anderson (2014) in Ecology:

While the exact definition of the so-called ‘scientific method’ might be controversial, nearly everyone agrees that the concept of ‘falsifiability’ is a central tenant [sic] of empirical science (Popper 1959). It is critical to understand that historical statistical approaches (i.e., P values) leave no way to ‘test’ the alternative hypothesis. The alternative hypothesis is never tested, hence cannot be rejected or falsified!… Surely this fact alone makes the use of significance tests and P values bogus. Lacking a valid methodology to reject/falsify the alternative science hypotheses seems almost a scandal.” (Burnham and Anderson p. 629)

Well I am (almost) scandalized by this easily falsifiable allegation! I can’t think of a single “alternative”, whether in a “pure” Fisherian or a Neyman-Pearson hypothesis test (whether explicit or implicit) that’s not falsifiable; nor do the authors provide any. I grant that understanding testability and falsifiability is far more complex than the kind of popularized accounts we hear about; granted as well, theirs is just a short paper.[1] But then why make bold declarations on the topic of the “scientific method and statistical science,” on falsifiability and testability? Continue reading

Categories: P-values, Severity, statistical tests, Statistics, StatSci meets PhilSci | 22 Comments

S. Senn: “Painful dichotomies” (Guest Post)


Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Painful dichotomies

The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)

The bare facts they established are the following:

The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.

and the false conclusion they immediately asserted is the following

This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.

To understand the fallacy, look at the accompanying graph. Continue reading

Categories: junk science, PhilStat/Med, Statistics, Stephen Senn | 27 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: July 2013. I mark in red three posts that seem most apt for general background on key issues in this blog, excluding those reblogged recently [1], and in green up to 3 others I’d recommend[2].  Posts that are part of a “unit” or a group of “U-Phils”(you [readers] philosophize) count as one.

July 2013

  • (7/3) Phil/Stat/Law: 50 Shades of gray between error and fraud
  • (7/6) Bad news bears: ‘Bayesian bear’ rejoinder–reblog mashup
  • (7/10) PhilStatLaw: Reference Manual on Scientific Evidence (3d ed) on Statistical Significance (Schachtman)
  • (7/11) Is Particle Physics Bad Science? (memory lane)
  • (7/13) Professor of Philosophy Resigns over Sexual Misconduct (rejected post)
  • (7/14) Stephen Senn: Indefinite irrelevance
  • (7/17) Phil/Stat/Law: What Bayesian prior should a jury have? (Schachtman)
  • (7/19) Msc Kvetch: A question on the Martin-Zimmerman case we do not hear
  • (7/20) Guest Post: Larry Laudan. Why Presuming Innocence is Not a Bayesian Prior
  • (7/23) Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs
  • (7/26) New Version: On the Birnbaum argument for the SLP: Slides for JSM talk

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

[2] New Rule, July 30, 2016.







Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

For Popper’s Birthday: A little Popper self-test & reading from Conjectures and Refutations


28 July 1902 – 17 September 1994

Today is Karl Popper’s birthday. I’m linking to a reading from his Conjectures and Refutations[i] along with an undergraduate item I came across: Popper Self-Test Questions. It includes multiple choice questions, quotes to ponder, and thumbnail definitions at the end[ii].

Blog Readers who wish to send me their answers will have their papers graded (at least try the multiple choice; if you’re unsure, do the reading). [Use the comments or e-mail.]

[i] Popper reading from Conjectures and Refutations
[ii] I might note the “No-Pain philosophy” (3 part) Popper posts from this blog: parts 12, and 3.



Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Basic Books.



Categories: Popper | 11 Comments

“Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative.”


Seeing the world through overly rosy glasses

Taboos about power nearly always stem from misuse of power analysis. Sander Greenland (2012) has a paper called “Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative.”  I’m not saying Greenland errs; the error would be made by anyone who interprets power analysis in a manner giving rise to Greenland’s objection. So what’s (ordinary) power analysis?

(I) Listen to Jacob Cohen (1988) introduce Power Analysis

“PROVING THE NULL HYPOTHESIS. Research reports in the literature are frequently flawed by conclusions that state or imply that the null hypothesis is true. For example, following the finding that the difference between two sample means is not statistically significant, instead of properly concluding from this failure to reject the null hypothesis that the data do not warrant the conclusion that the population means differ, the writer concludes, at least implicitly, that there is no difference. The latter conclusion is always strictly invalid, and is functionally invalid as well unless power is high. The high frequency of occurrence of this invalid interpretation can be laid squarely at the doorstep of the general neglect of attention to statistical power in the training of behavioral scientists. Continue reading

Categories: Cohen, Greenland, power, Statistics | 46 Comments

Philosophy and History of Science Announcements



2016 UK-EU Foundations of Physics Conference

Start Date:16 July 2016

Categories: Announcement | Leave a comment


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: June 2013. I mark in red three posts that seem most apt for general background on key issues in this blog, excluding those reblogged recently [1].  Posts that are part of a “unit” or a group of “U-Phils”(you [readers] philosophize) count as one. Here I grouped 6/5 and 6/6.

June 2013

  • (6/1) Winner of May Palindrome Contest
  • (6/1) Some statistical dirty laundry*(recently reblogged)
  • (6/5) Do CIs Avoid Fallacies of Tests? Reforming the Reformers :(6/5 and6/6 are paired as one)
  • (6/6) PhilStock: Topsy-Turvy Game
  • (6/6) Anything Tests Can do, CIs do Better; CIs Do Anything Better than Tests?* (reforming the reformers cont.)
  • (6/8) Richard Gill: “Integrity or fraud… or just questionable research practices?*(recently reblogged)
  • (6/11) Mayo: comment on the repressed memory research [How a conceptual criticism, requiring no statistics, might go.]
  • (6/14) P-values can’t be trusted except when used to argue that p-values can’t be trusted!
  • (6/19) PhilStock: The Great Taper Caper
  • (6/19) Stanley Young: better p-values through randomization in microarrays
  • (6/22) What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri*(recently reblogged)
  • (6/26) Why I am not a “dualist” in the sense of Sander Greenland
  • (6/29) Palindrome “contest” contest
  • (6/30) Blog Contents: mid-year

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.






Categories: 3-year memory lane, Error Statistics, Statistics | Leave a comment

A. Birnbaum: Statistical Methods in Scientific Inference (May 27, 1923 – July 1, 1976)

Allan Birnbaum: May 27, 1923- July 1, 1976

Allan Birnbaum died 40 years ago today. He lived to be only 53 [i]. From the perspective of philosophy of statistics and philosophy of science, Birnbaum is best known for his work on likelihood, the Likelihood Principle [ii], and for his attempts to blend concepts of likelihood with error probability ideas to arrive at what he termed “concepts of statistical evidence”. Failing to find adequate concepts of statistical evidence, Birnbaum called for joining the work of “interested statisticians, scientific workers and philosophers and historians of science”–an idea I have heartily endorsed. While known for a result that the (strong) Likelihood Principle followed from sufficiency and conditionality principles (a result that Jimmy Savage deemed one of the greatest breakthroughs in statistics), a few years after publishing it, he turned away from it, perhaps discovering gaps in his argument. A post linking to a 2014 Statistical Science issue discussing Birnbaum’s result is here. Reference [5] links to the Synthese 1977 volume dedicated to his memory. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. Ample weekend reading! Continue reading

Categories: Birnbaum, Likelihood Principle, phil/history of stat, Statistics | Tags: | 62 Comments

Richard Gill: “Integrity or fraud… or just questionable research practices?” (Is Gill too easy on them?)

Professor Gill

Professor Gill

Professor Richard Gill
Statistics Group
Mathematical Institute
Leiden University

It was statistician Richard Gill who first told me about Diederik Stapel (see an earlier post on Diederik). We were at a workshop on Error in the Sciences at Leiden in 2011. I was very lucky to have Gill be assigned as my commentator/presenter—he was excellent! As I was explaining some data problems to him, he suddenly said, “Some people don’t bother to collect data at all!” That’s when I learned about Stapel.

Committees often turn to Gill when someone’s work is up for scrutiny of bad statistics or fraud, or anything in between. Do you think he’s being too easy on researchers when he says, about a given case:

“data has been obtained by some combination of the usual ‘questionable research practices’ [QRPs] which are prevalent in the field in question. Everyone does it this way, in fact, if you don’t, you’d never get anything published. …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them.”

Isn’t that the danger in relying on deeply felt background beliefs?  Have our attitudes changed (toward QRPs) over the past 3 years (harsher or less harsh)? Here’s a talk of his I blogged 3 years ago (followed by a letter he allowed me to post). I reflect on the pseudoscientific nature of the ‘recovered memories’ program in one of the Geraerts et al. papers in a later post. Continue reading

Categories: 3-year memory lane, junk science, Statistical fraudbusting, Statistics | 4 Comments

What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Are we lowering the bar?


For entertainment only

In a post 3 years ago (“What do these share in common: m&m’s, limbo stick, ovulation, Dale Carnegie? Sat night potpourri”), I expressed doubts about expending serious effort to debunk the statistical credentials of studies that most readers without any statistical training would regard as “for entertainment only,” dubious, or pseudoscientific quackery. It needn’t even be that the claim is implausible, what’s implausible is that it has been well probed in the experiment at hand. Given the attention being paid to such examples by some leading statisticians, and scores of replication researchers over the past 3 years–attention that has been mostly worthwhile–maybe the bar has been lowered. What do you think? Anyway, this is what I blogged 3 years ago. (Oh, I decided to put in a home-made cartoon!) Continue reading

Categories: junk science, replication research, Statistics | 2 Comments

Some statistical dirty laundry: have the stains become permanent?



Right after our session at the SPSP meeting last Friday, I chaired a symposium on replication that included Brian Earp–an active player in replication research in psychology (Replication and Evidence: A tenuous relationship p. 80). One of the first things he said, according to my notes, is that gambits such as cherry picking, p-hacking, hunting for significance, selective reporting, and other QRPs, had been taught as acceptable become standard practice in psychology, without any special need to adjust p-values or alert the reader to their spuriousness [i]. (He will correct me if I’m wrong[2].) It shocked me to hear it, even though it shouldn’t have, given what I’ve learned about statistical practice in social science. It was the Report on Stapel that really pulled back the curtain on this attitude toward QRPs in social psychology–as discussed in this blogpost 3 years ago. (If you haven’t read Section 5 of the report on flawed science, you should.) Many of us assumed that QRPs, even if still committed, were at least recognized to be bad statistical practices since the time of Morrison and Henkel’s (1970) Significance Test Controversy. A question now is this: have all the confessions of dirty laundry, the fraudbusting of prominent researchers, the pledges to straighten up and fly right, the years of replication research, done anything to remove the stains? I leave the question open for now. Here’s my “statistical dirty laundry” post from 2013: Continue reading

Categories: junk science, reproducibility, spurious p values, Statistics | 4 Comments

Mayo & Parker “Using PhilStat to Make Progress in the Replication Crisis in Psych” SPSP Slides

Screen Shot 2016-06-19 at 12.53.32 PMHere are the slides from our talk at the Society for Philosophy of Science in Practice (SPSP) conference. I covered the first 27, Parker the rest. The abstract is here:

Categories: P-values, reforming the reformers, replication research, Statistics, StatSci meets PhilSci | Leave a comment

Blog at WordPress.com.