“Using PhilStat to Make Progress in the Replication Crisis in Psych” at Society for PhilSci in Practice (SPSP)

I’m giving a joint presentation with Caitlin Parker[1] on Friday (June 17) at the meeting of the Society for Philosophy of Science in Practice (SPSP): “Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology” (Rowan University, Glassboro, N.J.)[2] The Society grew out of a felt need to break out of the sterile straitjacket wherein philosophy of science is done divorced from practice. The relevance of PhilSci and PhilStat to science has often come up on this blog, so readers may be interested in the SPSP mission statement below our abstract.

Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology

Deborah Mayo Virginia Tech, Department of Philosophy United States
Caitlin Parker Virginia Tech, Department of Philosophy United States

Continue reading

Categories: Announcement, replication research, reproducibility | 8 Comments

“So you banned p-values, how’s that working out for you?” D. Lakens exposes the consequences of a puzzling “ban” on statistical inference



I came across an excellent post on a blog kept by Daniel Lakens: “So you banned p-values, how’s that working out for you?” He refers to the journal that recently banned significance tests, confidence intervals, and a vague assortment of other statistical methods on the grounds that all such statistical inference tools are “invalid” since they don’t provide posterior probabilities of some sort (see my post). The editors’ charge of “invalidity” could only hold water if these error statistical methods purported to provide posteriors based on priors, which is false. The error statistical methodology instead uses probabilities to qualify a method’s capability to detect and avoid erroneous interpretations of data [0]. The logic is of the falsification variety found throughout science. Lakens, an experimental psychologist, does a great job delineating some of the untoward consequences of their inferential ban. I insert some remarks in black. Continue reading

Categories: frequentist/Bayesian, Honorary Mention, P-values, reforming the reformers, science communication, Statistics | 45 Comments

Winner of May 2016 Palindrome Contest: Curtis Williams



Winner of the May 2016 Palindrome contest

Curtis Williams: Inventor, entrepreneur, and professional actor

The winning palindrome (a dialog): 


“Disable preplan?… I, Mon Ami?”


“Calm…Sit, fella.”

“No! I tag. I vandalized Dezi, lad.”

“Navigational leftism lacks aim…a nominal perp: Elba’s id.”

The requirement: A palindrome using “navigate” or “navigation” (and Elba, of course).

Book choice: Error and Inference (D. Mayo & A. Spanos, Cambridge University Press, 2010)



Bio: Curtis Mark Williams is the co-founder of WavHello and the inventor of Bellybuds. He also counts himself an occasional professional actor who has performed on Broadway [1] and in several television shows and films.
He currently resides in Los Angeles with his lovely wife, two daughters, his dog, Newton, and his framed New Yorker Caption Contest winning cartoon. [He has been a finalist twice; the one he won is contest #329, by Joe Dator (inspired by his theatrical background :)] Continue reading

Categories: Palindrome | Leave a comment

“A sense of security regarding the future of statistical science…” Anon review of Error and Inference



Aris Spanos, my colleague (in economics) and co-author, came across this anonymous review of our Error and Inference (2010) [E & I]. Interestingly, the reviewer remarks that “The book gives a sense of security regarding the future of statistical science and its importance in many walks of life.” We’re not sure what the reviewer means–but it’s appreciated regardless. This post was from yesterday’s 3-year memory lane and was first posted here.

2010 American Statistical Association and the American Society for Quality

TECHNOMETRICS, AUGUST 2010, VOL. 52, NO. 3, Book Reviews, pp. 362-370.

Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, edited by Deborah G. MAYO and Aris SPANOS, New York: Cambridge University Press, 2010, ISBN 978-0-521-88008-4, xvii+419 pp., $60.00.

This edited volume contemplates the interests of both scientists and philosophers regarding gathering reliable information about the problem/question at hand in the presence of error, uncertainty, and with limited data information.

The volume makes a significant contribution in bridging the gap between scientific practice and the philosophy of science. The main contribution of this volume pertains to issues of error and inference, and showcases intriguing discussions on statistical testing and providing alternative strategy to Bayesian inference. In words, it provides cumulative information towards the philosophical and methodological issues of scientific inquiry at large.

The target audience of this volume is quite general and open to a broad readership. With some reasonable knowledge of probability theory and statistical science, one can get the maximum benefit from most of the chapters of the volume. The volume contains original and fascinating articles by eminent scholars (nine, including the editors) who range from names in statistical science to philosophy, including D. R. Cox, a name well known to statisticians. Continue reading

Categories: 3-year memory lane, Review of Error and Inference, Statistics | 3 Comments


3 years ago...


MONTHLY MEMORY LANE: 3 years ago: May 2013. I mark in red three posts that seem most apt for general background on key issues in this blog [1].  Some of the May 2013 posts blog the conference we held earlier that month: “Ontology and Methodology”.  I highlight in burgundy a post on Birnbaum that follows up on my last post in honor of his birthday. New questions or comments can be placed on this post.

May 2013

  • (5/3) Schedule for Ontology & Methodology, 2013
  • (5/6) Professorships in Scandal?
  • (5/9) If it’s called the “The High Quality Research Act,” then ….
  • (5/13) ‘No-Shame’ Psychics Keep Their Predictions Vague: New Rejected post
  • (5/14) “A sense of security regarding the future of statistical science…” Anon review of Error and Inference
  • (5/18) Gandenberger on Ontology and Methodology (May 4) Conference: Virginia Tech
  • (5/19) Mayo: Meanderings on the Onto-Methodology Conference
  • (5/22) Mayo’s slides from the Onto-Meth conference
  • (5/24) Gelman sides w/ Neyman over Fisher in relation to a famous blow-up
  • (5/26) Schachtman: High, Higher, Highest Quality Research Act
  • (5/27) A.Birnbaum: Statistical Methods in Scientific Inference
  • (5/29) K. Staley: review of Error & Inference

[1] Monthly memory lanes began at the blog’s 3-year anniversary in September 2014.

Categories: 3-year memory lane, Statistics | Leave a comment

Allan Birnbaum: Foundations of Probability and Statistics (27 May 1923 – 1 July 1976)

27 May 1923-1 July 1976


Today is Allan Birnbaum’s birthday. In honor of his birthday this year, I’m posting the articles in the Synthese volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)


Synthese Volume 36, No. 1 Sept 1977: Foundations of Probability and Statistics, Part I

Editorial Introduction:

This special issue of Synthese on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of Synthese in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.


Continue reading

Categories: Birnbaum, Error Statistics, Likelihood Principle, Statistics, strong likelihood principle | 7 Comments

Frequentstein: What’s wrong with (1 – β)/α as a measure of evidence against the null? (ii)



In their “Comment: A Simple Alternative to p-values,” (on the ASA P-value document), Benjamin and Berger (2016) recommend researchers report a pre-data Rejection Ratio:

It is the probability of rejection when the alternative hypothesis is true, divided by the probability of rejection when the null hypothesis is true, i.e., the ratio of the power of the experiment to the Type I error of the experiment. The rejection ratio has a straightforward interpretation as quantifying the strength of evidence about the alternative hypothesis relative to the null hypothesis conveyed by the experimental result being statistically significant. (Benjamin and Berger 2016, p. 1)

The recommendation is much more fully fleshed out in a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. Their recommendation is:

…that researchers should report the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results. (BBBS 2016, p. 3)….

The (pre-experimental) ‘rejection ratio’ Rpre, the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting under H1 and H0 respectively), is shown to capture the strength of evidence in the experiment for H1 over H0. (ibid., p. 2)

But in fact it does no such thing! [See my post from the FUSION conference here.] J. Berger and his co-authors will tell you the rejection ratio (and a variety of other measures created over the years) are entirely frequentist because they are built out of frequentist error statistical measures. But being built from frequentist measures doesn’t mean the resulting animal captures frequentist error statistical reasoning. It might be a kind of Frequentstein monster! [1] Continue reading
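For concreteness, the Rpre arithmetic is easy to reproduce. Below is a minimal sketch (Python, standard library only); the one-sided z-test setup here—H0: μ = 0 vs H1: μ = 0.5, σ = 1, n = 25—is my own illustrative choice, not an example taken from BBBS:

```python
from statistics import NormalDist

def rejection_ratio(alpha, mu1, sigma, n):
    """Pre-experimental rejection ratio (1 - beta)/alpha for a one-sided
    z-test of H0: mu = 0 vs H1: mu = mu1, with sigma known."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)                       # rejection cutoff
    power = 1 - z.cdf(z_alpha - mu1 * n ** 0.5 / sigma)  # P(reject | H1)
    return power / alpha

# alpha = 0.05 with roughly 80% power gives an Rpre of about 16
r = rejection_ratio(alpha=0.05, mu1=0.5, sigma=1.0, n=25)
```

Notice that Rpre is fixed by the design alone (α and the power): it comes out the same number whatever the data turn out to be, which is just the pre-data character at issue in this post.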

Categories: J. Berger, power, reforming the reformers, S. Senn, Statistical power, Statistics | 36 Comments

Fallacies of Rejection, Nouvelle Cuisine, and assorted New Monsters


Jackie Mason

Whenever I’m in London, my criminologist friend Katrin H. and I go in search of stand-up comedy. Since it’s Saturday night (and I’m in London), we’re setting out in search of a good comedy club (I’ll complete this post upon return). A few years ago we heard Jackie Mason do his shtick, a one-man show billed as his swan song to England.  It was like a repertoire of his “Greatest Hits” without a new or updated joke in the mix.  Still, hearing his rants for the nth time was often quite hilarious. It turns out that he has already been back doing another “final shtick tour” in England, but not tonight.

A sample: If you want to eat nothing, eat nouvelle cuisine. Do you know what it means? No food. The smaller the portion the more impressed people are, so long as the food’s got a fancy French name, haute cuisine. An empty plate with sauce!

As one critic wrote, Mason’s jokes “offer a window to a different era,” one whose caricatures and biases one can only hope we’ve moved beyond.

But it’s one thing for Jackie Mason to scowl at a seat in the front row and yell to the shocked audience member in his imagination, “These are jokes! They are just jokes!” and another to reprise statistical howlers, which are not jokes, to me. This blog found its reason for being partly as a place to expose, understand, and avoid them. I had earlier used this Jackie Mason opening to launch into a well-known fallacy of rejection using statistical significance tests. I’m going to go further this time around. I began by needling some leading philosophers of statistics: Continue reading

Categories: reforming the reformers, science-wise screening, Statistical power, statistical tests, Statistics | Tags: , , , , | 5 Comments

Excerpts from S. Senn’s Letter on “Replication, p-values and Evidence”

old blogspot typewriter


I first blogged this letter here. Below the references are some more recent blog links of relevance to this issue. 

Dear Reader: I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost. It is a letter to the editor of Statistics in Medicine in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing. You can read the full letter here. Sincerely, D. G. Mayo


From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5. Continue reading
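Senn’s general form of the result can be checked by simulation: with a flat prior, a first trial observed exactly at the cutoff z_α gives a posterior for the true effect of N(z_α, 1) in z-units, so a second, equal-powered trial is significant in the same direction with probability 0.5. A quick illustrative sketch (Python, standard library only; the setup is my reconstruction, not Senn’s own code):

```python
import random
from statistics import NormalDist

random.seed(1)
z_alpha = NormalDist().inv_cdf(0.95)    # one-sided alpha = 0.05 cutoff

# First trial landed exactly on the cutoff: z1 = z_alpha, i.e. p = alpha.
# Flat prior => posterior for the true effect (in z-units) is N(z1, 1).
trials = 200_000
hits = 0
for _ in range(trials):
    theta = random.gauss(z_alpha, 1)    # effect drawn from the posterior
    z2 = random.gauss(theta, 1)         # z-statistic of the second trial
    hits += z2 > z_alpha                # significant, same direction
replication_prob = hits / trials        # close to 0.5
```

The 50 per cent falls out of symmetry: predictively, z2 is distributed N(z_α, √2), so it exceeds z_α exactly half the time.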

Categories: 4 years ago!, reproducibility, S. Senn, Statistics | Tags: , , , | 3 Comments

My Slides: “The Statistical Replication Crisis: Paradoxes and Scapegoats”

Below are the slides from my Popper talk at the LSE today (up to slide 70): (post any questions in the comments)


Categories: P-values, replication research, reproducibility, Statistics | 11 Comments

Some bloglinks for my LSE talk tomorrow: “The Statistical Replication Crisis: Paradoxes and Scapegoats”

In my Popper talk tomorrow (in London), I will discuss topics in philosophy of statistics in relation to the 2016 ASA document on P-values and recent replication research in psychology. For readers interested in links from this blog, see:

I. My commentary on the ASA document on P-values (with links to the ASA document):

“Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater”

“A Small P-value Indicates that the Results are Due to Chance Alone: Fallacious or not: More on the ASA P-value Doc”

“P-Value Madness: A Puzzle About the Latest Test Ban, or ‘Don’t Ask, Don’t Tell’”

II. Posts on replication research in psychology: Continue reading

Categories: Metablog, P-values, replication research, reproducibility, Statistics | 7 Comments

My Popper Talk at LSE: The Statistical Replication Crisis: Paradoxes and Scapegoats

I’m giving a Popper talk at the London School of Economics next Tuesday (10 May). If you’re in the neighborhood, I hope you’ll stop by.

Popper talk May 10 location

A somewhat accurate blurb is here. I say “somewhat” because it doesn’t mention that I’ll talk a bit about the replication crisis in psychology, and the issues that crop up (or ought to) in connecting statistical results and the causal claim of interest.



Categories: Announcement | 6 Comments

Philosophy & Physical Computing Graduate Workshop at VT

A Graduate Summer Workshop at Virginia Tech (Poster)

Application deadline: May 8, 2016 


Think & Code VT

JULY 11-24, 2016 at Virginia Tech

Who should apply:

  • This workshop is open to graduate students in master’s or PhD programs in philosophy or the sciences, including computer science.

For additional information or to apply online, visit thinkandcode.vtlibraries.org, or contact Dr. Benjamin Jantzen at bjantzen@vt.edu.

Categories: Announcement | Leave a comment

Yes, these were not (entirely) real–my five April pranks



My “April 1” posts for the past 5 years have been so close to the truth or possible truth that they weren’t always spotted as April Fool’s pranks, which is what made them genuine April Fool’s pranks. (After a few days I labeled them as such, or revealed it in a comment). So since it’s Saturday night on the last night of April, I’m reblogging my 5 posts from first days of April. (Which fooled you the most?) Continue reading

Categories: Comedy, Statistics | 1 Comment


3 years ago...


MONTHLY MEMORY LANE: 3 years ago–March & April 2013. I missed March memory lane, so both are combined here. I mark in red three posts most apt for a general background on key issues in this blog [1]. I’ve added some remarks in blue this month, for some of the posts that are not marked in red.

March 2013

  • (3/1) capitalizing on chance-Worth a look (has a pic of Mayo gambling)!
  • (3/4) Big Data or Pig Data?–Funny & clever (guest post)!
  • (3/7) Stephen Senn: Casting Stones
  • (3/10) Blog Contents 2013 (Jan & Feb)
  • (3/11) S. Stanley Young: Scientific Integrity and Transparency
  • (3/13) Risk-Based Security: Knives and Axes-Funny, strange!
  • (3/15) Normal Deviate: Double Misunderstandings About p-values–worth keeping in mind.
  • (3/17) Update on Higgs data analysis: statistical flukes (1)
  • (3/21) Telling the public why the Higgs particle matters
  • (3/23) Is NASA suspending public education and outreach?
  • (3/27) Higgs analysis and statistical flukes (part 2)
  • (3/31) possible progress on the comedy hour circuit?–One of my favorites, a bit of progress

Continue reading

Categories: 3-year memory lane, Statistics | Leave a comment

Jerzy Neyman and “Les Miserables Citations” (statistical theater in honor of his birthday)


Neyman April 16, 1894 – August 5, 1981

In honor of Jerzy Neyman’s birthday today, a local acting group is putting on a short theater production based on a screenplay I wrote:  “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from their early joint 1933 paper:

We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.

But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).

Continue reading

Categories: E.S. Pearson, Neyman, Statistics | 7 Comments

When the rejection ratio (1 – β)/α turns evidence on its head, for those practicing in an error-statistical tribe (ii)



I’m about to hear Jim Berger give a keynote talk this afternoon at a FUSION conference I’m attending. The conference goal is to link Bayesian, frequentist and fiducial approaches: BFF.  (Program is here. See the blurb below [0]).  April 12 update below*. Berger always has novel and intriguing approaches to testing, so I was especially curious about the new measure.  It’s based on a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses. They recommend:

…that researchers should report what we call the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report what we call the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results. (BBBS 2016)…

“The (pre-experimental) ‘rejection ratio’ Rpre, the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting under H1 and H0 respectively), is shown to capture the strength of evidence in the experiment for H1 over H0.”

If you’re seeking a comparative probabilist measure, the ratio of power to size can look like a likelihood ratio in favor of the alternative. To a practicing member of an error statistical tribe, however, whether along the lines of N, P, or F (Neyman, Pearson, or Fisher), things can look topsy-turvy. Continue reading

Categories: confidence intervals and tests, power, Statistics | 31 Comments

Winner of March 2016 Palindrome: Manan Shah


Manan Shah

Manan Shah channels Jack Nicholson in “The Shining” to win this month’s palindrome contest (and the book of his choice).* 

Winner of March 2016 Contest: Manan Shah



Palindrome: I was able to. I did add well. Liking is, I say, as evil as dad’s aloof. Delivery reviled sign: “I red rum”. Examine men I’m axe murdering. Is delivery reviled? Fool! As dad’s alive, say as I sign: “I kill lewd dad.” Idiot Elba saw I.

The requirements: In addition to using Elba, a candidate for a winning palindrome must have used “examine” (or “examined” or “examination”).

Bio: Manan Shah is a mathematician and owner of Think. Plan. Do. LLC  (www.ThinkPlanDoLLC.com). He writes at www.mathmisery.com and is looking to publish his first book, hopefully by the end of this year. He holds a PhD in Mathematics from Florida State University.
Continue reading

Categories: Palindrome, Rejected Posts | Leave a comment

I’m speaking at Univ of Minnesota on Friday

I’ll be speaking at U of Minnesota tomorrow. I’m glad to see a group with interest in philosophical foundations of statistics as well as the foundations of experiment and measurement in psychology. I will post my slides afterwards. Come by if you’re in the neighborhood. 

University of Minnesota
“The ASA (2016) Statement on P-values and
How to Stop Refighting the Statistics Wars”

April 8, 2016 at 3:35 p.m.




Deborah G. Mayo
Department of Philosophy, Virginia Tech

The CLA Quantitative Methods
Collaboration Committee
Minnesota Center for Philosophy of Science

275 Nicholson Hall
216 Pillsbury Drive SE
University of Minnesota
Minneapolis MN


This will be a mixture of my current take on the “statistics wars” together with my reflections on the recent ASA document on P-values. I was invited over a year ago by Niels Waller, a co-author of Paul Meehl. I’ll never forget when I was there in 1997: Paul Meehl was in the audience, waving my book in the air–EGEK (1996)–and smiling!

Categories: Announcement | 3 Comments

Er, about those “other statistical approaches”: Hold off until a balanced critique is in?



I could have told them that the degree of accordance enabling the “6 principles” on p-values was unlikely to be replicated when it came to most of the “other approaches” with which some would supplement or replace significance tests–notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypothesis tests). [My commentary is here.] So now they may be advising a “hold off” or “go slow” approach until some consilience is achieved. Is that it? I don’t know. I was tweeted an article about the background chatter taking place behind the scenes; I wasn’t one of the people interviewed for this. Here are some excerpts; I may add more later after it has had time to sink in. (Check back later.)

“Reaching for Best Practices in Statistics: Proceed with Caution Until a Balanced Critique Is In”

J. Hossiason

“[A]ll of the other approaches*, as well as most statistical tools, may suffer from many of the same problems as the p-values do. What level of likelihood ratio in favor of the research hypothesis will be acceptable to the journal? Should scientific discoveries be based on whether posterior odds pass a specific threshold (P3)? Does either measure the size of an effect (P5)?…How can we decide about the sample size needed for a clinical trial—however analyzed—if we do not set a specific bright-line decision rule? 95% confidence intervals or credence intervals…offer no protection against selection when only those that do not cover 0 are selected into the abstract (P4).” (Benjamini, ASA commentary, pp. 3-4)

What’s sauce for the goose is sauce for the gander, right? Many statisticians seconded George Cobb, who urged “the board to set aside time at least once every year to consider the potential value of similar statements” to the recent ASA p-value report. Disappointingly, a preliminary survey of leaders in statistics, many from the original p-value group, aired striking disagreements on best and worst practices with respect to these other approaches. The Executive Board is contemplating a variety of recommendations, minimally, Continue reading

Categories: Bayesian/frequentist, Statistics | 84 Comments
