Philosophy of Science Association 2016 Symposium

PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Friday November 4th  9-11:45 am
(includes coffee  break 10-10:15)
Location: Piedmont 4 (12th Floor) Westin Peachtree Plaza

  • Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) “Controversy Over the Significance Test Controversy” (Abstract)
  • Gerd Gigerenzer (Director of Max Planck Institute for Human Development, Berlin, Germany) “Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual” (Abstract)
  • Andrew Gelman (Professor of Statistics & Political Science, Columbia University, New York) “Confirmationist and Falsificationist Paradigms in Statistical Practice” (Abstract)
  • Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania) “Exploratory Research is More Reliable Than Confirmatory Research” (Abstract)

Key Words: big data, frequentist and Bayesian philosophies, history and philosophy of statistics, meta-research, p-values, replication, significance tests.


Science is undergoing a crisis over reliability and reproducibility. High-powered methods are prone to cherry-picking correlations, significance-seeking, and assorted modes of extraordinary rendition of data. The Big Data revolution may encourage a reliance on statistical methods without sufficient scrutiny of whether they are teaching us about causal processes of interest. Mounting failures of replication in the social and biological sciences have resulted in new institutes for meta-research, replication research, and widespread efforts to restore scientific integrity and transparency. Statistical significance test controversies, long raging in the social sciences, have spread to all fields using statistics. At the same time, foundational debates over frequentist and Bayesian methods have shifted in important ways that are often overlooked in the debates. These problems raise philosophical and methodological questions about probabilistic tools, and about science and pseudoscience—questions intertwined with technical statistics and the philosophy and history of statistics. Our symposium goal is to address the foundational issues around which the current crisis in science revolves. We combine the insights of philosophers, psychologists, and statisticians whose work interrelates the philosophy and history of statistics, data analysis, and modeling.


Philosophy of statistics tackles conceptual and epistemological problems in using probabilistic methods to collect, model, analyze, and draw inferences from data. The problems concern the nature of uncertain evidence, the role and interpretation of probability, reliability, and robustness—all of which link to a long history of disputes of personality and philosophy between frequentists, Bayesians, and likelihoodists (e.g., Fisher, Neyman, Pearson, Jeffreys, Lindley, Savage). Replication failures have led researchers to reexamine their statistical methods. Although novel statistical techniques now use simulations to detect cherry-picking and p-hacking, we see a striking recapitulation of Bayesian-frequentist debates of old. New philosophical issues arise from the successes of machine learning and Big Data analysis: How do their predictions succeed when the parameters in the models are merely black boxes? One thing we learned in 2015 is why they fail: a tendency to overlook classic statistical issues: confounders, multiple testing, bias, model assumptions, and overfitting. The time is ripe for a forum that illuminates current developments and points to the directions of future work by philosophers and methodologists of science.

The New Statistical Significance Test Controversy. Mechanical, cookbook uses of statistical significance tests have long been lampooned in the social sciences, but once high-profile failures revealed poor rates of replication in medicine and cancer research, the problem took on a new seriousness. Drawing on criticisms from social science, however, the new significance test controversy retains caricatures of a “hybrid” view of significance testing, common in psychology (Gigerenzer). Well-known criticisms—statistical significance is not substantive significance, p-values are invalidated by significance seeking and violated model assumptions—are based on uses of methods warned against by the founders of Fisherian and Neyman-Pearson (N-P) tests. A genuine experimental effect, Fisher insisted, cannot be based on a single, isolated significant result (a single low p-value); low p-values must be generated reliably in multiple settings. Yet sweeping criticisms and recommended changes of method are often based on the rates of false positives assuming a single, just-significant result, with biasing selection effects to boot!

Foundational controversies are tied up with Fisher’s bitter personal feuds with Neyman, and Neyman’s attempt to avoid inconsistencies in Fisher’s “fiducial” probability by means of confidence levels. Only a combined understanding of the early statistical and historical developments can get beyond the received views of the philosophical differences between Fisherian and N-P tests. People should look at the properties of the methods, independent of what the founders supposedly thought.

Bayesian-Frequentist Debates. The Bayesian-frequentist debates need to be revisited. Many discussants, who only a decade ago argued for the “irreconcilability” of frequentist p-values and Bayesian measures, now call for ways to reconcile the two. In today’s most popular Bayesian accounts, prior probabilities in hypotheses do not express degrees of belief but are given by various formal assignments or “defaults,” ideally with minimal impact on the posterior probability. Advocates of unifications are keen to show that Bayesian methods have good (frequentist) long-run performance; and that it is often possible to match frequentist and Bayesian quantities, despite differences in meaning and goals. Other Bayesians deny the idea that Bayesian updating fits anything they actually do in statistics (Gelman). Statistical methods are being decoupled from the philosophies in which they are traditionally couched, calling for new foundations and insights from philosophers.

Is the “Bayesian revolution,” like the significance test revolution before it, ushering in the latest in a universal method and surrogate science (Gigerenzer)? If the key problems of significance tests occur equally with Bayes ratios, confidence intervals and credible regions, then we need a new statistical philosophy to underwrite alternative, more self-critical methods (Mayo).

The Big Data Revolution. New data acquisition procedures in biology and neuroscience yield enormous quantities of high-dimensional data, which can only be analyzed by computerized search procedures. But the most commonly used search procedures have known liabilities and can often only be validated using computer simulations. Analyses used to find predictors in areas such as medical diagnostics are so new that their statistical properties are often unknown, making them ethically problematic. Questions arise about the very nature of replication and validation, and of reliability and robustness. Without a more critical analysis of these foibles, the current Human Connectome Project to understand brain processes may result in the same disappointments as gene regulation discovery, with its so far unfulfilled promise of reliably predicting personalized cancer treatments (Glymour).

The wealth of computational ability allows countless methods to be applied with little handwringing about foundations, but these methods introduce new quandaries. The techniques that Big Data requires to “clean” and process data introduce biases that are difficult to detect. Can sufficient data obviate the need to satisfy long-standing principles of experimental design? Can data-dependent simulations, resampling, and black-box models ever count as valid replications or genuine model validations?

The Contributors: While participants represent diverse statistical philosophies, there is agreement that a central problem concerns the gaps between the outputs of formal statistical methods and research claims of interest. In addition to illuminating problems, each participant will argue for an improved methodology: an error statistical account of inference (Mayo), a heuristic toolbox (Gigerenzer), Bayesian falsification via predictive distributions (Gelman), and a distinct causal-modeling approach (Glymour).




Controversy Over the Significance Test Controversy
Deborah Mayo
(Professor of Philosophy, Virginia Tech, Blacksburg, Virginia)

In the face of misinterpretations and proposed bans of statistical significance tests, the American Statistical Association gathered leading statisticians in 2015 to articulate statistical fallacies and galvanize discussion of statistical principles. I discuss the philosophical assumptions lurking in the background of their recommendations, linking them also to the contributions of my co-symposiasts. As is common, probability is assumed to accord with one of two statistical philosophies: (1) probabilism and (2) (long-run) performance. (1) assumes probability should supply degrees of confirmation, support, or belief in hypotheses, e.g., Bayesian posteriors, likelihood ratios, and Bayes factors; (2) limits probability to long-run reliability in a series of applications, e.g., a “behavioristic” construal of N-P type 1 and 2 error probabilities, or false discovery rates in Big Data.

Assuming probabilism, significance levels are relevant to a particular inference only if misinterpreted as posterior probabilities. Assuming performance, they are criticized as relevant only for quality control and contexts of repeated applications. Performance is just what’s needed in Big Data searches through correlations (Glymour). But for inference, I sketch a third construal: (3) probativeness. In (2) and (3), unlike (1), probability attaches to the methods (of testing or estimation), not to the hypotheses. These “methodological probabilities” report on a method’s ability to control the probability of erroneous interpretations of data: error probabilities. While significance levels (p-values) are error probabilities, the probing construal in (3) directs their evidentially relevant use.

That a null hypothesis of “no effect” or “no increased risk” is rejected at the .01 level (given adequate assumptions) tells us that 99% of the time, a smaller observed difference would result from expected variability, as under the null hypothesis. If such statistically significant effects are produced reliably, as Fisher required, they indicate a genuine effect. Looking at the entire p-value distribution under various discrepancies from the null allows inferring those that are well or poorly indicated. This is akin to confidence intervals but we do not fix a single confidence level, and we distinguish the warrant for different points in any interval. My construal connects to Birnbaum’s confidence concept, Popperian corroboration, and possibly Fisherian fiducial probability. The probativeness interpretation better meets the goals driving current statistical reforms.
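
To make this concrete, here is a minimal numerical sketch (my illustration, not part of the abstract): a one-sample normal test with known standard deviation and invented numbers, computing how well a just-significant result indicates various discrepancies from the null.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: H0: mu = 0 vs. H1: mu > 0, sigma known, n = 100.
sigma, n = 1.0, 100
se = sigma / np.sqrt(n)
x_bar = 0.25                                  # invented observed mean
p_value = 1 - stats.norm.cdf(x_bar / se)      # about .006: significant at the .01 level

# How well does this result indicate discrepancies mu > mu_1?
for mu_1 in [0.0, 0.1, 0.2, 0.3]:
    severity = stats.norm.cdf((x_bar - mu_1) / se)
    print(f"claim mu > {mu_1:.1f}: severity {severity:.3f}")
```

On these invented numbers the claims mu > 0 and mu > 0.1 come out well indicated (roughly .99 and .93), while mu > 0.3 is poorly indicated (roughly .31), which is the sense in which different points in an interval receive different warrant.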

Much handwringing stems from hunting for an impressive-looking effect, then inferring a statistically significant finding. The actual probability of erroneously finding significance with this gambit is not low, but high, so a reported small p-value is invalid. Flexible choices along “forking paths” from data to inference cause the same problem, even if the selection is informal (Gelman). However, the same flexibility occurs with probabilist reforms, be they likelihood ratios, Bayes factors, highest probability density (HPD) intervals, or lowering the p-value (until the maximally likely alternative gets a .95 posterior). But lost are the direct grounds to criticize them as flouting error statistical control. I concur with Gigerenzer’s criticisms of ritual uses of p-values, but without understanding their valid (if limited) role, there is a danger of accepting reforms that throw out the error control baby with the “bad statistics” bathwater.
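
The inflation from such hunting is easy to exhibit by simulation. A minimal sketch (invented numbers, not from the abstract): with no real effects anywhere, test twenty outcomes per study and report only the smallest p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_outcomes, n_obs, alpha = 10_000, 20, 30, 0.05

false_alarms = 0
for _ in range(n_studies):
    # All twenty outcomes are pure noise: every null hypothesis is true.
    data = rng.normal(0.0, 1.0, size=(n_outcomes, n_obs))
    t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n_obs))
    p = 2 * (1 - stats.t.cdf(np.abs(t), df=n_obs - 1))
    false_alarms += p.min() < alpha          # report only the best-looking result

print(false_alarms / n_studies)              # roughly 0.64, not the nominal 0.05
```

The nominal .05 error probability of the reported test bears no relation to the roughly 64% error probability of the procedure that actually generated it.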



Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual
Gerd Gigerenzer

(Director of Max Planck Institute for Human Development, Berlin, Germany) 

If statisticians agree on one thing, it is that scientific inference should not be made mechanically. Despite virulent disagreements on other issues, Ronald Fisher and Jerzy Neyman, two of the most influential statisticians of the 20th century, were of one voice on this matter. Good science requires both statistical tools and informed judgment about what model to construct, what hypotheses to test, and what tools to use. Practicing statisticians rely on a “statistical toolbox” and on their expertise to select a proper tool. Social scientists, in contrast, tend to rely on a single tool.

In this talk, I trace the historical transformation of Fisher’s null hypothesis testing, Neyman-Pearson decision theory, and Bayesian statistics into a single mechanical procedure that is performed like compulsive hand washing: the null ritual. In the social sciences, this transformation has fundamentally changed research practice, making statistical inference its centerpiece. The essence of the null ritual is:

  1. Set up a null hypothesis of “no mean difference” or “zero correlation.” Do not specify the predictions of your own research hypothesis.
  2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p < .05, p < .01, or p < .001, whichever comes next to the obtained p-value.
  3. Always perform this procedure.

I use the term “ritual” because this procedure shares features that define social rituals: (i) the repetition of the same action, (ii) a focus on special numbers or colors, (iii) fears about serious sanctions for rule violations, and (iv) wishful thinking and delusions that virtually eliminate critical thinking. The null ritual has each of these four characteristics: mindless repetition, the magical 5% number, fear of sanctions by editors or advisors, and delusions about what a p-value means, which block researchers’ intelligence. Starting in the 1940s, writers of bestselling statistical textbooks for the social sciences silently transformed rival statistical systems into an apparently monolithic method that could be used mechanically. The idol of a universal method for scientific inference has been worshipped and institutionalized since the “inference revolution” of the 1950s. Because no such method has ever been found, surrogates have been created, most notably the quest for significant p-values. I show that this form of surrogate science fosters delusions and argue that it is one of the reasons for “borderline cheating,” which has done much harm, creating, for one, a flood of irreproducible results in fields such as psychology, cognitive neuroscience, and tumor marker research.

Today, proponents of the “Bayesian revolution” are in similar danger of chasing the same chimera: an apparently universal inference procedure. A better path would be to promote an understanding of the various devices in the “statistical toolbox.” I discuss possible explanations of why a toolbox approach to statistics has so far been successfully prevented by journal editors, textbook writers, and social scientists.



Confirmationist and Falsificationist Paradigms in Statistical Practice
Andrew Gelman
(Professor of Statistics & Political Science, Columbia University, New York)

There is a divide in statistics between classical frequentist and Bayesian methods. Classical hypothesis testing is generally taken to follow a falsificationist, Popperian philosophy in which research hypotheses are put to the test and rejected when data do not accord with predictions. Bayesian inference is generally taken to follow a confirmationist philosophy in which data are used to update the probabilities of different hypotheses. We disagree with this conventional Bayesian-frequentist contrast: we argue that classical null hypothesis significance testing is actually used in a confirmationist sense and in fact does not do what it purports to do, and that Bayesian inference cannot in general supply reasonable probabilities of models being true. The standard research paradigm in social psychology (and elsewhere) seems to be that the researcher has a favorite hypothesis A. But, rather than trying to set up hypothesis A for falsification, the researcher picks a null hypothesis B to falsify, whose rejection is then taken as evidence in favor of A. Research projects are framed as quests for confirmation of a theory, and once confirmation is achieved, there is a tendency to declare victory and not think too hard about issues of reliability and validity of measurements.

Instead, we recommend a falsificationist Bayesian approach in which models are altered and rejected based on data. The conventional Bayesian confirmation view blinds many Bayesians to the benefits of predictive model checking. The view is that any Bayesian model necessarily represents a subjective prior distribution and as such could never be tested.  It is not only Bayesians who avoid model checking. Quantitative researchers in political science, economics, and sociology regularly fit elaborate models without even the thought of checking their fit.

We can perform a Bayesian test by first assuming the model is true, then obtaining the posterior distribution, and then determining the distribution of a test statistic over hypothetical replicated data drawn from the fitted model. A posterior distribution is not the final end, but is part of the derived prediction for testing. In practice, we implement this sort of check via simulation.
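
A minimal sketch of such a simulation-based check (a toy example added here, not Gelman’s own: a normal model with a flat prior on its mean, fit to deliberately heavy-tailed data, with the maximum absolute value as the test statistic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: heavy-tailed draws that the assumed Normal(mu, 1) model cannot capture.
y = rng.standard_t(df=2, size=50)
n = len(y)

# Posterior for mu under known sigma = 1 and a flat prior: Normal(mean(y), 1/n).
n_rep = 4000
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=n_rep)

# One replicated dataset per posterior draw, generated from the fitted model.
y_rep = rng.normal(mu_draws[:, None], 1.0, size=(n_rep, n))

# Posterior predictive p-value for T(y) = max|y|: how often replications look as extreme.
T_obs = np.abs(y).max()
T_rep = np.abs(y_rep).max(axis=1)
print("posterior predictive p-value:", (T_rep >= T_obs).mean())
```

A tiny p-value here signals that the fitted model fails to reproduce the tails of the data: the posterior feeds a prediction that can be checked and found wanting, rather than serving as the final output.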

Posterior predictive checks are disliked by some Bayesians because of their low power arising from their allegedly “using the data twice.” This is not a problem for us: a check with low power simply concerns a dimension of the data that is virtually automatically fit by the model.

What can statistics learn from philosophy? Falsification and the notion of scientific revolutions can make us willing to check our model fit and to vigorously investigate anomalies rather than treat prediction as the only goal of statistics. What can the philosophy of science learn from statistical practice? The success of inference using elaborate models, full of assumptions that are certainly wrong, demonstrates the power of deductive inference, and posterior predictive checking demonstrates that ideas of falsification and error statistics can be applied in a fully Bayesian environment with informative likelihoods and prior distributions.

Exploratory Research is More Reliable Than Confirmatory Research
Clark Glymour
(Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania)

Ioannidis (2005) argued that most published research is false, and that “exploratory” research in which many hypotheses are assessed automatically is especially likely to produce false positive relations. Colquhoun (2014), using simulations, estimates that 30 to 40% of positive results obtained with the conventional .05 cutoff for rejecting a null hypothesis are false. Their explanation is that true relationships in a domain are rare and the selection of hypotheses to test is roughly independent of their truth, so most relationships tested will in fact be false. Conventional use of hypothesis tests, in other words, suffers from a base rate fallacy. I will show that the reverse is true for modern search methods for causal relations because: (a) each hypothesis is tested or assessed multiple times; (b) the methods are biased against positive results; and (c) the rarity of true relationships is an advantage for these methods. I will substantiate the claim with both empirical data and with simulations of data from systems with a thousand to a million variables that result in fewer than 5% false positive relationships and in which 90% or more of the true relationships are recovered.
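
The base-rate fallacy behind the Ioannidis and Colquhoun figures can be made explicit in a few lines; the prevalence and power below are illustrative assumptions in the ballpark of those used in that literature, not values from Glymour’s talk.

```python
# If only 10% of tested hypotheses are true, and tests run at alpha = .05 with power = .80:
prevalence, alpha, power = 0.10, 0.05, 0.80

true_positives = prevalence * power            # 0.08 of all tests
false_positives = (1 - prevalence) * alpha     # 0.045 of all tests

# Fraction of "significant" findings that are false: about 0.36.
print(false_positives / (true_positives + false_positives))
```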

Categories: Announcement | Leave a comment

Formal Epistemology Workshop 2017: call for papers



Formal Epistemology Workshop (FEW) 2017


Call for papers

Submission Deadline: December 1st, 2016
Authors Notified: February 8th, 2017

We invite papers in formal epistemology, broadly construed. FEW is an interdisciplinary conference, and so we welcome submissions from researchers in philosophy, statistics, economics, computer science, psychology, and mathematics.

Submissions should be prepared for blind review. Contributors ought to upload a full paper of no more than 6000 words and an abstract of up to 300 words to the Easychair website. Please submit your full paper in .pdf format. The deadline for submissions is December 1st, 2016. Authors will be notified on February 1st, 2017.

The final selection of the program will be made with an eye towards diversity. We especially encourage submissions from PhD candidates, early career researchers and members of groups that are underrepresented in philosophy.


If you have any questions, please email formalepistemologyworkshop2017[AT]gmail, with the appropriate suffix.

Local Organizing Committee

Scientific Committee

Lara Buchak (Berkeley)
Vincenzo Crupi (Turin)
Sujata Ghosh (ISI Chennai)
Simon Huttegger (Irvine)
Subhash Lele (Alberta)
Hanti Lin (UC Davis)
Anna Mahtani (LSE)
Daniel Singer (Penn)
Michael Titelbaum (Madison)
Kevin Zollman (Carnegie Mellon)
Catrin Campbell-Moore (Bristol)
Kenny Easwaran (Texas A&M)
Nina Gierasimczuk (DTU Compute)
Brian Kim (Oklahoma)
Fenrong Liu (Tsinghua)
Deborah Mayo (Virginia Tech)
Carlotta Pavese (Duke/Turin)
Sonja Smets (ILLC Amsterdam)
Gregory Wheeler (MCMP Munich)
Eleonora Cresto (Buenos Aires)
Paul Egre (Institut Jean-Nicod)
Leah Henderson (Groningen)
Karolina Krzyzanowska (MCMP Munich)
Yang Liu (Cambridge)
Cailin O’Connor (Irvine)
Lavinia Picollo (MCMP Munich)
Julia Staffel (WashU in St. Louis)
Sylvia Wenmackers (Leuven)
Categories: Announcement | Leave a comment

International Prize in Statistics Awarded to Sir David Cox




International Prize in Statistics Awarded to Sir David Cox for
Survival Analysis Model Applied in Medicine, Science, and Engineering

EMBARGOED until October 19, 2016, at 9 p.m. ET

ALEXANDRIA, VA (October 18, 2016) – Prominent British statistician Sir David Cox has been named the inaugural recipient of the International Prize in Statistics. Like the acclaimed Fields Medal, Abel Prize, Turing Award and Nobel Prize, the International Prize in Statistics is considered the highest honor in its field. It will be bestowed every other year to an individual or team for major achievements using statistics to advance science, technology and human welfare.

Cox is a giant in the field of statistics, but the International Prize in Statistics Foundation is recognizing him specifically for his 1972 paper in which he developed the proportional hazards model that today bears his name. The Cox Model is widely used in the analysis of survival data and enables researchers to more easily identify the risks of specific factors for mortality or other survival outcomes among groups of patients with disparate characteristics. From disease risk assessment and treatment evaluation to product liability, school dropout, reincarceration and AIDS surveillance systems, the Cox Model has been applied essentially in all fields of science, as well as in engineering.
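
For readers who want to see the model in action, here is a minimal sketch (an illustration added here, not part of the press release) using the open-source Python lifelines package and its bundled Rossi recidivism dataset. The model takes the hazard at time t given covariates x to be h(t | x) = h0(t) * exp(x'beta), so each exp(beta) is a hazard ratio for the corresponding risk factor.

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

# Rossi recidivism data: 'week' is follow-up time, 'arrest' the event indicator,
# and the remaining columns are covariates (age, prior arrests, financial aid, ...).
df = load_rossi()

cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()   # exp(coef) gives the estimated hazard ratio for each factor
```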

“Professor Cox changed how we analyze and understand the effect of natural or human-induced risk factors on survival outcomes, paving the way for powerful scientific inquiry and discoveries that have impacted human health worldwide,” said Susan Ellenberg, chair of the International Prize in Statistics Foundation. “Use of the ‘Cox Model’ in the physical, medical, life, earth, social and other sciences, as well as engineering fields, has yielded more robust and detailed information that has helped researchers and policymakers address some of society’s most pressing challenges.” Successful application of the Cox Model has led to life-changing breakthroughs with far-reaching societal effects, some of which include the following:

  • Demonstrating that a major reduction in smoking-related cardiac deaths could be seen within just one year of smoking cessation, not 10 or more years as previously thought
  • Showing the mortality effects of particulate air pollution, a finding that has changed both industrial practices and air quality regulations worldwide
  • Identifying risk factors of coronary artery disease and analyzing treatments for lung cancer, cystic fibrosis, obesity, sleep apnea and septic shock

His mark on research is so great that his 1972 paper is one of the three most-cited papers in statistics and ranked 16th in Nature’s list of the top 100 most-cited papers of all time for all fields.

In 2010, Cox received the Copley Medal, the Royal Society’s highest award that has also been bestowed upon such other world-renowned scientists as Peter Higgs, Stephen Hawking, Albert Einstein, Francis Crick and Ronald Fisher. Knighted in 1985, Cox is a fellow of the Royal Society, an honorary fellow of the British Academy and a foreign associate of the U.S. National Academy of Sciences. He has served as president of the Bernoulli Society, Royal Statistical Society and International Statistical Institute.
Cox’s 50-year career included technical and research positions in the private and nonprofit sectors, as well as numerous academic appointments as professor or department chair at Birkbeck College, Imperial College London, Nuffield College and Oxford University. He earned his PhD from the University of Leeds in 1949, after first studying mathematics at St. John’s College. Though he retired in 1994, Cox remains active in the profession in Oxford, England.

Cox considers himself to be a scientist who happens to specialize in the use of statistics, which is defined as the science of learning from data. A foundation of scientific inquiry, statistics is a critical component in the development of public policy and has played fundamental roles in vast areas of human development and scientific exploration.

Note to Editors: Digital footage of Susan Ellenberg, chair of the International Prize in Statistics Foundation, announcing the recipient will be distributed on October 20. Ellenberg and Ron Wasserstein, director of the International Prize in Statistics Foundation and executive director of the American Statistical Association, will be available for interviews that day.

Link to article:  press-release-international-prize-winner


About the International Prize in Statistics
The International Prize in Statistics recognizes a major achievement of an individual or team in the field of statistics and promotes understanding of the growing importance and diverse ways statistics, data analysis, probability and the understanding of uncertainty advance society, science, technology and human welfare. With a monetary award of $75,000, it is given every other year by the International Prize in Statistics Foundation, which is comprised of representatives from the American Statistical Association, International Biometric Society, Institute of Mathematical Statistics, International Statistical Institute and Royal Statistical Society. Recipients are chosen from a selection committee comprised of world-renowned academicians and researchers and officially presented with the award at the World Statistics Congress.

For more information:
Jill Talley
Public Relations Manager,
American Statistical Association
(703) 684-1221, ext. 1865

Categories: Announcement | 1 Comment

Announcement: Scientific Misconduct and Scientific Expertise

Scientific Misconduct and Scientific Expertise

1st Barcelona HPS workshop

November 11, 2016

Departament de Filosofia & Centre d’Història de la Ciència (CEHIC),  Universitat Autònoma de Barcelona (UAB)

Location: CEHIC, Mòdul de Recerca C, Seminari L3-05, c/ de Can Magrans s/n, Campus de la UAB, 08193 Bellaterra (Barcelona)

Organized by Thomas Sturm & Agustí Nieto-Galan

Current science is full of uncertainties and risks that weaken the authority of experts. Moreover, sometimes scientists themselves act in ways that weaken their standing: they manipulate data, exaggerate research results, do not give credit where it is due, violate the norms for the acquisition of academic titles, or are unduly influenced by commercial and political interests. Such actions, of which there are numerous examples in past and present times, are widely conceived of as violating standards of good scientific practice. At the same time, while codes of scientific conduct have been developed in different fields, institutions, and countries, there is no universally agreed canon of them, nor is it clear that there should be one. The workshop aims to bring together historians and philosophers of science in order to discuss questions such as the following: What exactly is scientific misconduct? Under which circumstances are researchers more or less liable to misconduct? How far do cases of misconduct undermine scientific authority? How have standards or mechanisms to avoid misconduct, and to regain scientific authority, been developed? How should they be developed?

All welcome – but since space is limited, please register in advance. Write to:

09:30 Welcome (Thomas Sturm & Agustí Nieto-Galan)

Categories: Announcement, replication research | 7 Comments

Philosophy and History of Science Announcements



2016 UK-EU Foundations of Physics Conference

Start Date: 16 July 2016

Categories: Announcement | Leave a comment

“Using PhilStat to Make Progress in the Replication Crisis in Psych” at Society for PhilSci in Practice (SPSP)

I’m giving a joint presentation with Caitlin Parker[1] on Friday (June 17) at the meeting of the Society for Philosophy of Science in Practice (SPSP): “Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology” (Rowan University, Glassboro, N.J.)[2] The Society grew out of a felt need to break out of the sterile straitjacket wherein philosophy of science occurs divorced from practice. The topic of the relevance of PhilSci and PhilStat to Sci has often come up on this blog, so people might be interested in the SPSP mission statement below our abstract.

Using Philosophy of Statistics to Make Progress in the Replication Crisis in Psychology

Deborah Mayo, Virginia Tech, Department of Philosophy, United States
Caitlin Parker, Virginia Tech, Department of Philosophy, United States


Categories: Announcement, replication research, reproducibility | 8 Comments

My Popper Talk at LSE: The Statistical Replication Crisis: Paradoxes and Scapegoats

I’m giving a Popper talk at the London School of Economics next Tuesday (10 May). If you’re in the neighborhood, I hope you’ll stop by.

Popper talk May 10 location

A somewhat accurate blurb is here. I say “somewhat” because it doesn’t mention that I’ll talk a bit about the replication crisis in psychology, and the issues that crop up (or ought to) in connecting statistical results and the causal claim of interest.



Categories: Announcement | 6 Comments

Philosophy & Physical Computing Graduate Workshop at VT

A Graduate Summer Workshop at Virginia Tech (Poster)

Application deadline: May 8, 2016 


Think & Code VT

JULY 11-24, 2016 at Virginia Tech

Who should apply:

  • This workshop is open to graduate students in master’s or PhD programs in philosophy or the sciences, including computer science.

For additional information or to apply online, visit, or contact Dr. Benjamin Jantzen at

Categories: Announcement | Leave a comment

I’m speaking at Univ of Minnesota on Friday

I’ll be speaking at U of Minnesota tomorrow. I’m glad to see a group with interest in philosophical foundations of statistics as well as the foundations of experiment and measurement in psychology. I will post my slides afterwards. Come by if you’re in the neighborhood. 

University of Minnesota
“The ASA (2016) Statement on P-values and
How to Stop Refighting the Statistics Wars”

April 8, 2016 at 3:35 p.m.




Deborah G. Mayo
Department of Philosophy, Virginia Tech

The CLA Quantitative Methods
Collaboration Committee
Minnesota Center for Philosophy of Science

275 Nicholson Hall
216 Pillsbury Drive SE
University of Minnesota
Minneapolis MN


This will be a mixture of my current take on the “statistics wars” together with my reflections on the recent ASA document on P-values. I was invited over a year ago already by Niels Waller, a co-author of Paul Meehl. I’ll never forget when I was there in 1997: Paul Meehl was in the audience, waving my book in the air–EGEK (1996)–and smiling!

Categories: Announcement | 3 Comments

Winner of December Palindrome: Mike Jacovides

Mike Jacovides


Winner of the December 2015 Palindrome contest

Mike Jacovides: Associate Professor of Philosophy at Purdue University

Palindrome: Emo, notable Stacy began a memory by Rome. Manage by cats, Elba to Nome.

The requirement: A palindrome using “memory” or “memories” (and Elba, of course).

Book choice (out of 12 or more): Error and the Growth of Experimental Knowledge (D. Mayo 1996, Chicago)

Bio: Mike Jacovides is an Associate Professor of Philosophy at Purdue University. He’s just finishing a book whose title is constantly changing, but which may end up being called Locke’s Image of the World and the Scientific Revolution.

Statement: My interest in palindromes was sparked by my desire to learn more about the philosophy of statistics. The fact that you can learn about the philosophy of statistics by writing a palindrome seems like evidence that anything can cause anything, but maybe once I read the book, I’ll learn that it isn’t. I am glad that ‘emo, notable Stacy’ worked out, I have to say.

Congratulations Mike! I hope you’ll continue to pursue philosophy of statistics! We need much more of that. Good choice of book prize too. D. Mayo

Categories: Announcement, Palindrome | 1 Comment

Preregistration Challenge: My email exchange



David Mellor, from the Center for Open Science, emailed me asking if I’d announce his Preregistration Challenge on my blog, and I’m glad to do so. You win $1,000 if your properly preregistered paper is published. The recent replication effort in psychology showed, despite the common refrain – “it’s too easy to get low P-values” – that in preregistered replication attempts it’s actually very difficult to get small P-values. (I call this the “paradox of replication”[1].) Here’s our e-mail exchange from this morning:

          Dear Deborah Mayod,

I’m reaching out to individuals who I think may be interested in our recently launched competition, the Preregistration Challenge ( Based on your blogging, I thought it could be of interest to you and to your readers.

In case you are unfamiliar with it, preregistration specifies in advance the precise study protocols and analytical decisions before data collection, in order to separate the hypothesis-generating exploratory work from the hypothesis testing confirmatory work. 

Though required by law in clinical trials, it is virtually unknown within the basic sciences. We are trying to encourage this new behavior by offering 1,000 researchers $1000 prizes for publishing the results of their preregistered work. 

Please let me know if this is something you would consider blogging about or sharing in other ways. I am happy to discuss further. 


David Mellor, PhD

Project Manager, Preregistration Challenge, Center for Open Science


Deborah Mayo to David: 10:33 AM (1 hour ago)

David: Yes I’m familiar with it, and I hope that it encourages people to avoid data-dependent determinations that bias results. It shows the importance of statistical accounts that can pick up on such biasing selection effects. On the other hand, coupling prereg with some of the flexible inference accounts now in use won’t really help. Moreover, there may, in some fields, be a tendency to research a non-novel, fairly trivial result.

And if they’re going to preregister, why not go blind as well?  Will they?


Mayo

Categories: Announcement, preregistration, Statistical fraudbusting, Statistics | 11 Comments

“Frequentist Accuracy of Bayesian Estimates” (Efron Webinar announcement)


Brad Efron

The Royal Statistical Society sent me a letter announcing their latest Journal webinar next Wednesday 21 October:

…RSS Journal webinar on 21st October featuring Bradley Efron, Andrew Gelman and Peter Diggle. They will be in discussion about Bradley Efron’s recently published paper titled ‘Frequentist accuracy of Bayesian estimates’. The paper was published in June in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol 77 (3), 617-646.  It is free to access from October 7th to November 4th.

Webinar start time: 8 am in California (PDT); 11 am in New York (EDT); 4pm (UK time).

During the webinar, Bradley Efron will present his paper for about 30 minutes, followed by a Q&A session with the audience. Andrew Gelman is joining us as discussant and the event will be chaired by our President, Peter Diggle. Participation in the Q&A session by anyone who dials in is warmly welcomed and actively encouraged. Participants can ask the author a question over the phone or simply issue a message using the web-based teleconference system. Questions can be emailed in advance and further information can be requested from

More details about this journal webinar and how to join can be found in StatsLife and on the RSS website.  RSS Journal webinars are sponsored by Quintiles.

We’d be delighted if you were able to join us on the 21st and very grateful if you could let your colleagues and students know about the event.

I will definitely be tuning in!

Categories: Announcement, Statistics | 6 Comments

Workshop on Replication in the Sciences: Society for Philosophy and Psychology: (2nd part of double header)

2nd part of the double header:

Society for Philosophy and Psychology (SPP): 41st Annual meeting

SPP 2015 Program

Wednesday, June 3rd
1:30-6:30: Preconference Workshop on Replication in the Sciences, organized by Edouard Machery

1:30-2:15: Edouard Machery (Pitt)
2:15-3:15: Andrew Gelman (Columbia, Statistics, via video link)
3:15-4:15: Deborah Mayo (Virginia Tech, Philosophy)
4:15-4:30: Break
4:30-5:30: Uri Simonsohn (Penn, Psychology)
5:30-6:30: Tal Yarkoni (University of Texas, Neuroscience)

 SPP meeting: 4-6 June 2015 at Duke University in Durham, North Carolina


First part of the double header:

The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference, 2015 APS Annual Convention, Saturday, May 23, 2:00 PM-3:50 PM in Wilder (Marriott Marquis 1535 B’way)

Andrew Gelman
Stephen Senn
Deborah Mayo
Richard Morey, Session Chair & Discussant

taxi: VA-NYC-NC

 See earlier post for Frank Sinatra and more details
Categories: Announcement, reproducibility | Leave a comment

Philosophy of Statistics Comes to the Big Apple! APS 2015 Annual Convention — NYC

Start Spreading the News…..



 The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,
2015 APS Annual Convention
Saturday, May 23  
2:00 PM- 3:50 PM in Wilder

(Marriott Marquis 1535 B’way)





Andrew Gelman

Professor of Statistics & Political Science
Columbia University



Stephen Senn

Head of Competence Center
for Methodology and Statistics (CCMS)

Luxembourg Institute of Health




D.G. Mayo, Philosopher



Richard Morey, Session Chair & Discussant

Senior Lecturer
School of Psychology
Cardiff University
Categories: Announcement, Bayesian/frequentist, Statistics | 8 Comments

Announcing Kent Staley’s new book, An Introduction to the Philosophy of Science (CUP)


Kent Staley has written a clear and engaging introduction to PhilSci that manages to blend the central topics of philosophy of science with current philosophy of statistics. Quite possibly, in his 10-page section 9.4, Staley explains Error Statistics more clearly in many ways than I do. CONGRATULATIONS STALEY*

You can get this book for free by merely writing one of the simpler palindromes in the December contest.

Here’s an excerpt from that section:



9.4 Error-statistical philosophy of science and severe testing

Deborah Mayo has developed an alternative approach to the interpretation of frequentist statistical inference (Mayo 1996). But the idea at the heart of Mayo’s approach is one that can be stated without invoking probability at all. ….

Mayo takes the following “minimal scientific principle for evidence” to be uncontroversial:

Principle 3 (Minimal principle for evidence) Data x0 provide poor evidence for H if they result from a method or procedure that has little or no ability of finding flaws in H, even if H is false. (Mayo and Spanos, 2009, 3)

Categories: Announcement, Palindrome, Statistics, StatSci meets PhilSci | Tags: | 10 Comments

My Rutgers Seminar: tomorrow, December 3, on philosophy of statistics

I’ll be talking about philosophy of statistics tomorrow afternoon at Rutgers University, in the Statistics and Biostatistics Department, if you happen to be in the vicinity and are interested.


Seminar Speaker:     Professor Deborah Mayo, Virginia Tech

Title:           Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance

Time:          3:20 – 4:20 pm, Wednesday, December 3, 2014
Place:         552 Hill Center


Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance

Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.

Categories: Announcement, Statistics | 4 Comments

September 2014: Blog Contents

September 2014: Error Statistics Philosophy
Blog Table of Contents 

Compiled by Jean A. Miller

  • (9/30) Letter from George (Barnard)
  • (9/27) Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)
  • (9/23) G.A. Barnard: The Bayesian “catch-all” factor: probability vs likelihood
  • (9/21) Statistical Theater of the Absurd: “Stat on a Hot Tin Roof”
  • (9/18) Uncle Sam wants YOU to help with scientific reproducibility!
  • (9/15) A crucial missing piece in the Pistorius trial? (2): my answer (Rejected Post)
  • (9/12) “The Supernal Powers Withhold Their Hands And Let Me Alone”: C.S. Peirce
  • (9/6) Statistical Science: The Likelihood Principle issue is out…!
  • (9/4) All She Wrote (so far): Error Statistics Philosophy Contents-3 years on
  • (9/3) 3 in blog years: Sept 3 is 3rd anniversary of





Categories: Announcement, blog contents, Statistics | Leave a comment

Uncle Sam wants YOU to help with scientific reproducibility!

You still have a few days to respond to the call of your country to solve problems of scientific reproducibility!

The following passages come from Retraction Watch, with my own recommendations at the end.

“White House takes notice of reproducibility in science, and wants your opinion”

The White House’s Office of Science and Technology Policy (OSTP) is taking a look at innovation and scientific research, and issues of reproducibility have made it onto its radar.

Here’s the description of the project from the Federal Register:

The Office of Science and Technology Policy and the National Economic Council request public comments to provide input into an upcoming update of the Strategy for American Innovation, which helps to guide the Administration’s efforts to promote lasting economic growth and competitiveness through policies that support transformative American innovation in products, processes, and services and spur new fundamental discoveries that in the long run lead to growing economic prosperity and rising living standards.

I wonder what Steven Pinker would say about some of the above verbiage?

And here’s what’s catching the eye of people interested in scientific reproducibility:

(11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?

The OSTP is the same office that, in 2013, took what Nature called “a long-awaited leap forward for open access” when it said “that publications from taxpayer-funded research should be made free to read after a year’s delay.” That OSTP memo came after more than 65,000 people “signed a We the People petition asking for expanded public access to the results of taxpayer-funded research.”

Have ideas on improving reproducibility? Emails to are preferred, according to the notice, which also explains how to fax or mail comments. The deadline is September 23.

Off the top of my head, how about:

Promote the use of methodologies that:

  • control and assess the capabilities of methods to avoid mistaken inferences from data;
  • require demonstrated self-criticism all the way from data collection through modelling and interpretation (statistical and substantive);
  • describe what is especially shaky or poorly probed thus far (and spell out how subsequent studies are most likely to locate those flaws)[i]

Institute penalties for QRPs and fraud?

Please offer your suggestions in the comments, or directly to Uncle Sam.

 [i]It may require a certain courage on the part of researchers, journalists, referees.

Categories: Announcement, reproducibility | 18 Comments

3 in blog years: Sept 3 is 3rd anniversary of

Where did you hear this?  “Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.” Remember this and this? And this philosophical treatise on “moving blog day”? Oy, did I really write all this stuff?

cake baked by blog staff for 3 year anniversary of

I still see this as my rag-tag amateur blog. I never learned html and don’t have time to now. But the blog enterprise was more jocund and easy-going then–just an experiment, really, and a place to discuss our RMM papers. (And, of course, a home for error statistical philosophers-in-exile).

A blog table of contents for all three years will appear tomorrow.

Anyway, 2 representatives from Elba flew into NYC and  baked this cake in my never-used Chef’s oven (based on the cover/table of contents of EGEK 1996). We’ll be celebrating at A Different Place tonight[i]–so if you’re in the neighborhood, stop by after 8pm for an Elba Grease (on me).

Do you want a free signed copy of EGEK? Say why in 25 words or less (to, and the Fund for E.R.R.O.R.* will send them to the top 3 submissions (by 9/10/14).**

Acknowledgments: I want to thank the many commentators for their frequent insights and for keeping things interesting and lively. Among the regulars, and semi-regulars (but with impact) off the top of my head, and in no order: Senn, Yanofsky, Byrd, Gelman, Schachtman, Kepler, McKinney, S. Young, Matloff, O’Rourke, Gandenberger, Wasserman, E. Berk, Spanos, Glymour, Rohde, Greenland, Omaclaren, someone named Mark, assorted guests, original guests, and anons, and mysterious visitors, related twitterers (who would rather tweet from afar). I’m sure I’ve left some people out. Thanks to students and participants in the spring 2014 seminar with Aris Spanos (slides and lecture notes are still up).

I’m especially grateful to my regular guest bloggers: Stephen Senn and Aris Spanos, and to those who were subjected to deconstructions and to U-Phils in years past. (I may return to that some time.) Other guest posters for 2014 will be acknowledged in the year round up.

I thank blog compilers, Jean Miller and Nicole Jinn, and give special thanks for the tireless efforts of Jean Miller who has slogged through html, or whatever it is, when necessary, has scanned and put up dozens of articles to make them easy for readers to access, taken slow ferries back and forth to the island of Elba, and fixed gazillions of glitches on a daily basis. Last, but not least, to the palindromists who have been winning lots of books recently (1 day left for August submissions).

*Experimental Reasoning, Reliability, Objectivity and Rationality.

** Accompany submissions with an e-mail address and regular address. All submissions remain private. Elba judges decisions are final. Void in any places where prohibited by laws, be they laws of likelihood or Napoleonic laws-in-exile. But seriously, we’re giving away 3 books.

[i]email for directions.

Categories: Announcement, Statistics | 12 Comments

Blogging Boston JSM2014?



I’m not there. (Several people have asked, I guess because I blogged JSM13.) If you hear of talks (or anecdotes) of interest to error, please comment here (or twitter: @learnfromerror)

Categories: Announcement | 7 Comments
