American Phil Assoc Blog: The Stat Crisis of Science: Where are the Philosophers?

[Image: “statistical cruise ship” artwork for Statistical Inference as Severe Testing]

The Statistical Crisis of Science: Where are the Philosophers?

This was published today on the American Philosophical Association blog.

“[C]onfusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth.” (George Barnard 1985, p. 2)

“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists…” (Allan Birnbaum 1972, p. 861).

“In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered.” (Levelt Committee et al. 2012, p. 57, report on the fraudulent research practices of social psychologist Diederik Stapel)

I was the lone philosophical observer at a special meeting convened by the American Statistical Association (ASA) in 2015 to construct a non-technical document to guide users of statistical significance tests–one of the most common methods used to distinguish genuine effects from chance variability across a landscape of social, physical and biological sciences.

It was, by the ASA Director’s own description, “historical”, but it was also highly philosophical, and its ramifications are only now being discussed and debated. Today, introspection on statistical methods is rather common due to the “statistical crisis in science”. What is it? In a nutshell: high-powered computer methods make it easy to arrive at impressive-looking ‘findings’ that too often disappear when others try to replicate them under protocols that require hypotheses and data-analysis plans to be fixed in advance.
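To make the mechanism vivid, here is a minimal simulation sketch (mine, in Python; not from the ASA meeting or the original discussion): an analyst searches 100 noise-only variables for an effect, reports the one with the smallest P-value, and then a replication with that single hypothesis fixed in advance finds nothing.

```python
# A sketch (not from the post): pure noise masquerading as a 'finding'.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 30, 100  # 30 subjects, 100 candidate outcome variables, all pure noise

# "Exploratory" study: test every variable, report the best-looking one.
explore = rng.normal(size=(n, k))
p = np.array([stats.ttest_1samp(explore[:, j], 0).pvalue for j in range(k)])
best = p.argmin()
print(f"best of {k} noise variables: p = {p[best]:.4f}")  # usually < 0.05

# Preregistered replication: variable `best` is now fixed in advance.
replication = rng.normal(size=n)
print(f"replication: p = {stats.ttest_1samp(replication, 0).pvalue:.4f}")

# With k independent tests of true nulls, the chance that the smallest
# P-value falls below 0.05 is 1 - 0.95**k (about 0.994 for k = 100);
# the preregistered replication rejects only about 5% of the time.
```

Nothing here is exotic: the ‘finding’ is manufactured by the search, and fixing the hypothesis in advance removes the manufacturing step.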

How should scientific integrity be restored? Experts do not agree, and the disagreement is intertwined with fundamental disagreements regarding the nature, interpretation, and justification of methods and models used to learn from incomplete and uncertain data. Today’s reformers, fraudbusters, and replication researchers increasingly call for more self-critical scrutiny of philosophical foundations. Philosophers should take this seriously. While philosophers of science are interested in helping to clarify, if not also to resolve, matters of evidence and inference, they are rarely consulted in practice for this end. The assumptions behind today’s competing evidence reforms–issues of what I will call evidence-policy–are largely hidden from those outside the loop of the philosophical foundations of statistics and data analysis, or PhilStat. This is a crucial obstacle to scrutinizing the consequences for science policy, clinical trials, personalized medicine, and across a wide landscape of Big Data modeling.

Statistics has a fascinating and colorful history of philosophical debate, marked by unusual heights of passion, personality, and controversy for at least a century. Wars between frequentists and Bayesians have been so contentious that everyone wants to believe we are long past them: we now have unifications and reconciliations, and practitioners only care about what works. The truth is that both brand-new and long-standing battles simmer below the surface in questions about scientific trustworthiness. They show up unannounced in the current problems of scientific integrity, questionable research practices, and in the swirl of methodological reforms and guidelines that spin their way down from journals and reports, the ASA Statement being just one. There isn’t even agreement on what it means to say a method “works”. These are key themes in my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP).

Many of the key problems in today’s evidence-policy disputes inherit the conceptual confusions of the underlying methods for evidence and inference. They are intertwined with philosophical terms that often remain vague, such as inference, reliability, testing, rationality, explanation, induction, confirmation, and falsification. This hampers communication among various stakeholders, making it difficult even to recognize and articulate where they agree. The philosopher’s penchant for laying bare the presuppositions of claims and arguments would let us cut through the lack of clarity that blocked the experts at the ASA meeting from pinpointing where and why they agree or disagree. (As a mere “observer”, I rarely intervened.) We should put philosophy to work on the popular memes: “All models are false”, “Everything is equally subjective and objective”, “P-values exaggerate evidence”, and “Most published research findings are false”.

So am I calling on my fellow philosophers (at least some of them) to learn formal statistics? That would be both too much and too little. Too much because it would be impractical; too little because, despite technical sophistication, basic concepts of statistical testing and inference are more unsettled than ever. P-values–whether to redefine them, lower their threshold, or ban them altogether–are the subject of heated discussion and journalistic debate. Megateams of seventy or more authors array themselves on either side (e.g., Benjamin et al. 2017, Lakens et al. 2018), including some philosophers (I was a co-author in Lakens et al., arguing that redefining significance would not help with the problem of replication). The deepest problems underlying the replication crisis go beyond formal statistics–into measurement, experimental design, and the communication of uncertainty. Yet these rarely occupy center stage in all the brouhaha. By focusing just on the formal statistical issues, the debates give short shrift to the need to tie formal methods to substantive inferences, to a general account of collecting and learning from data, and to entirely non-statistical types of inference. The goal becomes: who can claim to offer the highest proportion of “true” effects among those output by a formal method?
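To illustrate one strand of that argument with a sketch of my own (it is not the analysis given in Lakens et al. 2018): when a reported result is selected from a data-dependent search, pure noise routinely crosses even the stricter 0.005 threshold proposed in Benjamin et al. (2017), so lowering the cutoff attacks the symptom rather than the selection effect.

```python
# A sketch: a stricter cutoff alone does not repair a selection effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k, reps = 30, 100, 2000  # sample size, hypotheses searched, repetitions

hits_05, hits_005 = 0, 0
for _ in range(reps):
    data = rng.normal(size=(n, k))  # every null hypothesis is true
    pmin = min(stats.ttest_1samp(data[:, j], 0).pvalue for j in range(k))
    hits_05 += pmin < 0.05
    hits_005 += pmin < 0.005  # the proposed stricter threshold

print(f"P(min p < 0.05  | {k}-way search) = {hits_05 / reps:.2f}")   # ~0.99
print(f"P(min p < 0.005 | {k}-way search) = {hits_005 / reps:.2f}")  # ~0.39
```

Unless the search itself is reported and accounted for, no fixed threshold restores the advertised error control–one way of putting why I argued that redefining significance would not help with replication.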

You might say my project is only relevant for philosophers of science, logic, formal epistemology and the like. While they are the obvious suspects, it goes further. Despite the burgeoning of discussions of ethics in research and in data science, the work is generally done by practitioners apart from philosophy, or by philosophers apart from the nitty-gritty details of the data sciences themselves. Without grasping the basic statistics, informed by an understanding of contrasting views of the nature and goals of using probability in learning, it’s impossible to see where the formal issues leave off and informal, value-laden issues arise or intersect. Philosophers in research ethics can wind up building arguments that forfeit the stronger stance that a critical assessment of the methods would afford (e.g., arguing for a precautionary stance when there is evidence of genuine risk increase in the data, despite non-significant results). Experimental philosophy is another area whose growth underscores the importance of critically assessing the statistical methods on which it is based. Formal methods, logic, and probability are staples of philosophy; why not methods of inference based on probability? That’s what statistics is.

Not only is PhilStat relevant to addressing long-standing philosophical problems of evidence, inference, and knowledge, it offers a superb avenue for philosophers to genuinely impact scientific practice and policy. Even a sufficient understanding of the inference methods, together with a platform for raising questions about fallacies and pitfalls, could be extremely effective. What is at stake is a critical standpoint that we may be in danger of losing. Without it, we forfeit the ability to communicate with, and hold accountable, the “experts,” the agencies, the quants, and all those data handlers increasingly exerting power over our lives. It goes beyond philosophical outreach–as important as that is–to becoming citizen scholars and citizen scientists.

I have been pondering how to overcome these obstacles, and am keen to engage fellow philosophers in the project. I am going to take one step toward exploring and meeting this goal, together with a colleague in economics, Aris Spanos. We are running a two-week immersive seminar on PhilStat for philosophy faculty and post-docs who wish to acquire or strengthen their background in PhilStat as it relates to philosophical problems of evidence and inference, to today’s statistical crisis of replication, and to associated evidence-policy debates. The logistics are modeled on the NEH Summer Seminar for college faculty that I directed in 1999 (on Philosophy of Experiment: Induction, Reliability, and Error). The content reflects Mayo (2018), which is written as a series of Excursions and Tours in a “Philosophical Voyage” to illuminate statistical inference. Consider joining me. In the meantime, I would like to hear from philosophers interested or already involved in this arena. Do you have references to existing efforts in this direction? Please share them.


Barnard, G. (1985). A Coherent View of Statistical Inference. Statistics Technical Report Series, Department of Statistics & Actuarial Science, University of Waterloo, Canada.

Benjamin, D. et al. (2017). “Redefine Statistical Significance”, Nature Human Behaviour 2, 6–10.

Birnbaum, A. (1972). “More on Concepts of Statistical Evidence”, Journal of the American Statistical Association 67, 858–861.

Lakens, D. et al. (2018). “Justify Your Alpha”, Nature Human Behaviour 2, 168–171.

Levelt Committee, Noort Committee, Drenth Committee (2012). Flawed Science: The Fraudulent Research Practices of Social Psychologist Diederik Stapel (www.commissielevelt.nl/).

Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP). (The first chapter [Excursion 1 Tour I] is here.)

Wasserstein, R. & Lazar, N. (2016). “The ASA’s Statement on P-values: Context, Process and Purpose” (and supplemental materials), The American Statistician 70(2), 129–133.


Credit for the ‘statistical cruise ship’ artwork goes to Mickey Mayo of Mayo Studios, Inc.


Deborah Mayo is Professor Emerita in the Department of Philosophy at Virginia Tech. She’s the author of Error and the Growth of Experimental Knowledge (1996, Chicago), which won the 1998 Lakatos Prize, awarded to the most outstanding contribution to the philosophy of science during the previous six years. She co-edited, with Aris Spanos, Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (2010, CUP), and co-edited, with Rachelle Hollander, Acceptable Evidence: Science and Values in Risk Management (1991, Oxford). Other publications are available here.

Many thanks to Nathan Oseroff for inviting me to submit this blog post to the APA blog.
Categories: Error Statistics, Philosophy of Statistics, Summer Seminar in PhilStat


2 thoughts on “American Phil Assoc Blog: The Stat Crisis of Science: Where are the Philosophers?”

  1. Christian Hennig

    Something to applaud!

  2. Thomas R. Dyckman

    I am an 87-year-old emeritus professor who has spent his career in a graduate school of business. I have followed my colleagues’ efforts to enlighten their field through, in part, research of value to those practicing in what we call “the real world.” I have had some exposure to statistical approaches used by my colleagues and, more recently, by following your blog. Even as a neophyte in the discipline, I see many limitations in how my colleagues pursue the statistical dragon to its questionable conclusion.
    I found your blog quite by accident but have profited from the exposure and effort to become at least a voice encouraging my colleagues to be more careful of what they assert to have verified. Thank you for that.
    In writing recently about some of the difficulties of venturing into the swamp, I was besieged by one reviewer who denied that anything related to statistical analysis could be learned from philosophy, or even from the practice of medicine, among other fields. While I am far from mastering even the basics, I am now aware of the overconfidence so easily expressed in my colleagues’ applications of statistical analysis. No doubt the situation exists elsewhere as well. As you observe, even those most knowledgeable are still debating fundamental issues.
    On a different note, my father-in-law, Dan Pletta, taught in the engineering school at VPI for forty years before he died in 1997. I got to know Blacksburg quite well. It has certainly changed since I knew it in the 1950s. Best wishes and keep up the good work.
    No need to post this comment
