Science isn’t about predicting one-off events like election results, but that doesn’t mean the way to make election forecasts scientific (which they should be) is to build “theories of voting.” A number of people have sent me articles on statistical aspects of the recent U.S. election, but I don’t have much to say and I like to keep my blog non-political. I won’t violate this rule in making a couple of comments on Faye Flam’s Nov. 11 article: “Why Science Couldn’t Predict a Trump Presidency”[i].
For many people, Donald Trump’s surprise election victory was a jolt to the very idea that humans are rational creatures. It tore away the comfort of believing that science has rendered our world predictable. The upset led two New York Times reporters to question whether data science could be trusted in medicine and business. A Guardian columnist declared that big data works for physics but breaks down in the realm of human behavior.
Gerd Gigerenzer, Andrew Gelman, Clark Glymour and I took part in a very interesting symposium on Philosophy of Statistics at the Philosophy of Science Association last Friday. I jotted down lots of notes, but I’ll limit myself to brief reflections and queries on a small portion of each presentation in turn, starting with Gigerenzer’s “Surrogate Science: How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual.” His complete slides are below my comments. I may write this in stages, this being (i).
- Good scientific practice–bold theories, double-blind experiments, minimizing measurement error, replication, etc.–became reduced in the social sciences to a surrogate: statistical significance.
I agree that “good scientific practice” isn’t some great big mystery, and that “bold theories, double-blind experiments, minimizing measurement error, replication, etc.” are central and interconnected keys to finding things out in error-prone inquiry. Do the social sciences really teach that inquiry can be reduced to cookbook statistics? Or is it simply that, in some fields, carrying out surrogate science suffices to be a “success”?
PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Friday November 4th 9-11:45 am (includes coffee break 10-10:15)
Location: Piedmont 4 (12th Floor) Westin Peachtree Plaza
- Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) “Controversy Over the Significance Test Controversy” (Abstract)
- Gerd Gigerenzer (Director of Max Planck Institute for Human Development, Berlin, Germany) “Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual” (Abstract)
- Andrew Gelman (Professor of Statistics & Political Science, Columbia University, New York) “Confirmationist and Falsificationist Paradigms in Statistical Practice” (Abstract)
- Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania) “Exploratory Research is More Reliable Than Confirmatory Research” (Abstract)
Key Words: big data, frequentist and Bayesian philosophies, history and philosophy of statistics, meta-research, p-values, replication, significance tests.
Science is undergoing a crisis over reliability and reproducibility. High-powered methods are prone to cherry-picking correlations, significance-seeking, and assorted modes of extraordinary rendition of data. The Big Data revolution may encourage a reliance on statistical methods without sufficient scrutiny of whether they are teaching us about causal processes of interest. Mounting failures of replication in the social and biological sciences have resulted in new institutes for meta-research, replication research, and widespread efforts to restore scientific integrity and transparency. Statistical significance test controversies, long raging in the social sciences, have spread to all fields using statistics. At the same time, foundational debates over frequentist and Bayesian methods have shifted in important ways that are often overlooked in the debates. The problems introduce philosophical and methodological questions about probabilistic tools, and science and pseudoscience, intertwined with technical statistics and the philosophy and history of statistics. Our symposium goal is to address foundational issues around which the current crisis in science revolves. We combine the insights of philosophers, psychologists, and statisticians whose work interrelates philosophy and history of statistics, data analysis and modeling.
Formal Epistemology Workshop (FEW) 2017
Call for papers
Submission Deadline: December 1st, 2016
Authors Notified: February 8th, 2017
We invite papers in formal epistemology, broadly construed. FEW is an interdisciplinary conference, and so we welcome submissions from researchers in philosophy, statistics, economics, computer science, psychology, and mathematics.
Submissions should be prepared for blind review. Contributors ought to upload a full paper of no more than 6000 words and an abstract of up to 300 words to the Easychair website. Please submit your full paper in .pdf format. The deadline for submissions is December 1st, 2016. Authors will be notified on February 1st, 2017.
The final selection of the program will be made with an eye towards diversity. We especially encourage submissions from PhD candidates, early career researchers and members of groups that are underrepresented in philosophy.
International Prize in Statistics Awarded to Sir David Cox for
Survival Analysis Model Applied in Medicine, Science, and Engineering
EMBARGOED until October 19, 2016, at 9 p.m. ET
ALEXANDRIA, VA (October 18, 2016) – Prominent British statistician Sir David Cox has been named the inaugural recipient of the International Prize in Statistics. Like the acclaimed Fields Medal, Abel Prize, Turing Award and Nobel Prize, the International Prize in Statistics is considered the highest honor in its field. It will be bestowed every other year to an individual or team for major achievements using statistics to advance science, technology and human welfare.
Cox is a giant in the field of statistics, but the International Prize in Statistics Foundation is recognizing him specifically for his 1972 paper in which he developed the proportional hazards model that today bears his name. The Cox Model is widely used in the analysis of survival data and enables researchers to more easily identify the risks of specific factors for mortality or other survival outcomes among groups of patients with disparate characteristics. From disease risk assessment and treatment evaluation to product liability, school dropout, reincarceration and AIDS surveillance systems, the Cox Model has been applied in essentially all fields of science, as well as in engineering.
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: October 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently, and in green up to 3 others I’d recommend. Posts that are part of a “unit” or a pair count as one.
- (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
- (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
- (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
- (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?” (10/5 and 10/12 are a pair)
- (10/19) Blog Contents: September 2013
- (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
- (10/25) Bayesian confirmation theory: example from last post… (10/19 and 10/25 are a pair)
- (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
- (10/31) WHIPPING BOYS AND WITCH HUNTERS (interesting to see how things have changed and stayed the same over the past few years, share comments)
 Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.
 New Rule, July 30, 2016-very convenient.
Gelman and Loken (2014) recognize that even without explicit cherry picking there is often enough leeway in the “forking paths” between data and inference so that by artful choices you may be led to one inference, even though it also could have gone another way. In good sciences, measurement procedures should interlink with well-corroborated theories and offer a triangulation of checks–often missing in the types of experiments Gelman and Loken are on about. Stating a hypothesis in advance, far from protecting from the verification biases, can be the engine that enables data to be “constructed” to reach the desired end.
[E]ven in settings where a single analysis has been carried out on the given data, the issue of multiple comparisons emerges because different choices about combining variables, inclusion and exclusion of cases… and many other steps in the analysis could well have occurred with different data (Gelman and Loken 2014, p. 464).
An idea growing out of this recognition is to imagine the results of applying the same statistical procedure, but with different choices at key discretionary junctures–giving rise to a multiverse analysis, rather than a single data set (Steegen, Tuerlinckx, Gelman, and Vanpaemel 2016). One lists the different choices thought to be plausible at each stage of data processing. The multiverse displays “which constellation of choices corresponds to which statistical results” (p. 797). The result of this exercise can, at times, mimic the delineation of possibilities in multiple testing and multiple modeling strategies.
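To make the listing of choices concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the simulated scores, the two processing stages (an outlier-exclusion rule and an outcome transformation), and the use of Welch's t statistic as the "statistical result." None of it is taken from Steegen et al.; it only shows the mechanics of crossing every plausible choice with every other and recording the result for each constellation.

```python
import itertools
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def make_data(n=40, seed=1):
    """Hypothetical raw data: two groups of positive scores (illustration only)."""
    rng = random.Random(seed)
    treated = [rng.lognormvariate(0.2, 0.6) for _ in range(n)]
    control = [rng.lognormvariate(0.0, 0.6) for _ in range(n)]
    return treated, control


def drop_beyond(k):
    """Build a rule that drops points more than k sample SDs from the sample mean."""
    def rule(xs):
        m, s = statistics.mean(xs), statistics.stdev(xs)
        return [x for x in xs if abs(x - m) <= k * s]
    return rule


# Hypothetical discretionary choices at two stages of data processing.
EXCLUSION_RULES = {
    "keep all": lambda xs: list(xs),
    "drop > 2.5 SD": drop_beyond(2.5),
    "drop > 2 SD": drop_beyond(2.0),
}
TRANSFORMS = {
    "raw": lambda x: x,
    "log": math.log,
}


def multiverse(treated, control):
    """Return the test statistic for every constellation of choices."""
    results = {}
    for (ex_name, exclude), (tr_name, transform) in itertools.product(
        EXCLUSION_RULES.items(), TRANSFORMS.items()
    ):
        a = [transform(x) for x in exclude(treated)]
        b = [transform(x) for x in exclude(control)]
        results[(ex_name, tr_name)] = welch_t(a, b)
    return results


if __name__ == "__main__":
    for choices, t in multiverse(*make_data()).items():
        print(choices, round(t, 2))
```

With three exclusion rules and two transformations the multiverse has six cells. How much the statistic moves across those cells is the point of the display; a real analysis would list far more junctures.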
I haven’t been blogging that much lately, as I’m tethered to the task of finishing revisions on a book (on the philosophy of statistical inference!) But I noticed two interesting blogposts, one by Jeff Leek, another by Andrew Gelman, and even a related petition on Twitter, reflecting a newish front in the statistics wars: When it comes to improving scientific integrity, do we need more carrots or more sticks?
Leek’s post from yesterday, “Statistical Vitriol” (29 Sep 2016), calls for de-escalation of the consequences of statistical mistakes:
Over the last few months there has been a lot of vitriol around statistical ideas. First there were data parasites and then there were methodological terrorists. These epithets came from established scientists who have relatively little statistical training. There was the predictable backlash to these folks from their counterparties, typically statisticians or statistically trained folks who care about open source.
Scientific Misconduct and Scientific Expertise
1st Barcelona HPS workshop
November 11, 2016
Departament de Filosofia & Centre d’Història de la Ciència (CEHIC), Universitat Autònoma de Barcelona (UAB)
Location: CEHIC, Mòdul de Recerca C, Seminari L3-05, c/ de Can Magrans s/n, Campus de la UAB, 08193 Bellaterra (Barcelona)
Organized by Thomas Sturm & Agustí Nieto-Galan
Current science is full of uncertainties and risks that weaken the authority of experts. Moreover, sometimes scientists themselves act in ways that weaken their standing: they manipulate data, exaggerate research results, do not give credit where it is due, violate the norms for the acquisition of academic titles, or are unduly influenced by commercial and political interests. Such actions, of which there are numerous examples in past and present times, are widely conceived of as violating standards of good scientific practice. At the same time, while codes of scientific conduct have been developed in different fields, institutions, and countries, there is no universally agreed canon of them, nor is it clear that there should be one. The workshop aims to bring together historians and philosophers of science in order to discuss questions such as the following: What exactly is scientific misconduct? Under which circumstances are researchers more or less liable to misconduct? How far do cases of misconduct undermine scientific authority? How have standards or mechanisms to avoid misconduct, and to regain scientific authority, been developed? How should they be developed?
All welcome – but since space is limited, please register in advance. Write to: Thomas.Sturm@uab.cat
09:30 Welcome (Thomas Sturm & Agustí Nieto-Galan)
G. A. Barnard: 23 Sept 1915-30 July, 2002
Today is George Barnard’s 101st birthday. In honor of this, I reblog an exchange between Barnard, Savage (and others) on likelihood vs probability. The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962).[i] Six other posts on Barnard are linked below: 2 are guest posts (Senn, Spanos); the other 4 include a play (pertaining to our first meeting), and a letter he wrote to me.
BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.
Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inference and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)
I. The myth of objectivity. Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective,” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity-subjectivity distinction really toothless as many will have you believe? I say no.
Cavalier attitudes toward objectivity are in tension with widely endorsed movements to promote replication and reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science–if they are not mere lip-service–are rooted in the supposition that we can more objectively scrutinize results, even if it’s only to point out those that are poorly tested. The fact that the term “objectivity” is used equivocally should not be taken as grounds to oust it, but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t.
C. S. Peirce: 10 Sept, 1839-19 April, 1914
Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.
Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319
Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:
Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)
The consequences of recent criticisms of statistical tests have breathed brand new life into some very old howlers, many of which have been discussed on this blog. What is not funny, though, is how standard notions such as frequentist error probabilities are being redefined in the process, and how we now have arguments built on equivocations. In fact, there are official guidebooks for the statistically perplexed giving inconsistent definitions to the same term (for just one of many examples, see this post). How much more perplexed will that leave us! Since it’s near the 5-year anniversary of this blog, let’s listen in to a new comedy hour mixing one from 3 years ago with some add-ons*.
Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?
Critic: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!
Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!
Raucous laughter ensues!
(Hah, hah… “So funny, I forgot to laugh!” Or, I’m crying and laughing at the same time!)
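The exchange runs together two different quantities: how often a true null is rejected, and what share of rejected nulls are true. A minimal simulation sketch shows that they need not agree. The setup is assumed for illustration and is not the Critic's: a one-sided z-test, a hypothetical pool in which a fraction `prop_true_nulls` of nulls are true, a single fixed effect size under the alternatives, and rejection whenever p ≤ .05 (the Critic speaks of p-values of .05, so this is a simplification).

```python
import random
from statistics import NormalDist


def simulate(n_tests=100_000, prop_true_nulls=0.5, effect=1.0, alpha=0.05, seed=0):
    """Run many one-sided z-tests on a pool of hypotheses.

    prop_true_nulls and effect are hypothetical inputs describing the pool;
    they are not error probabilities of the test itself.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # reject when z >= cutoff
    true_nulls = rejections = rejected_true_nulls = 0
    for _ in range(n_tests):
        null_is_true = rng.random() < prop_true_nulls
        z = rng.gauss(0.0 if null_is_true else effect, 1.0)
        reject = z >= cutoff
        true_nulls += null_is_true
        rejections += reject
        rejected_true_nulls += null_is_true and reject
    return {
        # What the significance tester controls: P(reject | null true).
        "type_I_error_rate": rejected_true_nulls / true_nulls,
        # What the Critic reports: the share of true nulls among rejections.
        "share_true_nulls_among_rejections": rejected_true_nulls / rejections,
    }


if __name__ == "__main__":
    for prop in (0.5, 0.9):
        print(prop, simulate(prop_true_nulls=prop))
```

The first number stays near alpha whatever pool is assumed. The second moves with the assumed proportion of true nulls and with the power of the tests, which is why it cannot stand in for the error probability the tester is talking about.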
Prof. Larry Laudan
Lecturer in Law and Philosophy
University of Texas at Austin
“‘Not Guilty’: The Misleading Verdict and How It Fails to Serve either Society or the Innocent Defendant”
Most legal systems in the developed world share in common a two-tier verdict system: ‘guilty’ and ‘not guilty’. Typically, the standard for a judgment of guilty is set very high while the standard for a not-guilty verdict (if we can call it that) is quite low. That means any level of apparent guilt less than about 90% confidence that the defendant committed the crime leads to an acquittal (90% being the usual gloss on proof beyond a reasonable doubt, although few legal systems venture a definition of BARD that precise). According to conventional wisdom, the major reason for setting the standard as high as we do is the desire, even the moral necessity, to shield the innocent from false conviction.
E.S.Pearson on a Gate, Mayo sketch
Here you see my scruffy sketch of Egon drawn 20 years ago for the frontispiece of my book, “Error and the Growth of Experimental Knowledge” (EGEK 1996). The caption is
“I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot…” –E.S. Pearson, “Statistical Concepts in Their Relation to Reality”.
He is responding to Fisher to “dispel the picture of the Russian technological bogey”. [i]
So, as I said in my last post, just to make a short story long, I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed, and I discovered a funny little error about this quote. Only maybe 3 or 4 people alive would care, but maybe someone out there knows the real truth.
OK, so I’d been rereading Constance Reid’s great biography of Neyman, and in one place she interviews Egon about the sources of inspiration for their work. Here’s what Egon tells her:
E.S. Pearson (11 Aug, 1895-12 June, 1980)
This is a belated birthday post for E.S. Pearson (11 August 1895-12 June, 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs probativeness) that’s badly confused these days. I’ve recently been scouring around the history and statistical philosophies of Neyman, Pearson and Fisher for purposes of a book soon to be completed. I recently discovered a little anecdote that calls for a correction in something I’ve been saying for years. While it’s little more than a point of trivia, it’s in relation to Pearson’s (1955) response to Fisher (1955)–the last entry in this post. I’ll wait until tomorrow or the next day to share it, to give you a chance to read the background.
Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (performance). Or is it the other way round: that the control of long run error properties is of crucial importance for probing the causes of the data at hand? (probativeness). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.
Cases of Type A and Type B
“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)
1. PhilSci and StatSci. I’m always glad to come across statistical practitioners who wax philosophical, particularly when Karl Popper is cited. Best of all is when they get the philosophy somewhere close to correct. So, I came across an article by Burnham and Anderson (2014) in Ecology:
“While the exact definition of the so-called ‘scientific method’ might be controversial, nearly everyone agrees that the concept of ‘falsifiability’ is a central tenant [sic] of empirical science (Popper 1959). It is critical to understand that historical statistical approaches (i.e., P values) leave no way to ‘test’ the alternative hypothesis. The alternative hypothesis is never tested, hence cannot be rejected or falsified!… Surely this fact alone makes the use of significance tests and P values bogus. Lacking a valid methodology to reject/falsify the alternative science hypotheses seems almost a scandal.” (Burnham and Anderson p. 629)
Well I am (almost) scandalized by this easily falsifiable allegation! I can’t think of a single “alternative”, whether in a “pure” Fisherian or a Neyman-Pearson hypothesis test (whether explicit or implicit) that’s not falsifiable; nor do the authors provide any. I grant that understanding testability and falsifiability is far more complex than the kind of popularized accounts we hear about; granted as well, theirs is just a short paper. But then why make bold declarations on the topic of the “scientific method and statistical science,” on falsifiability and testability?
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)
The bare facts they established are the following:
The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.
and the false conclusion they immediately asserted is the following:
This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.
To understand the fallacy, look at the accompanying graph.
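The gap between the bare facts and the conclusion can also be seen in a small simulation. What follows is a sketch under assumptions of my own choosing, not a reconstruction of the graph or of the Cochrane data: two hypothetical patient populations, each patient described by a made-up pair of chances of being pain free at two hours (on placebo, on paracetamol), and a simple parallel-group trial run on each.

```python
import random


def trial_rates(patients, rng):
    """Randomise each patient to one arm and return observed pain-free rates.

    Each patient is a pair (p_placebo, p_drug): that individual's chance of
    being pain free at two hours under each treatment (hypothetical values).
    """
    counts = {"placebo": [0, 0], "drug": [0, 0]}  # [pain free, total]
    for p_placebo, p_drug in patients:
        arm = rng.choice(["placebo", "drug"])
        p = p_drug if arm == "drug" else p_placebo
        counts[arm][0] += rng.random() < p
        counts[arm][1] += 1
    return {arm: free / total for arm, (free, total) in counts.items()}


def fixed_responders(n):
    """10% helped with certainty, 49% pain free regardless, 41% never pain free."""
    return (
        [(0.0, 1.0)] * (10 * n // 100)
        + [(1.0, 1.0)] * (49 * n // 100)
        + [(0.0, 0.0)] * (41 * n // 100)
    )


def everyone_helped(n):
    """Every patient's chance of being pain free rises from 0.49 to 0.59."""
    return [(0.49, 0.59)] * n


def share_who_benefit(patients):
    """Fraction of patients whose chance of being pain free is higher on the drug."""
    return sum(p_drug > p_placebo for p_placebo, p_drug in patients) / len(patients)


if __name__ == "__main__":
    rng = random.Random(0)
    for build in (fixed_responders, everyone_helped):
        patients = build(100_000)
        print(build.__name__, trial_rates(patients, rng), share_who_benefit(patients))
```

Both populations yield trial results close to 49 in 100 on placebo and 59 in 100 on paracetamol, yet the share of patients who benefit is 10% in one and 100% in the other. The observed rates alone do not distinguish them.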