Placebos: it’s not only the patients that are fooled
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
In my opinion a great deal of ink is wasted to little purpose in discussing placebos in clinical trials. Many commentators simply do not understand the nature and purpose of placebos. To start with the latter, their only purpose is to permit blinding of treatments and, to continue to the former, this implies that their nature is that they are specific to the treatment studied.
Consider an example. Suppose that Pannostrum Pharmaceuticals wishes to prove that its new treatment for migraine, Paineaze® (which is in the form of a small red circular pill) is superior to the market-leader offered by Allexir Laboratories, Kalmer® (which is a large purple lozenge). Pannostrum decides to do a head-to-head comparison and, of course, will therefore require placebos. Every patient will have to take a red pill and a purple lozenge. In the Paineaze arm what is red will be Paineaze and what is purple ‘placebo to Kalmer’. In the Kalmer arm what is red will be ‘placebo to Paineaze’ and what is purple will be Kalmer.
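For readers who like to see the bookkeeping spelled out, here is a minimal sketch in Python of the scheme just described (commonly called a double-dummy design). The names Dose, ARMS, appearance and active_ingredients are invented for illustration and do not come from any trial software; the only facts used are the ones in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dose:
    """What each of the two items a patient takes actually contains."""
    red_pill: str        # contents of the small red circular pill
    purple_lozenge: str  # contents of the large purple lozenge


# Hypothetical encoding of the two arms of the head-to-head trial.
ARMS = {
    "Paineaze": Dose(red_pill="Paineaze", purple_lozenge="placebo to Kalmer"),
    "Kalmer": Dose(red_pill="placebo to Paineaze", purple_lozenge="Kalmer"),
}


def appearance(arm: str) -> tuple:
    """What patient and physician see: the same two items in every arm."""
    if arm not in ARMS:
        raise KeyError(arm)
    return ("small red circular pill", "large purple lozenge")


def active_ingredients(arm: str) -> list:
    """The contents that are not placebo in the given arm."""
    dose = ARMS[arm]
    contents = (dose.red_pill, dose.purple_lozenge)
    return [c for c in contents if not c.startswith("placebo")]


if __name__ == "__main__":
    for arm in ARMS:
        print(arm, appearance(arm), active_ingredients(arm))
```

The point of the sketch is that appearance is the same function of the arm in both cases, while each placebo is defined only relative to the treatment it mimics.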
I came across a paper, “Tests of Statistical Significance Made Sound,” by Brian Haig, a psychology professor at the University of Canterbury, New Zealand. It hits most of the high notes regarding statistical significance tests, their history & philosophy and, refreshingly, is in the error statistical spirit! I’m pasting excerpts from his discussion of “The Error-Statistical Perspective” starting on p.7.
The Error-Statistical Perspective
An important part of scientific research involves processes of detecting, correcting, and controlling for error, and mathematical statistics is one branch of methodology that helps scientists do this. In recognition of this fact, the philosopher of statistics and science, Deborah Mayo (e.g., Mayo, 1996), in collaboration with the econometrician, Aris Spanos (e.g., Mayo & Spanos, 2010, 2011), has systematically developed, and argued in favor of, an error-statistical philosophy for understanding experimental reasoning in science. Importantly, this philosophy permits, indeed encourages, the local use of ToSS, among other methods, to manage error.
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: November 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently, and in green up to 3 others I’d recommend. Posts that are part of a “unit” or a group count as one. Here I’m counting 11/9, 11/13, and 11/16 as one.
- (11/2) Oxford Gaol: Statistical Bogeymen
- (11/4) Forthcoming paper on the strong likelihood principle
- (11/9) Null Effects and Replication (cartoon pic)
- (11/9) Beware of questionable front page articles warning you to beware of questionable front page articles (iii)
- (11/13) T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)
- (11/16) PhilStock: No-pain bull
- (11/16) S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)
- (11/18) Lucien Le Cam: “The Bayesians hold the Magic”
- (11/20) Erich Lehmann: Statistician and Poet
- (11/23) Probability that it is a statistical fluke [i]
- (11/27) “The probability that it be a statistical fluke” [iia]
- (11/30) Saturday night comedy at the “Bayesian Boy” diary (rejected post*)
 Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.
 New Rule, July 30, 2016-very convenient.
I resume my comments on the contributions to our symposium on Philosophy of Statistics at the Philosophy of Science Association. My earlier comment was on Gerd Gigerenzer’s talk. I move on to Clark Glymour’s “Exploratory Research Is More Reliable Than Confirmatory Research.” His complete slides are after my comments.
GLYMOUR’S ARGUMENT (in a nutshell):
“The anti-exploration argument has everything backwards,” says Glymour (slide #11). While John Ioannidis maintains that “Research findings are more likely true in confirmatory designs,” the opposite is so, according to Glymour. (Ioannidis 2005, Glymour’s slide #6). Why? To answer this he describes an exploratory research account for causal search that he has been developing:
What’s confirmatory research for Glymour? It’s moving directly from rejecting a null hypothesis with a low P-value to inferring a causal claim.
Science isn’t about predicting one-off events like election results, but that doesn’t mean the way to make election forecasts scientific (which they should be) is to build “theories of voting.” A number of people have sent me articles on statistical aspects of the recent U.S. election, but I don’t have much to say and I like to keep my blog non-political. I won’t violate this rule in making a couple of comments on Faye Flam’s Nov. 11 article: “Why Science Couldn’t Predict a Trump Presidency”[i].
For many people, Donald Trump’s surprise election victory was a jolt to the very idea that humans are rational creatures. It tore away the comfort of believing that science has rendered our world predictable. The upset led two New York Times reporters to question whether data science could be trusted in medicine and business. A Guardian columnist declared that big data works for physics but breaks down in the realm of human behavior.
Gerd Gigerenzer, Andrew Gelman, Clark Glymour and I took part in a very interesting symposium on Philosophy of Statistics at the Philosophy of Science Association last Friday. I jotted down lots of notes, but I’ll limit myself to brief reflections and queries on a small portion of each presentation in turn, starting with Gigerenzer’s “Surrogate Science: How Fisher, Neyman-Pearson, & Bayes Were Transformed into the Null Ritual.” His complete slides are below my comments. I may write this in stages, this being (i).
- Good scientific practice–bold theories, double-blind experiments, minimizing measurement error, replication, etc.–became reduced in the social science to a surrogate: statistical significance.
I agree that “good scientific practice” isn’t some great big mystery, and that “bold theories, double-blind experiments, minimizing measurement error, replication, etc.” are central and interconnected keys to finding things out in error prone inquiry. Do the social sciences really teach that inquiry can be reduced to cookbook statistics? Or is it simply that, in some fields, carrying out surrogate science suffices to be a “success”?
PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Friday November 4th 9-11:45 am (includes coffee break 10-10:15)
Location: Piedmont 4 (12th Floor) Westin Peachtree Plaza
- Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) “Controversy Over the Significance Test Controversy” (Abstract)
- Gerd Gigerenzer (Director of Max Planck Institute for Human Development, Berlin, Germany) “Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual” (Abstract)
- Andrew Gelman (Professor of Statistics & Political Science, Columbia University, New York) “Confirmationist and Falsificationist Paradigms in Statistical Practice” (Abstract)
- Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania) “Exploratory Research is More Reliable Than Confirmatory Research” (Abstract)
Key Words: big data, frequentist and Bayesian philosophies, history and philosophy of statistics, meta-research, p-values, replication, significance tests.
Science is undergoing a crisis over reliability and reproducibility. High-powered methods are prone to cherry-picking correlations, significance-seeking, and assorted modes of extraordinary rendition of data. The Big Data revolution may encourage a reliance on statistical methods without sufficient scrutiny of whether they are teaching us about causal processes of interest. Mounting failures of replication in the social and biological sciences have resulted in new institutes for meta-research, replication research, and widespread efforts to restore scientific integrity and transparency. Statistical significance test controversies, long raging in the social sciences, have spread to all fields using statistics. At the same time, foundational debates over frequentist and Bayesian methods have shifted in important ways that are often overlooked in the debates. The problems introduce philosophical and methodological questions about probabilistic tools, and science and pseudoscience—intertwined with technical statistics and the philosophy and history of statistics. Our symposium goal is to address foundational issues around which the current crisis in science revolves. We combine the insights of philosophers, psychologists, and statisticians whose work interrelates philosophy and history of statistics, data analysis and modeling.
Formal Epistemology Workshop (FEW) 2017
Call for papers
Submission Deadline: December 1st, 2016
Authors Notified: February 8th, 2017
We invite papers in formal epistemology, broadly construed. FEW is an interdisciplinary conference, and so we welcome submissions from researchers in philosophy, statistics, economics, computer science, psychology, and mathematics.
Submissions should be prepared for blind review. Contributors ought to upload a full paper of no more than 6000 words and an abstract of up to 300 words to the Easychair website. Please submit your full paper in .pdf format. The deadline for submissions is December 1st, 2016. Authors will be notified on February 1st, 2017.
The final selection of the program will be made with an eye towards diversity. We especially encourage submissions from PhD candidates, early career researchers and members of groups that are underrepresented in philosophy.
International Prize in Statistics Awarded to Sir David Cox for
Survival Analysis Model Applied in Medicine, Science, and Engineering
EMBARGOED until October 19, 2016, at 9 p.m. ET
ALEXANDRIA, VA (October 18, 2016) – Prominent British statistician Sir David Cox has been named the inaugural recipient of the International Prize in Statistics. Like the acclaimed Fields Medal, Abel Prize, Turing Award and Nobel Prize, the International Prize in Statistics is considered the highest honor in its field. It will be bestowed every other year to an individual or team for major achievements using statistics to advance science, technology and human welfare.
Cox is a giant in the field of statistics, but the International Prize in Statistics Foundation is recognizing him specifically for his 1972 paper in which he developed the proportional hazards model that today bears his name. The Cox Model is widely used in the analysis of survival data and enables researchers to more easily identify the risks of specific factors for mortality or other survival outcomes among groups of patients with disparate characteristics. From disease risk assessment and treatment evaluation to product liability, school dropout, reincarceration and AIDS surveillance systems, the Cox Model has been applied essentially in all fields of science, as well as in engineering.
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: October 2013. I mark in red three posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently, and in green up to 3 others I’d recommend. Posts that are part of a “unit” or a pair count as one.
- (10/3) Will the Real Junk Science Please Stand Up? (critical thinking)
- (10/5) Was Janina Hosiasson pulling Harold Jeffreys’ leg?
- (10/9) Bad statistics: crime or free speech (II)? Harkonen update: Phil Stat / Law /Stock
- (10/12) Sir David Cox: a comment on the post, “Was Hosiasson pulling Jeffreys’ leg?” (10/5 and 10/12 are a pair)
- (10/19) Blog Contents: September 2013
- (10/19) Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*
- (10/25) Bayesian confirmation theory: example from last post… (10/19 and 10/25 are a pair)
- (10/26) Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)
- (10/31) WHIPPING BOYS AND WITCH HUNTERS (interesting to see how things have changed and stayed the same over the past few years, share comments)
Gelman and Loken (2014) recognize that even without explicit cherry picking there is often enough leeway in the “forking paths” between data and inference so that by artful choices you may be led to one inference, even though it also could have gone another way. In good sciences, measurement procedures should interlink with well-corroborated theories and offer a triangulation of checks–often missing in the types of experiments Gelman and Loken are on about. Stating a hypothesis in advance, far from protecting from the verification biases, can be the engine that enables data to be “constructed” to reach the desired end.
[E]ven in settings where a single analysis has been carried out on the given data, the issue of multiple comparisons emerges because different choices about combining variables, inclusion and exclusion of cases … and many other steps in the analysis could well have occurred with different data (Gelman and Loken 2014, p. 464).
An idea growing out of this recognition is to imagine the results of applying the same statistical procedure, but with different choices at key discretionary junctures–giving rise to a multiverse analysis, rather than a single data set (Steegen, Tuerlinckx, Gelman, and Vanpaemel 2016). One lists the different choices thought to be plausible at each stage of data processing. The multiverse displays “which constellation of choices corresponds to which statistical results” (p. 797). The result of this exercise can, at times, mimic the delineation of possibilities in multiple testing and multiple modeling strategies.
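To make the listing of choices concrete, here is a minimal sketch in Python of how one might enumerate the constellations and pair each with its result. Everything specific in it is an assumption made up for illustration: the three discretionary choices in CHOICES, the toy records in DATA, and the toy statistic in analyze (a difference in group means). It is not the procedure used by Steegen et al., only the general shape of the exercise.

```python
import itertools
import math
import statistics

# Hypothetical discretionary choices at each stage of data processing.
CHOICES = {
    "exclude_outliers": [False, True],
    "outcome": ["raw", "log"],
    "min_age": [18, 21],
}

# Made-up toy records, purely for illustration.
DATA = [
    {"group": "a", "age": 19, "y": 12.0},
    {"group": "a", "age": 30, "y": 15.0},
    {"group": "a", "age": 45, "y": 250.0},
    {"group": "b", "age": 20, "y": 10.0},
    {"group": "b", "age": 25, "y": 11.0},
    {"group": "b", "age": 52, "y": 9.0},
]


def multiverse(choices):
    """Yield one dict per constellation of choices (the Cartesian product)."""
    keys = list(choices)
    for combo in itertools.product(*(choices[k] for k in keys)):
        yield dict(zip(keys, combo))


def analyze(records, spec):
    """Toy analysis: difference in group means under one constellation."""
    rows = [r for r in records if r["age"] >= spec["min_age"]]
    if spec["exclude_outliers"]:
        rows = [r for r in rows if r["y"] < 100]
    transform = math.log if spec["outcome"] == "log" else (lambda v: v)
    a = [transform(r["y"]) for r in rows if r["group"] == "a"]
    b = [transform(r["y"]) for r in rows if r["group"] == "b"]
    return statistics.mean(a) - statistics.mean(b)


# Which constellation of choices corresponds to which statistical result.
results = [(spec, analyze(DATA, spec)) for spec in multiverse(CHOICES)]

if __name__ == "__main__":
    for spec, diff in results:
        print(spec, round(diff, 3))
```

With two options at each of three junctures there are eight constellations; the interest lies in how much the reported difference moves as one walks through them.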
I haven’t been blogging that much lately, as I’m tethered to the task of finishing revisions on a book (on the philosophy of statistical inference!) But I noticed two interesting blogposts, one by Jeff Leek, another by Andrew Gelman, and even a related petition on Twitter, reflecting a newish front in the statistics wars: When it comes to improving scientific integrity, do we need more carrots or more sticks?
Leek’s post from yesterday, “Statistical Vitriol” (29 Sep 2016), calls for de-escalation of the consequences of statistical mistakes:
Over the last few months there has been a lot of vitriol around statistical ideas. First there were data parasites and then there were methodological terrorists. These epithets came from established scientists who have relatively little statistical training. There was the predictable backlash to these folks from their counterparties, typically statisticians or statistically trained folks who care about open source.
Scientific Misconduct and Scientific Expertise
1st Barcelona HPS workshop
November 11, 2016
Departament de Filosofia & Centre d’Història de la Ciència (CEHIC), Universitat Autònoma de Barcelona (UAB)
Location: CEHIC, Mòdul de Recerca C, Seminari L3-05, c/ de Can Magrans s/n, Campus de la UAB, 08193 Bellaterra (Barcelona)
Organized by Thomas Sturm & Agustí Nieto-Galan
Current science is full of uncertainties and risks that weaken the authority of experts. Moreover, sometimes scientists themselves act in ways that weaken their standing: they manipulate data, exaggerate research results, do not give credit where it is due, violate the norms for the acquisition of academic titles, or are unduly influenced by commercial and political interests. Such actions, of which there are numerous examples in past and present times, are widely conceived of as violating standards of good scientific practice. At the same time, while codes of scientific conduct have been developed in different fields, institutions, and countries, there is no universally agreed canon of them, nor is it clear that there should be one. The workshop aims to bring together historians and philosophers of science in order to discuss questions such as the following: What exactly is scientific misconduct? Under which circumstances are researchers more or less liable to misconduct? How far do cases of misconduct undermine scientific authority? How have standards or mechanisms to avoid misconduct, and to regain scientific authority, been developed? How should they be developed?
All welcome – but since space is limited, please register in advance. Write to: Thomas.Sturm@uab.cat
09:30 Welcome (Thomas Sturm & Agustí Nieto-Galan)
G. A. Barnard: 23 Sept 1915-30 July, 2002
Today is George Barnard’s 101st birthday. In honor of this, I reblog an exchange between Barnard, Savage (and others) on likelihood vs probability. The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962).[i] Six other posts on Barnard are linked below: 2 are guest posts (Senn, Spanos); the other 4 include a play (pertaining to our first meeting), and a letter he wrote to me.
BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.
Objectivity in statistics, as in science more generally, is a matter of both aims and methods. Objective science, in our view, aims to find out what is the case as regards aspects of the world [that hold] independently of our beliefs, biases and interests; thus objective methods aim for the critical control of inference and hypotheses, constraining them by evidence and checks of error. (Cox and Mayo 2010, p. 276)
I. The myth of objectivity. Whenever you come up against blanket slogans such as “no methods are objective” or “all methods are equally objective and subjective,” it is a good guess that the problem is being trivialized into oblivion. Yes, there are judgments, disagreements, and values in any human activity, which alone makes it too trivial an observation to distinguish among very different ways that threats of bias and unwarranted inferences may be controlled. Is the objectivity-subjectivity distinction really toothless as many will have you believe? I say no.
Cavalier attitudes toward objectivity are in tension with widely endorsed movements to promote replication, reproducibility, and to come clean on a number of sources behind illicit results: multiple testing, cherry picking, failed assumptions, researcher latitude, publication bias and so on. The moves to take back science–if they are not mere lip-service–are rooted in the supposition that we can more objectively scrutinize results, even if it’s only to point out those that are poorly tested. The fact that the term “objectivity” is used equivocally should not be taken as grounds to oust it, but rather to engage in the difficult work of identifying what there is in “objectivity” that we won’t give up, and shouldn’t.
C. S. Peirce: 10 Sept, 1839-19 April, 1914
Today is C.S. Peirce’s birthday. He’s one of my all time heroes. You should read him: he’s a treasure chest on essentially any topic, and he anticipated several major ideas in statistics (e.g., randomization, confidence intervals) as well as in logic. I’ll reblog the first portion of a (2005) paper of mine. Links to Parts 2 and 3 are at the end. It’s written for a very general philosophical audience; the statistical parts are pretty informal. Happy birthday Peirce.
Peircean Induction and the Error-Correcting Thesis
Deborah G. Mayo
Transactions of the Charles S. Peirce Society: A Quarterly Journal in American Philosophy, Volume 41, Number 2, 2005, pp. 299-319
Peirce’s philosophy of inductive inference in science is based on the idea that what permits us to make progress in science, what allows our knowledge to grow, is the fact that science uses methods that are self-correcting or error-correcting:
Induction is the experimental testing of a theory. The justification of it is that, although the conclusion at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error. (5.145)
The consequences of recent criticisms of statistical tests have breathed brand new life into some very old howlers, many of which have been discussed on this blog. What is not funny, though, is how standard notions such as frequentist error probabilities are being redefined in the process, and how we now have arguments built on equivocations. In fact, there are official guidebooks for the statistically perplexed giving inconsistent definitions to the same term (for just one of many examples, see this post). How much more perplexed will that leave us! Since it’s near the 5-year anniversary of this blog, let’s listen in to a new comedy hour mixing one from 3 years ago with some add-ons*.
Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?
Critic: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!
Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!
Raucous laughter ensues!
(Hah, hah… “So funny, I forgot to laugh! Or, I’m crying and laughing at the same time!”)
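For anyone who wants to see where the two speakers talk past each other, here is a minimal sketch in Python of the kind of simulation the critic describes. Every setting in it is an assumption chosen for illustration, not taken from any particular paper: half the hypotheses in the pool are true nulls (prior_null), the false nulls share a single effect size, the test is a one-sided z-test, and “p-values of .05” is read as the window from .04 to .05. The function simulate and its parameter names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate(n_tests=200_000, prior_null=0.5, effect=2.5, seed=0):
    """Return (erroneous rejection rate, share of true nulls among p near .05).

    The first number is relative to the true nulls only; the second is
    relative to the tests whose p-value landed between .04 and .05.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    null_total = null_rejected = 0
    near_05_total = near_05_null = 0
    for _ in range(n_tests):
        is_null = rng.random() < prior_null
        # Test statistic: N(0, 1) under a true null, N(effect, 1) otherwise.
        z = rng.gauss(0.0 if is_null else effect, 1.0)
        p = 1.0 - std_normal.cdf(z)  # one-sided p-value
        if is_null:
            null_total += 1
            null_rejected += p <= 0.05
        if 0.04 <= p <= 0.05:
            near_05_total += 1
            near_05_null += is_null
    return null_rejected / null_total, near_05_null / near_05_total


if __name__ == "__main__":
    error_rate, null_share = simulate()
    print("erroneous rejections among true nulls:", round(error_rate, 3))
    print("true nulls among tests with p near .05:", round(null_share, 3))
```

The first quantity is the tester’s error probability, computed under the null; the second depends on the assumed pool (how many nulls are true, how large the effects are) and answers a different question. Change prior_null or effect and the second number moves while the first stays put.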