Author Archives: Deborah Mayo

About Deborah Mayo

I am a professor in the Department of Philosophy at Virginia Tech and hold a visiting appointment at the Center for the Philosophy of Natural and Social Science of the London School of Economics. I am the author of Error and the Growth of Experimental Knowledge, which won the 1998 Lakatos Prize, awarded to the most outstanding contribution to the philosophy of science during the previous six years. I have applied my approach toward solving key problems in philosophy of science: underdetermination, the role of novel evidence, Duhem's problem, and the nature of scientific progress. I am also interested in applications to problems in risk analysis and risk controversies, and co-edited Acceptable Evidence: Science and Values in Risk Management (with Rachelle Hollander). I teach courses in introductory and advanced logic (including the metatheory of logic and modal logic), in scientific method, and in philosophy of science. I also teach special topics courses in Science and Technology Studies.

S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)


Stephen Senn
Head, Methodology and Statistics Group
Competence Center for Methodology and Statistics (CCMS)
Luxembourg

Responder despondency: myths of personalized medicine

The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine  has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely interesting examples stretching back decades. Given what the report shows can be achieved on occasion, given the enthusiasm of the FDA and its commissioner, given the amazing progress in genetics emerging from the labs, a golden future of personalized medicine surely awaits us. It would be churlish to spoil the party by sounding a note of caution but I have never shirked being churlish and that is exactly what I am going to do.

Reading the report, alarm bells began to ring when I came across this chart (p. 17) describing the percentage of patients for whom drugs are ineffective. Actually, I tell a lie. The alarm bells were ringing as soon as I saw the title, but by the time I saw this chart, the cacophony was deafening.

[Chart from the FDA report (p. 17): percentage of patients for whom drugs are ineffective]

The question that immediately arose in my mind was ‘how do the FDA know this is true?’ Well, the Agency very helpfully tells you how they know this is true. They cite a publication, ‘Clinical application of pharmacogenetics’[1], as the source of the chart. Slightly surprisingly, the publication predates the FDA report by 12 years (pre-history in pharmacogenetic terms); however, sure enough, if you look up the cited paper you will find that the authors (Spear et al.) state ‘We have analyzed the efficacy of major drugs in several important diseases based on published data, and the summary of the information is given in Table 1.’ This is Table 1:

[Table 1 from Spear et al.: efficacy (response rates) of major drugs in several important diseases]

Now, there are a few differences here from the FDA report, but we have to give the Agency some credit. First of all, they have decided to concentrate on those who don’t respond, so they have subtracted the response rates from 100. Second, they have obviously learned an important data presentation lesson: sorting by the alphabet is often inferior to sorting by importance. Unfortunately, they have ignored an important lesson that texts on graphical excellence impart: don’t clutter your presentation with chart junk[2]. However, in the words of Meat Loaf, ‘Two out of three ain’t bad,’ so I have to give them some credit.

However, that’s not quite the end of the story. Note the superscripted 1 in the rubric of the source for the FDA claim. That’s rather important. This gives you the source of the information, which is the Physician’s Desk Reference, 54th edition, 2000.

At this point of tracing back, I discovered what I knew already. What the FDA is quoting are zombie statistics. This is not to impugn the work of Spear et al. The paper makes interesting points. (I can’t even blame them for not citing one of my favourite papers[3], since it appeared in the same year.) They may well have worked diligently to collect the data they did but the trail runs cold here. The methodology is not given and the results can’t be checked. It may be true, it may be false but nobody, and that includes the FDA and its commissioner, knows.

But there is a further problem. There is a very obvious trap in using observed response rates to judge what percentage of patients respond (or don’t): all such measures are subject to within-patient variability. To take a field I have worked in, asthma: if you take (as the FDA has on occasion) a 15% increase in Forced Expiratory Volume in one second (FEV1) above baseline as indicating a response, you will classify someone with a 14% value as a non-responder and someone with a 16% value as a responder, but measure them again and they could easily change places (see chapter 8 of Statistical Issues in Drug Development[4]). For a bronchodilator I worked on, mean bronchodilation at 12 hours was about 18%, so you simply needed to base your measurement of effect on a number of replicates if you wanted to increase the proportion of responders.
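The mechanics are easy to see in a toy simulation (a sketch of mine, not anything from the post; the 18% true effect, the 8% within-patient standard deviation and the 15% cut-off are assumed numbers chosen only to make the point): every patient has exactly the same true improvement, yet single measurements label a sizeable fraction ‘non-responders’, many labels flip on remeasurement, and averaging replicates pushes the apparent responder rate up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients = 10_000
true_improvement = 18.0   # assumed true mean % increase in FEV1, same for everyone
within_sd = 8.0           # assumed within-patient (measurement-to-measurement) SD
threshold = 15.0          # responder cut-off: % increase over baseline

first = true_improvement + rng.normal(0, within_sd, n_patients)
second = true_improvement + rng.normal(0, within_sd, n_patients)

print("'responders' on 1st measurement:", (first > threshold).mean())
print("'responders' on 2nd measurement:", (second > threshold).mean())
print("labels that flip between measurements:",
      ((first > threshold) != (second > threshold)).mean())

# Averaging k replicates shrinks the noise by sqrt(k), so the apparent
# proportion of 'responders' rises although nobody's true effect has changed.
for k in (1, 4, 16):
    mean_of_k = true_improvement + rng.normal(0, within_sd / np.sqrt(k), n_patients)
    print(f"'responders' using the mean of {k} replicates:",
          (mean_of_k > threshold).mean())
```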

There is a very obvious trap (or at least it ought to be obvious to all statisticians) in naively using reported response rates as an indicator of variation in true response[5]. This can be illustrated using the graph below. On the left-hand side you see an ideal counterfactual experiment: every patient can be treated under identical conditions with both treatments. In this thought experiment the difference that the treatment makes to each patient is constant. However, life does not afford us this possibility. If what we choose to do is run a parallel group trial, we will have to randomly give each patient either placebo or the active treatment. The right-hand panel shows what we will see; it is obtained by randomly erasing one of the two points for each patient in the left-hand panel. It is now impossible to judge individual response: all that we can judge is the average.

[Figure: left panel, the ideal counterfactual experiment with each patient observed under both treatments; right panel, the parallel group trial obtained by randomly erasing one of the two points for each patient]

Of course, I fixed things in the example so that response was constant, and it clearly might not be. But that is not the point. The point is that the diagram shows that by naively using raw outcomes we will overestimate the personal element of response. In fact, only repeated cross-over trials can reliably tease out individual response from other components of variation; in many indications these are not possible, and even where they are possible they are rarely run[6].
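To make the figure’s point concrete, here is a minimal simulation sketch (mine, with assumed variance components, not Senn’s analysis): the true benefit is identical for every patient, yet once one of the two observations per patient is randomly erased, as in a parallel group trial, the spread of outcomes within an arm looks like ‘individual response’ when it is nothing but between- and within-patient variability; only the average effect is recoverable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
effect = 10.0                                # assumed constant true benefit for all
patient_level = rng.normal(0, 6, n)          # assumed between-patient variation
within = lambda: rng.normal(0, 4, n)         # assumed within-patient variation

# Ideal counterfactual experiment: every patient observed under both treatments.
on_placebo = patient_level + within()
on_active = patient_level + effect + within()

# Parallel group trial: randomly erase one of the two observations per patient.
gets_active = rng.random(n) < 0.5
observed = np.where(gets_active, on_active, on_placebo)

print("recoverable average effect:",
      observed[gets_active].mean() - observed[~gets_active].mean())
print("spread (SD) of outcomes within the active arm:",
      observed[gets_active].std())
# That within-arm spread is what naive 'responder' talk mistakes for variation
# in individual response, even though the true effect is the same for everyone.
```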

So to sum up, the reason the FDA ‘knows’ that 40% of asthmatic patients don’t respond to treatment is that a paper from 2001, with unspecified methodology, most probably failing to account for within-patient variability, reports that the authors found this to be the case by studying the Physician’s Desk Reference.

This is nothing short of a scandal. I don’t blame the FDA. I blame me and my fellow statisticians. Why and how are we allowing our life scientist colleagues to get away with this nonsense? They genuinely believe it. We ought to know better.

References

  1. Spear, B.B., M. Heath-Chiozzi, and J. Huff, Clinical application of pharmacogenetics. Trends in Molecular Medicine, 2001. 7(5): p. 201-204.
  2. Tufte, E.R., The Visual Display of Quantitative Information. 1983, Cheshire Connecticut: Graphics Press.
  3. Senn, S.J., Individual Therapy: New Dawn or False Dawn. Drug Information Journal, 2001. 35(4): p. 1479-1494.
  4. Senn, S.J., Statistical Issues in Drug Development. 2007, Hoboken: Wiley. 498.
  5. Senn, S., Individual response to treatment: is it a valid assumption? BMJ, 2004. 329(7472): p. 966-8.
  6. Senn, S.J., Three things every medical writer should know about statistics. The Write Stuff, 2009. 18(3): p. 159-162.


Categories: evidence-based policy, Statistics, Stephen Senn | 16 Comments

Continued: “P-values overstate the evidence against the null”: legit or fallacious?


continued…

Categories: Bayesian/frequentist, CIs and tests, fallacy of rejection, highly probable vs highly probed, P-values, Statistics | 35 Comments

“P-values overstate the evidence against the null”: legit or fallacious? (revised)

0. July 20, 2014: Some of the comments to this post reveal that using the word “fallacy” in my original title might have encouraged running together the current issue with the fallacy of transposing the conditional. Please see a newly added Section 7.

Continue reading

Categories: Bayesian/frequentist, CIs and tests, fallacy of rejection, highly probable vs highly probed, P-values, Statistics | 71 Comments

Higgs discovery two years on (2: Higgs analysis and statistical flukes)

I’m reblogging a few of the Higgs posts, with some updated remarks, on this two-year anniversary of the discovery. (The first was in my last post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March 2013).[1]

Some people say to me: “This kind of reasoning is fine for a ‘sexy science’ like high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out)[2]. Even with high-level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories. Continue reading

Categories: Higgs, highly probable vs highly probed, P-values, Severity, Statistics | 13 Comments

Higgs Discovery two years on (1: “Is particle physics bad science?”)


July 4, 2014 was the two-year anniversary of the Higgs boson discovery. As the world was celebrating the “5 sigma!” announcement, and we were reading about the statistical aspects of this major accomplishment, I was aghast to be emailed a letter, purportedly instigated by Bayesian Dennis Lindley, through Tony O’Hagan (to the ISBA). Lindley, according to this letter, wanted to know:

“Are the particle physics community completely wedded to frequentist analysis?  If so, has anyone tried to explain what bad science that is?”

Fairly sure it was a joke, I posted it on my “Rejected Posts” blog for a bit until it checked out [1]. (See O’Hagan’s “Digest and Discussion”) Continue reading
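(A small technical aside of mine, not part of the original post: the “5 sigma” discovery convention corresponds to a one-sided standard-normal tail area of roughly 3 in 10 million, as a one-line check shows.)

```python
# One-sided tail probability corresponding to a 5 sigma excess (standard normal).
from scipy.stats import norm
print(norm.sf(5))  # about 2.9e-07
```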

Categories: Bayesian/frequentist, fallacy of non-significance, Higgs, Lindley, Statistics | Tags: , , , , , | 4 Comments

Winner of June Palindrome Contest: Lori Wike


Winner of June 2014 Palindrome Contest: First Second* Time Winner! Lori Wike

*Her April win is here

Palindrome:

Parsec? I overfit omen as Elba sung “I err on! Oh, honor reign!” Usable, sane motif revoices rap.

The requirement: A palindrome with Elba plus overfit. (The optional second word: “average” was not needed to win.)

Bio:

Lori Wike is principal bassoonist of the Utah Symphony and is on the faculty of the University of Utah and Westminster College. She holds a Bachelor of Music degree from the Eastman School of Music and a Master of Arts degree in Comparative Literature from UC-Irvine.

Continue reading

Categories: Announcement, Palindrome | Leave a comment

Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

There are some ironic twists in the way social psychology is dealing with its “replication crisis”, and they may well threaten even the most sincere efforts to put the field on firmer scientific footing–precisely in those areas that evoked the call for a “daisy chain” of replications. Two articles, one from the Guardian (June 14), and a second from The Chronicle of Higher Education (June 23) lay out the sources of what some are calling “Repligate”. The Guardian article is “Physics Envy: Do ‘hard’ sciences hold the solution to the replication crisis in psychology?”

The article in the Chronicle of Higher Education also gets credit for its title: “Replication Crisis in Psychology Research Turns Ugly and Odd”. I’ll likely write this in installments… (2nd, 3rd, 4th)

^^^^^^^^^^^^^^^

The Guardian article answers yes to the question “Do ‘hard’ sciences hold the solution?”:

Psychology is evolving faster than ever. For decades now, many areas in psychology have relied on what academics call “questionable research practices” – a comfortable euphemism for types of malpractice that distort science but which fall short of the blackest of frauds, fabricating data.
Continue reading

Categories: junk science, science communication, Statistical fraudbusting, Statistics | 53 Comments

Sir David Hendry Gets Lifetime Achievement Award

Sir David Hendry, Professor of Economics at the University of Oxford [1], was given the Celebrating Impact Lifetime Achievement Award on June 8, 2014. Professor Hendry presented his automatic model selection program (Autometrics) at our conference, Statistical Science and Philosophy of Science (June 2010). (Site is here.) I’m posting an interesting video and related links. I invite comments on the paper Hendry published, “Empirical Economic Model Discovery and Theory Evaluation,” in our special volume of Rationality, Markets, and Morals (abstract below). [2]

One of the world’s leading economists, INET Oxford’s Prof. Sir David Hendry received a unique award from the Economic and Social Research Council (ESRC)…
Continue reading

Categories: David Hendry, StatSci meets PhilSci | Tags: | Leave a comment

Blog Contents: May 2014


May 2014

(5/1) Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle

(5/3) You can only become coherent by ‘converting’ non-Bayesianly

(5/6) Winner of April Palindrome contest: Lori Wike

(5/7) A. Spanos: Talking back to the critics using error statistics (Phil6334)

(5/10) Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)

(5/15) Scientism and Statisticism: a conference* (i) Continue reading

Categories: blog contents, Metablog, Statistics | Leave a comment

Big Bayes Stories? (draft ii)

“Wonderful examples, but let’s not close our eyes,” is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert). For your Saturday night/weekend reading, here are excerpts from Hand, another discussant (Welsh), scattered remarks of mine, along with links to papers and background. I begin with David Hand:

[The papers in this collection] give examples of problems which are well-suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armory. (Hand [1])

…. But I have to ask, is the emphasis on ‘Bayesian’ necessary? That is, do we need further demonstrations aimed at promoting the merits of Bayesian methods? … The examples in this special issue were selected, firstly by the authors, who decided what to write about, and then, secondly, by the editors, in deciding the extent to which the articles conformed to their desiderata of being Bayesian success stories: that they ‘present actual data processing stories where a non-Bayesian solution would have failed or produced sub-optimal results.’ In a way I think this is unfortunate. I am certainly convinced of the power of Bayesian inference for tackling many problems, but the generality and power of the method is not really demonstrated by a collection specifically selected on the grounds that this approach works and others fail. To take just one example, choosing problems which would be difficult to attack using the Neyman-Pearson hypothesis testing strategy would not be a convincing demonstration of a weakness of that approach if those problems lay outside the class that that approach was designed to attack.

Hand goes on to make a philosophical assumption that might well be questioned by Bayesians: Continue reading

Categories: Bayesian/frequentist, Honorary Mention, Statistics | 62 Comments

“Statistical Science and Philosophy of Science: where should they meet?”


Four score years ago (!) we held the conference “Statistical Science and Philosophy of Science: Where Do (Should) They Meet?” at the London School of Economics, Center for the Philosophy of Natural and Social Science, CPNSS, where I’m a visiting professor [1]. Many of the discussions on this blog grew out of contributions from the conference, and conversations initiated soon after. The conference site is here; my paper on the general question is here.[2]

My main contribution was “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. It begins like this: 

1. Comedy Hour at the Bayesian Retreat[3]

 Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist… Continue reading

Categories: Error Statistics, Philosophy of Statistics, Severity, Statistics, StatSci meets PhilSci | 23 Comments

A. Spanos: “Recurring controversies about P values and confidence intervals revisited”


Aris Spanos
Wilson E. Schmidt Professor of Economics
Department of Economics, Virginia Tech

Recurring controversies about P values and confidence intervals revisited*
Ecological Society of America (ESA) ECOLOGY
Forum—P Values and Model Selection (pp. 609-654)
Volume 95, Issue 3 (March 2014): pp. 645-651

INTRODUCTION

The use, abuse, interpretations and reinterpretations of the notion of a P value have been a hot topic of controversy since the 1950s in statistics and several applied fields, including psychology, sociology, ecology, medicine, and economics.

The initial controversy between Fisher’s significance testing and the Neyman and Pearson (N-P; 1933) hypothesis testing concerned the extent to which the pre-data Type I error probability α can address the arbitrariness and potential abuse of Fisher’s post-data threshold for the P value. Continue reading

Categories: CIs and tests, Error Statistics, Fisher, P-values, power, Statistics | 32 Comments

“The medical press must become irrelevant to publication of clinical trials.”

“The medical press must become irrelevant to publication of clinical trials.” So said Stephen Senn at a recent meeting of the Medical Journalists’ Association with the title: “Is the current system of publishing clinical trials fit for purpose?” Senn has thrown a few stones in the direction of medical journals in guest posts on this blog, and in this paper, but it’s the first time I have heard him go this far. He wasn’t the only one answering the conference question “No!”, much to the surprise of medical journalist Jane Feinmann, whose article I am excerpting:

 So what happened? Medical journals, the main vehicles for publishing clinical trials today, are after all the ‘gatekeepers of medical evidence’—as they are described in Bad Pharma, Ben Goldacre’s 2012 bestseller. …

… The Alltrials campaign, launched two years ago on the back of Goldacre’s book, has attracted an extraordinary level of support. … Continue reading

Categories: PhilPharma, science communication, Statistics | 5 Comments

Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)

Blood Simple?
The complicated and controversial world of bioequivalence

by Stephen Senn*


Those not familiar with drug development might suppose that showing that a new pharmaceutical formulation (say a generic drug) is equivalent to a formulation that has a licence (say a brand name drug) ought to be simple. However, it can often turn out to be bafflingly difficult[1]. Continue reading

Categories: bioequivalence, confidence intervals and tests, PhilPharma, Statistics, Stephen Senn | 22 Comments

What have we learned from the Anil Potti training and test data fireworks ? Part 1 (draft 2)


Over 100 patients signed up for the chance to participate in the clinical trials at Duke (2007-10) that promised a custom-tailored cancer treatment spewed out by a cutting-edge prediction model developed by Anil Potti, Joseph Nevins and their team at Duke. Their model purported to predict your probable response to one or another chemotherapy based on microarray analyses of various tumors. While they are now described as “false pioneers” of personalized cancer treatments, it’s not clear what has been learned from the fireworks surrounding the Potti episode overall. Most of the popular focus has been on glaring typographical and data processing errors—at least that’s what I mainly heard about until recently. Although those errors were quite crucial to the science in this case (surely more so than Potti’s CV padding), what interests me now are the general methodological and logical concerns that rarely make it into the popular press. Continue reading

Categories: science communication, selection effects, Statistical fraudbusting | 33 Comments

Allan Birnbaum, Philosophical Error Statistician: 27 May 1923 – 1 July 1976

27 May 1923 – 1 July 1976

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference” is in Breakthroughs in Statistics (volume I 1993).  I’ve a hunch that Birnbaum would have liked my rejoinder to discussants of my forthcoming paper (Statistical Science): Bjornstad, Dawid, Evans, Fraser, Hannig, and Martin and Liu. I hadn’t realized until recently that all of this is up under “future papers” here [1]. You can find the rejoinder: STS1404-004RA0-2. That takes away some of the surprise of having it all come out at once (and in final form). For those unfamiliar with the argument, at the end of this entry are slides from a recent, entirely informal, talk that I never posted, as well as some links from this blog. Happy Birthday Birnbaum! Continue reading

Categories: Birnbaum, Birnbaum Brakes, Likelihood Principle, Statistics | Leave a comment

Blog Table of Contents: March and April 2014


BLOG Contents: March and April 2014
Compiled by Jean Miller and Nicole Jinn

March 2014

(3/1) Cosma Shalizi gets tenure (at last!) (metastat announcement)

(3/2) Significance tests and frequentist principles of evidence: Phil6334 Day #6

(3/3) Capitalizing on Chance (ii)

(3/4) Power, power everywhere–(it) may not be what you think! [illustration]

(3/8) Msc kvetch: You are fully dressed (even under your clothes)? Continue reading

Categories: blog contents | Leave a comment

The Science Wars & the Statistics Wars: More from the Scientism workshop

Here are the slides from my presentation (May 17) at the Scientism workshop in NYC. (They’re sketchy since we were trying for 25-30 minutes.) Below them are some mini notes on some of the talks.

Now for my informal notes. Here’s a link to the Speaker abstracts; the presentations may now be found at the conference site here. Comments, questions, and corrections are welcome. Continue reading

Categories: evidence-based policy, frequentist/Bayesian, Higgs, P-values, scientism, Statistics, StatSci meets PhilSci | 11 Comments

Deconstructing Andrew Gelman: “A Bayesian wants everybody else to be a non-Bayesian.”

At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions of articles’.” I began with Andrew Gelman’s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments especially in relation to current-day ESP research. Continue reading

Categories: Background knowledge, Gelman, Phil6334, Statistics | 35 Comments

Scientism and Statisticism: a conference* (i)

A lot of philosophers and scientists seem to be talking about scientism these days–either championing it or worrying about it. What is it? It’s usually a pejorative term describing an unwarranted deference to the so-called scientific method over and above other methods of inquiry. Some push it as a way to combat postmodernism (is that even still around?). Steven Pinker gives scientism a positive spin (and even offers it as a cure for the malaise of the humanities!)[1]. Anyway, I’m to talk at a conference on Scientism (*not statisticism, that’s my word) taking place in NYC May 16-17. It is organized by Massimo Pigliucci (chair of philosophy at CUNY-Lehman), who has written quite a lot on the topic in the past few years. Information can be found here. In thinking about scientism for this conference, however, I was immediately struck by this puzzle: Continue reading

Categories: Announcement, PhilStatLaw, science communication, Statistical fraudbusting, StatSci meets PhilSci | Tags: | 15 Comments
