Error Statistics Philosophy

S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)


Stephen Senn
Head, Methodology and Statistics Group
Competence Center for Methodology and Statistics (CCMS)
Luxembourg

Responder despondency: myths of personalized medicine

The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine, has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely interesting examples stretching back decades. Given what the report shows can be achieved on occasion, given the enthusiasm of the FDA and its commissioner, and given the amazing progress in genetics emerging from the labs, a golden future of personalized medicine surely awaits us. It would be churlish to spoil the party by sounding a note of caution, but I have never shirked being churlish and that is exactly what I am going to do.

Reading the report, alarm bells began to ring when I came across this chart (p17) describing the percentage of patients for whom drugs are ineffective. Actually, I tell a lie. The alarm bells were ringing as soon as I saw the title, but by the time I saw this chart the cacophony was deafening.

The question that immediately arose in my mind was ‘how do the FDA know this is true?’ Well, the Agency very helpfully tells you how they know. They cite a publication, ‘Clinical application of pharmacogenetics’[1], as the source of the chart. Slightly surprisingly, the publication predates the FDA report by 12 years (pre-history in pharmacogenetic terms). However, sure enough, if you look up the cited paper you will find that the authors (Spear et al) state ‘We have analyzed the efficacy of major drugs in several important diseases based on published data, and the summary of the information is given in Table 1.’ This is Table 1:

Now, there are a few differences here from the FDA report, but we have to give the Agency some credit. First of all, they have decided to concentrate on those who don’t respond, so they have subtracted the response rates from 100. Second, they have obviously learned an important data presentation lesson: sorting by the alphabet is often inferior to sorting by importance. Unfortunately, they have ignored an important lesson that texts on graphical excellence impart: don’t clutter your presentation with chart junk[2]. However, in the words of Meat Loaf, ‘Two out of three ain’t bad,’ so I have to give them some credit.

However, that’s not quite the end of the story. Note the superscripted 1 in the rubric of the source for the FDA claim. That’s rather important. It gives you the source of the information, which is the Physicians’ Desk Reference, 54th edition, 2000.

At this point of tracing back, I discovered what I knew already. What the FDA is quoting are zombie statistics. This is not to impugn the work of Spear et al. The paper makes interesting points. (I can’t even blame them for not citing one of my favourite papers[3], since it appeared in the same year.) They may well have worked diligently to collect the data they did, but the trail runs cold here. The methodology is not given and the results can’t be checked. It may be true or it may be false, but nobody, and that includes the FDA and its commissioner, knows.

But there is a further problem. There is a very obvious trap in using observed response rates to judge what percentage of patients respond (or don’t): all such measures are subject to within-patient variability. To take a field I have worked in, asthma: if you take (as the FDA has on occasion) a 15% increase in Forced Expiratory Volume in one second (FEV1) above baseline as indicating a response, you will classify someone with a 14% value as a non-responder and someone with a 16% value as a responder, but measure them again and they could easily change places (see chapter 8 of Statistical Issues in Drug Development[4]). For a bronchodilator I worked on, mean bronchodilation at 12 hours was about 18%, so you simply needed to base your measurement of effect on a number of replicates if you wanted to increase the proportion of responders.
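
To see how much of the ‘non-response’ in such figures could be pure measurement noise, here is a minimal simulation. The 18% mean and the 15% cut-off are the figures above; the within-patient standard deviation of 10 percentage points is invented for illustration:

    import numpy as np

    rng = np.random.default_rng(42)

    n_patients = 10_000
    true_effect = 18.0   # mean % bronchodilation at 12 hours (figure from above)
    within_sd = 10.0     # within-patient SD in percentage points (assumed)
    cutoff = 15.0        # responder threshold: at least a 15% rise in FEV1

    # Give every patient the SAME true effect; observed values still scatter
    # around it because of within-patient variability.
    first = true_effect + rng.normal(0.0, within_sd, n_patients)
    second = true_effect + rng.normal(0.0, within_sd, n_patients)

    resp1, resp2 = first >= cutoff, second >= cutoff
    print(f"'responders' on first measurement:  {resp1.mean():.0%}")
    print(f"changed category on re-measurement: {(resp1 != resp2).mean():.0%}")

    # Averaging k replicates shrinks the noise by sqrt(k), so the apparent
    # responder proportion climbs -- with no change in the drug at all.
    for k in (1, 4, 16):
        mean_of_k = true_effect + rng.normal(0.0, within_sd / np.sqrt(k), n_patients)
        print(f"{k:2d} replicate(s): {(mean_of_k >= cutoff).mean():.0%} 'respond'")

With these numbers, roughly 60% ‘respond’ on a single reading, nearly half of the patients change label on re-measurement, and the ‘responder’ proportion climbs towards 90% with 16 replicates, although every simulated patient has exactly the same true effect.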

There is a related trap (one that ought to be obvious to all statisticians) in naively using reported response rates as an indicator of variation in true response[5]. This can be illustrated using the graph below. On the left-hand side you see an ideal counterfactual experiment: every patient can be treated under identical conditions with both treatments. In this thought experiment the difference that the treatment makes to each patient is constant. However, life does not afford us this possibility. If we choose to run a parallel group trial, we will have to give each patient either placebo or the active treatment at random. The right-hand panel shows what we will see; it is obtained by randomly erasing one of the two points for each patient in the left-hand panel. It is now impossible to judge individual response: all that we can judge is the average.
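
The same thought experiment is easy to stage numerically. In this sketch (all numbers invented for illustration) every patient improves by exactly 10 units, yet once the parallel-group design erases one of each patient’s two counterfactual outcomes, the spread that remains within each arm comes entirely from between-patient differences in level, not from any difference in response:

    import numpy as np

    rng = np.random.default_rng(1)

    n = 20
    effect = 10.0                    # the treatment helps EVERY patient by exactly 10
    baseline = rng.normal(50, 8, n)  # patients differ in level, not in response

    placebo_outcome = baseline           # counterfactual outcome under placebo
    active_outcome = baseline + effect   # counterfactual outcome under treatment

    # Left-hand panel: both points per patient; every individual effect is 10.
    print(np.unique(np.round(active_outcome - placebo_outcome, 6)))  # [10.]

    # Right-hand panel: a parallel-group trial 'erases' one point per patient.
    gets_active = rng.random(n) < 0.5
    observed = np.where(gets_active, active_outcome, placebo_outcome)

    # All we can recover is the average effect; the spread within each arm
    # reflects between-patient variation and says nothing about response.
    estimate = observed[gets_active].mean() - observed[~gets_active].mean()
    print(f"estimated average effect: {estimate:.1f}")
    print(f"SD within the active arm: {observed[gets_active].std(ddof=1):.1f}")

Nothing in the observed data distinguishes this constant-effect scenario from one in which responses genuinely differ; the raw spread is silent on the matter.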

Of course, I fixed things in the example so that response was constant, and it clearly might not be. But that is not the point. The point is that the diagram shows that by naively using raw outcomes we will overestimate the personal element of response. In fact, only repeated cross-over trials can reliably tease out individual response from other components of variation; in many indications these are not possible, and even where they are possible they are rarely run[6].
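
What repeated cross-over trials buy us is, in essence, an exercise in variance components. In the sketch below (a simple model with invented numbers, not a real analysis), each patient measures the treatment difference twice; the disagreement between a patient’s own replicates estimates pure within-patient noise, and subtracting it isolates the patient-by-treatment interaction, the genuine ‘individual response’. With only one outcome per patient, those two components are inseparable:

    import numpy as np

    rng = np.random.default_rng(7)

    n, tau = 200, 10.0
    sigma_b = 3.0  # SD of true individual response (patient-by-treatment interaction)
    sigma_e = 5.0  # within-patient noise per measured treatment difference

    b = rng.normal(0, sigma_b, n)  # each patient's personal deviation from tau
    # Two cross-over replicates of the treatment difference per patient:
    d = tau + b[:, None] + rng.normal(0, sigma_e, (n, 2))

    # Replicate disagreement estimates pure noise: Var(d1 - d2) = 2 * sigma_e^2.
    sigma_e2_hat = np.var(d[:, 0] - d[:, 1], ddof=1) / 2
    # Variance of per-patient means is sigma_b^2 + sigma_e^2 / 2; subtract noise.
    sigma_b2_hat = np.var(d.mean(axis=1), ddof=1) - sigma_e2_hat / 2

    print(f"within-patient variance:      {sigma_e2_hat:5.1f} (true {sigma_e**2:.0f})")
    print(f"individual-response variance: {sigma_b2_hat:5.1f} (true {sigma_b**2:.0f})")

Set sigma_b to zero and the second estimate hovers around zero; set it higher and it tracks the truth. It is the replication within patient, not the raw response rate, that makes individual response identifiable.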

So, to sum up, the reason the FDA ‘knows’ that 40% of asthmatic patients don’t respond to treatment is that a paper from 2001, with unspecified methodology, most probably failing to account for within-patient variability, reports that the authors found this to be the case by studying the Physicians’ Desk Reference.

This is nothing short of a scandal. I don’t blame the FDA. I blame myself and my fellow statisticians. Why and how are we allowing our life scientist colleagues to get away with this nonsense? They genuinely believe it. We ought to know better.

References

  1. Spear, B.B., M. Heath-Chiozzi, and J. Huff, Clinical application of pharmacogenetics. Trends in Molecular Medicine, 2001. 7(5): p. 201-204.
  2. Tufte, E.R., The Visual Display of Quantitative Information. 1983, Cheshire Connecticut: Graphics Press.
  3. Senn, S.J., Individual Therapy: New Dawn or False Dawn? Drug Information Journal, 2001. 35(4): p. 1479-1494.
  4. Senn, S.J., Statistical Issues in Drug Development. 2nd ed. 2007, Hoboken: Wiley.
  5. Senn, S.J., Individual response to treatment: is it a valid assumption? BMJ, 2004. 329(7472): p. 966-968.
  6. Senn, S.J., Three things every medical writer should know about statistics. The Write Stuff, 2009. 18(3): p. 159-162.