Head of Competence Center for Methodology and Statistics (CCMS)
This story is based on a paradox proposed to me by Don Berry. I have my own opinion on this but I find that opinion boring and predictable. The opinion of others is much more interesting and so I am putting this up for others to interpret.
Two scientists working for a pharmaceutical company collaborate in designing and running a clinical trial known as CONFUSE (Clinical Outcomes in Neuropathic Fibromyalgia in US Elderly). One of them, Smith, is going to start another programme of drug development in a little while. The other one, Jones, will just be working on the current project. The planned sample size is 6000 patients.
Smith says that he would like to look at the experiment after 3000 patients in order to make an important decision as regards his other project. As far as he is concerned that’s good enough.
Jones is horrified. She considers that for other reasons CONFUSE should continue to recruit all 6000 and that on no account should the trial be stopped early.
Smith says that he is simply going to look at the data to decide whether to initiate a trial of a similar product being studied in the other project he will be working on. The fact that he looks should not affect Jones’s analysis.
Jones is still very unhappy and points out that the integrity of her trial is being compromised.
Smith suggests that all she needs to do is to state quite clearly in the protocol that the trial will proceed whatever the result of the interim administrative look. The fact that she states publicly that on no account will she claim significance based on the first 3000 alone will reassure everybody, including the FDA. (In drug development circles, FDA stands for Finally Decisive Argument.)
However, Jones insists. She wants to know what Smith will do if the result after 3000 patients is not significant.
Smith replies that in that case he will not initiate the trial in the parallel project. It will suggest to him that it is not worth going ahead.
Jones wants to know: supposing that the results for the first 3000 are not significant, what will Smith do once the results of all 6000 are in?
Smith replies that, of course, in that case he will have a look. If (but it seems to him an unlikely situation) the results based on all 6000 are significant, even though the results based on the first 3000 were not, he may well decide that the treatment works after all and initiate his alternative programme, regretting, of course, the time that has been lost.
Jones points out that Smith will not be controlling his type I error rate by this procedure.
‘OK’, says Smith, ‘to satisfy you I will use adjusted type I error rates. You, of course, don’t have to.’
The trial is run. Smith looks after 3000 patients and concludes the difference is not significant. The trial continues on its planned course. Jones looks after 6000 and concludes it is significant, P=0.049. Smith looks after 6000 and concludes it is not significant, P=0.052. (A very similar thing happened in the famous TORCH study (1).)
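Readers who want to check Jones's objection numerically can do so with a small simulation. What follows is only a sketch under stated assumptions, not the analysis used in the story: it assumes a normally distributed test statistic, an interim look at exactly half the information, a two-sided 5% level, and (since the story does not say which adjustment Smith chose) a Pocock-style constant boundary of 2.178 at both looks. The function name and its parameters are made up for illustration.

```python
import random
from math import sqrt


def rejection_rate(critical_z, n_sims=200_000, seed=1):
    """Estimate the chance of claiming significance at either look when the
    null hypothesis is true (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Standardised statistics for the first and second 3000 patients.
        z_first = rng.gauss(0.0, 1.0)
        z_second = rng.gauss(0.0, 1.0)
        # Statistic for all 6000 patients: equal-weight combination of halves.
        z_full = (z_first + z_second) / sqrt(2.0)
        # Smith's procedure: act if either look crosses the boundary.
        if abs(z_first) > critical_z or abs(z_full) > critical_z:
            rejections += 1
    return rejections / n_sims


# Unadjusted: the conventional 1.96 applied at both looks.
naive = rejection_rate(1.96)
# Assumed Pocock-style adjustment: 2.178 at both looks.
pocock = rejection_rate(2.178)

print(f"two looks, unadjusted boundary 1.96: {naive:.3f}")
print(f"two looks, Pocock boundary 2.178:    {pocock:.3f}")
```

Under these assumptions the unadjusted two-look procedure rejects noticeably more often than 5% of the time, while the constant boundary brings the overall rate back to roughly 5%. A boundary of 2.178 corresponds to a nominal two-sided P of about 0.029 at each look, so a final nominal P of 0.049 that satisfies Jones would not cross it.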
Shortly after the conclusion of the trial, Smith and Jones are head-hunted and leave the company. The brief is taken over by new recruit Evans.
What does Evans have on her hands: a significant study or not?
Reference
1. Calverley PM, Anderson JA, Celli B, Ferguson GT, Jenkins C, Jones PW, et al. Salmeterol and fluticasone propionate and survival in chronic obstructive pulmonary disease. The New England Journal of Medicine. 2007;356(8):775-89.