Stephen Senn: Open Season (guest post)

Stephen SennStephen Senn
Head, Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS),

“Open Season”

The recent joint statement(1) by the Pharmaceutical Research and Manufacturers of America (PhRMA) and the European Federation of Pharmaceutical Industries and Associations(EFPIA) represents a further step in what has been a slow journey towards (one assumes) will be the achieved  goal of sharing clinical trial data. In my inaugural lecture of 1997 at University College London I called for all pharmaceutical companies to develop a policy for sharing trial results and I have repeated this in many places since(2-5). Thus I can hardly complain if what I have been calling for for over 15 years is now close to being achieved.

However, I have now recently been thinking about it again and it seems to me that there are some problems that need to be addressed. One is the issue of patient confidentiality. Ideally, covariate information should be exploitable as such often increases the precision of inferences and also the utility of decisions based upon them since they (potentially) increase the possibility of personalising medical interventions. However, providing patient-level data increases the risk of breaching confidentiality. This is a complicated and difficult issue about which, however, I have nothing useful to say. Instead I want to consider another matter. What will be the influence on the quality of the inferences we make of enabling many subsequent researchers to analyse the same data?

One of the reasons that many researchers have called for all trials to be published is that trials that are missing tend to be different from those that are present. Thus there is a bias in summarising evidence from published trial only and it can be a difficult task with no guarantee of success to identify those that have not been published. This is a wider reflection of the problem of missing data within trials. Such data have long worried trialists and the Food and Drug Administration (FDA) itself has commissioned a report on the subject from leading experts(6). On the European side the Committee for Medicinal Products for Human Use (CHMP) has a guideline dealing with it(7).

However, the problem is really a particular example of data filtering and it also applies to statistical analysis. If the analyses that are present have been selected from a wider set, then there is a danger that they do not provide an honest reflection of the message that is in the data. This problem is known as that of multiplicity and there is a huge literature dealing with it, including regulatory guidance documents(8, 9).

Within drug regulation this is dealt with by having pre-specified analyses. The broad outlines of these are usually established in the trial protocol and the approach is then specified in some detail in the statistical analysis plan which is required to be finalised before un-blinding of the data. The strategies used to control for multiplicity will involve some combination of defining a significance testing route (an order in which test must be performed and associated decision rules) and reduction of the required level of significance to detect an event.

I am not a great fan of these manoeuvres, which can be extremely complex. One of my objections is that it is effectively assumed that the researchers who chose them are mandated to circumscribe the inferences that scientific posterity can make(10). I take the rather more liberal view that provided that everything that is tested is reported one can test as much as one likes. The problem comes if there is selective use of results and in particular selective reporting. Nevertheless, I would be the first to concede the value of pre-specification in clarifying the thinking of those about to embark on conducting a clinical trial and also in providing a ‘template of trust’ for the regulator when provided with analyses by the sponsor.

However, what should be our attitude to secondary analyses? From one point of view these should be welcome. There is always value in looking at data from different perspectives and indeed this can be one way of strengthening inferences in the way suggested nearly 50 years ago by Platt(11). There are two problems, however. First, not all perspectives are equally valuable. Some analyses in the future, no doubt, will be carried out by those with little expertise and in some cases, perhaps, by those with a particular viewpoint to justify. There is also the danger that some will carry out multiple analyses (of which, when one consider the possibility of changing endpoints, performing transformations, choosing covariates and modelling framework there are usually a great number) but then only present those that are ‘interesting’. It is precisely to avoid this danger that the ritual of pre-specified analysis is insisted upon by regulators. Must we also insist upon it for those seeking to reanalyse?

To do so would require such persons to do two things. First, they would have to register the analysis plan before being granted access to the data. Second, they would have to promise to make the analysis results available, otherwise we will have a problem of missing analyses to go with the problem of missing trials. I think that it is true to say that we are just beginning to feel our way with this. It may be that the chance has been lost and that the whole of clinical research will be ‘world wide webbed’: there will be a mass of information out there but we just don’t know what to believe. Whatever happens the era of privileged statistical analyses by the original data collectors is disappearing fast.

[Ed. note: Links to some earlier related posts by Prof. Senn are:  “Casting Stones” 3/7/13, “Also Smith & Jones” 2/23/13, and “Fooling the Patient: An Unethical Use of Placebo?” 8/2/12 .]


1. PhRMA, EFPIA. Principles for Responsible Clinical Trial Data Sharing. PhRMA; 2013 [cited 2013 31 August]; Available from:

2. Senn SJ. Statistical quality in analysing clinical trials. Good Clinical Practice Journal. [Research Paper]. 2000;7(6):22-6.

3. Senn SJ. Authorship of drug industry trials. Pharm Stat. [Editorial]. 2002;1:5-7.

4. Senn SJ. Sharp tongues and bitter pills. Significance. [Review]. 2006 September 2006;3(3):123-5.

5. Senn SJ. Pharmaphobia: fear and loathing of pharmaceutical research. [pdf] 1997 [updated 31 August 2013; cited 2013 31 August ]; Updated version of paper originally published on PharmInfoNet].

6. Little RJ, D’Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med. 2012 Oct 4;367(14):1355-60.

7. Committee for Medicinal Products for Human Use (CHMP). Guideline on Missing Data in Confirmatory Clinical Trials London: European Medicine Agency; 2010. p. 1-12.

8. Committee for Proprietary Medicinal Products. Points to consider on multiplicity issues in clinical trials. London: European Medicines Evaluation Agency2002.

9. International Conference on Harmonisation. Statistical principles for clinical trials (ICH E9). Statistics in Medicine. 1999;18:1905-42.

10. Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharm Stat. 2007 Jul-Sep;6(3):161-70.

11. Platt JR. Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science. 1964 Oct 16;146(3642):347-53.

Categories: evidence-based policy, science communication, Statistics, Stephen Senn | 6 Comments

Post navigation

6 thoughts on “Stephen Senn: Open Season (guest post)

  1. Dear Stephen: Thanks so much for the guest post. I’m not up on these policy shifts, and I think I’m missing a lot here. I’d love to understand in more detail what it is you’ve been calling for for over 15 years that is now close to being achieved? I mean, congratulations, but maybe I’m not the only one wanting to hear more about your concern regarding the quality of inferences that will result by “enabling many subsequent researchers to analyse the same data”?
    Finally what ever happened to the case you discussed in “casting stones”:

  2. TheCamel

    Releasing synthetic, facsimile data to third parties can mitigate privacy concerns but is problematic as 1) making reliable synthetic data from complex longitudinal trials is still very difficult 2) it doesn’t prevent 3rd parties from significance questing.

    What if the sponsor were to release the data structure (and accompanying data dictionary and missing data summary) of the tables that hold the trial data? 3rd parites could then submit sharp, cogent hypotheses/questions in an analysis plan and the sponsor could test them in a discplined manner (via imposing strong type I error control or full discosure of all analyses performed).

    Would this be enough to conserve privacy but allow for a disciplined 3rd party probing of the data?

    • I’m puzzled about the idea of people assessing a request for data by trying to compute the capabilities of identifying patients, if the data were released. You’ve just created a roomful of people who figured out you’re the one with this condition–a group that presumably would not know this otherwise. Anyway, now that all our medical records are on-line (in the U.S.) and with all the comparative-treatment appraisals required, I can’t see that there’d really be any privacy about such things.

  3. Mark

    Thanks for this, Stephen! I agree with pretty much all you wrote. But, although I’m not suggesting that one should necessarily trust the results as they’re presented in a final drug report, I’m concerned that many people analysing the same data will contribute nothing but useless or even potentially harmful noise. One thing I’m thinking about is interim analyses…. The sponsor’s final inference adjusts the significance level for interim testing. But what would make a secondary analyst use the same (potentially complicated) approach, retrospectively? This is linked to the discussion of the likelihood principle. I don’t think that opening pharma’s data up for secondary analysis is the way forward (it’s just more “big data” nonsense).

  4. I read this today in relation to one of the biotech stocks I follow:
    “A significant question posed is whether patients should have access to data generated from a device implanted in their own body, and if so, would that data prove beneficial in the doctor-patient relationship and treatment, and the overall health of the patient themselves?”

I welcome constructive comments for 14-21 days. If you wish to have a comment of yours removed during that time, send me an e-mail. If readers have already replied to the comment, you may be asked to replace it to retain comprehension.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at