Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics


Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

This post first appeared here. An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence-based medicine? Philosophy of Science 2002; 69: S316-S330: see p. S324 )

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

The second point is that in the absence of a treatment effect, where randomization has taken place, statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within. The third point, strongly related to the other two, is that statistical inference in clinical trials proceeds using ratios. The F statistic produced from Fisher’s famous analysis of variance is the ratio of the variance between groups to the variance within, calculated using observed outcomes. (The ratio form is due to Snedecor, but Fisher’s approach using semi-differences of natural logarithms is equivalent.) The critics of randomization are talking about the effect of the unmeasured covariates on the numerator of this ratio. However, any factor that could be imbalanced between groups could also vary strongly within them, and thus while the numerator would be affected, so would the denominator. Any Bayesian will soon come to the conclusion that, given randomization, coherence imposes strong constraints on the degree to which one expects an unknown something to inflate the numerator (which implies not only differing between groups but also, coincidentally, having predictive strength) but not the denominator.
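A small simulation can illustrate this point (the code and settings are my own sketch, not Senn's). Even when an unmeasured covariate strongly predicts outcome, random allocation means the covariate inflates the within-group variance along with any chance between-group difference, so under the null hypothesis the F ratio keeps its reference distribution:

```python
import random
import statistics

def f_ratio(group_a, group_b):
    """One-way ANOVA F statistic for two groups:
    between-group variance over within-group variance."""
    na, nb = len(group_a), len(group_b)
    grand = statistics.mean(group_a + group_b)
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2  # df = 1
    within = (sum((x - ma) ** 2 for x in group_a)
              + sum((x - mb) ** 2 for x in group_b)) / (na + nb - 2)
    return between / within

random.seed(1)
n = 40
trials = 2000
exceed = 0
for _ in range(trials):
    # A strong unmeasured covariate drives the outcome, no treatment effect.
    covariate = [random.gauss(0, 3) for _ in range(2 * n)]
    outcome = [c + random.gauss(0, 1) for c in covariate]
    random.shuffle(outcome)            # random allocation into two groups
    if f_ratio(outcome[:n], outcome[n:]) > 3.96:   # ~5% point of F(1, 78)
        exceed += 1
print(exceed / trials)                 # close to the nominal 0.05
```

The rejection rate stays near 5% however strong the hidden covariate is made, because both numerator and denominator are inflated together.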

The final point is that statistical inferences are probabilistic: either about statistics in the frequentist mode or about parameters in the Bayesian mode. Many strong predictors varying from patient to patient will tend to inflate the variance within groups; this will be reflected in turn in wider confidence intervals for the estimated treatment effect. It is not enough to attack the estimate. Being a statistician means never having to say you are certain. It is not the estimate that has to be attacked to prove a statistician a liar, it is the certainty with which the estimate has been expressed. We don’t call a man a liar who claims that with probability one half you will get one head in two tosses of a coin just because you might get two tails.
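The first half of this point can be made concrete with a rough sketch (my own illustration, not from the post): as unmeasured patient-to-patient predictors add variability, the within-group standard deviation grows, and with it the width of the confidence interval for the treatment effect, so the inference honestly reports its own increased uncertainty.

```python
import math
import random
import statistics

def ci_width(sd_predictors, n=50, reps=500, seed=0):
    """Average width of a crude 95% CI for the group difference when
    unmeasured predictors add sd_predictors of noise per patient."""
    rng = random.Random(seed)
    widths = []
    for _ in range(reps):
        # Two randomized groups, no treatment effect.
        a = [rng.gauss(0, 1) + rng.gauss(0, sd_predictors) for _ in range(n)]
        b = [rng.gauss(0, 1) + rng.gauss(0, sd_predictors) for _ in range(n)]
        pooled = statistics.stdev(a + b)
        se = pooled * math.sqrt(2 / n)
        widths.append(2 * 1.96 * se)
    return statistics.mean(widths)

print(ci_width(0.0), ci_width(3.0))  # the second interval is much wider
```

The hidden predictors do not bias the estimate; they widen the interval, which is exactly the "certainty" the statistician is careful not to overstate.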


6 thoughts on “Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics”

  1. I thank Senn for this post, first appearing 3 years ago. Readers might go back to check the discussion from

  2. Stephen: In your Stat Issues in Drug Dev (34) book you mention that many Bayesians regard randomization as irrelevant, despite some exceptions. As a philosopher, naturally I’ve been surrounded by Bayesians or those who have been influenced by them to reject randomization. Even at a conference on personalized medicine last year, the philosophers were doing their number against the “gold standard”, which surprised me a little, because there’s a field where I thought it was agreed that a decade or more was lost for not having randomized microarrays.
    The arguments I’ve heard (for Bayesians defending randomization) have to do with controlling bias in priors or being more convincing in a group, as opposed to in private research. Clearly, its role in validating significance tests isn’t high on their list, so anyway, I was hoping you could clarify the Bayesian justification, at least among the “prominent exceptions” you have in mind.

  3. Of course if it is necessary to blind an experiment, then randomisation is essential.

    If this is not so, we can consider two different cases.

    1) The Bayesian has to analyse data that have been collected by somebody else
    2) The Bayesian has to analyse data that have been collected by themselves

    In the first case I think that the value should be obvious. If randomisation has been employed and if the Bayesian knows this, then a) it is not necessary for him or her to make the mind of the experimenter part of the model and b) even though randomisation is not always perfectly optimal, it is very rarely bad, whereas lots of non-random allocations are terrible. For example, it has been suggested by a Bayesian philosopher that it would be just as good to label the two treatments in a trial A and B and let the patient choose. But any true Bayesian would know that unless one’s prior is a perfect spike on the probability of patients choosing A being exactly 0.5, such a scheme is less efficient than randomisation.

    So having to allow for possible stupid behaviour on the part of others is the price you pay for analysing non-randomised experiments.
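    The efficiency claim above can be checked with a small deterministic calculation (my own sketch, not Senn's). The variance of the estimated group difference is proportional to 1/n1 + 1/n2, and under patient choice each of the n patients picks A independently with some probability p; averaging that variance factor over the resulting binomial split shows choice matches simple randomisation only at p = 0.5 and loses efficiency as p drifts away:

```python
import math

def expected_var_factor(n_total, p):
    """Expected 1/n1 + 1/n2 (the variance factor of the two-group
    difference) when each patient independently picks treatment A with
    probability p, conditioning out degenerate all-one-group splits."""
    total = 0.0
    prob_ok = 0.0
    for n1 in range(1, n_total):
        w = math.comb(n_total, n1) * p ** n1 * (1 - p) ** (n_total - n1)
        total += w * (1 / n1 + 1 / (n_total - n1))
        prob_ok += w
    return total / prob_ok

baseline = expected_var_factor(100, 0.5)   # simple randomisation
for p in (0.5, 0.6, 0.8):
    print(p, expected_var_factor(100, p) / baseline)
# the ratio is 1 at p = 0.5 and grows as p moves away from 0.5
```

    Unless the prior puts all its mass on p being exactly 0.5, the expected efficiency of the choice scheme is strictly worse, which is the coherence argument in miniature.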

    In the second case it is less obvious. The danger is however that your inferential superego has its noble modelling intentions undermined by its experimenting id. One might be in two minds as to whether this is really a problem but then one might really be in two minds, in which case it is.

    • Stephen: I find it interesting that you don’t mention what I took to be the Bayesian’s main justification for randomization, even if it’s not necessary: to get a more non-subjective prior, one that more heavily weighs the data.

      Here’s the reference, or one of them, I had in mind:
      In his attempt to give a Bayesian justification, Larry Wasserman wrote on his blog: “Without randomization, we can indeed derive a posterior for θ but it is highly sensitive to the prior. This is just a restatement of the non-identifiability of θ. With randomization, the posterior is much less sensitive to the prior.”

      • Very likely Larry is basing this on more formal work by Don Rubin on randomization and Bayesian analysis.

        If you think of Bayesian analysis in terms of having a probability model as a representation for how Nature generated the values of the unknown parameter(s), and then a probability model as a representation for generating the observations given those parameter values (the data-generating model), you will notice that you have to build two rather ambitious representations, to say the least.

        This is standard Bayes but with a proper prior that is supposed to reflect background information (educated guesses) on how such parameter values would be generated by Nature (and is in no way a literal model!)

        With randomization to, say, two groups, one treated and one not, if the treatment has no effect (and the trial is perfectly run) the representations can be exactly the same for both groups. Now additionally you will have to add representations/(prior and data model) for the unknown treatment effect in the treatment group – but stick with no effect for now. Then as things “cancel” out in the comparison (what you are most interested in), it turns out the representations can be wrong in lots of ways that don’t matter much.

        Without randomization, the representations have to be different and reflect pre-existing group differences, and there is less tolerance for things that are wrong. An example of something that can be wrong without mattering is missing a pre-existing difference that does not affect the outcome – but we usually don’t know which differences those are.

        Now the representations/(prior and data model) for the unknown treatment effect might be very wrong (e.g. an additive effect when it is non-additive or varies) but this is also true in a frequentist approach – the 95% CI coverage usually is only close to true when there is no effect. In fact that is one of my diagnostic checks of whether a bio-statistician actually understands randomized clinical trials – if they think the CI will have 95% coverage for the true treatment effect.

        Keith O’Rourke

  4. Pingback: Randomization | Bayesian Philosophy
