Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
This post first appeared here. An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:
Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence-based medicine? Philosophy of Science 2002; 69: S316-S330; see p. S324.)
It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.
The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within.
The third point, strongly related to the other two, is that statistical inference in clinical trials proceeds using ratios. The F statistic produced from Fisher’s famous analysis of variance is the ratio of the variance between to the variance within and calculated using observed outcomes. (The ratio form is due to Snedecor but Fisher’s approach using semi-differences of natural logarithms is equivalent.) The critics of randomization are talking about the effect of the unmeasured covariates on the numerator of this ratio. However, any factor that could be imbalanced between groups could vary strongly within and thus while the numerator would be affected, so would the denominator. Any Bayesian will soon come to the conclusion that, given randomization, coherence imposes strong constraints on the degree to which one expects an unknown something to inflate the numerator (which implies not only differing between groups but also, coincidentally, having predictive strength) but not the denominator.
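The point about the numerator and denominator can be checked with a small simulation. What follows is only a sketch under stated assumptions, not anything taken from the post: outcome is modelled as the sum of independent standard normal "unmeasured covariates", there is no treatment effect, two equal groups are formed at random, and the function name and parameters are invented for illustration. If the argument holds, the average between-group mean square and the average within-group mean square should grow together as covariates are added, leaving their ratio near 1.

```python
import random
import statistics


def simulate_mean_squares(n_covariates, n_per_group=10, n_trials=2000, seed=1):
    """Average between- and within-group mean squares over null trials.

    Assumption: outcome is the sum of n_covariates independent N(0, 1)
    covariates, none of them measured, and the treatment does nothing.
    """
    rng = random.Random(seed)
    ms_between, ms_within = [], []
    for _ in range(n_trials):
        outcomes = [
            sum(rng.gauss(0, 1) for _ in range(n_covariates))
            for _ in range(2 * n_per_group)
        ]
        rng.shuffle(outcomes)  # random allocation to two equal groups
        a, b = outcomes[:n_per_group], outcomes[n_per_group:]
        grand = statistics.fmean(outcomes)
        # Between-group mean square (1 degree of freedom for two groups).
        msb = n_per_group * (
            (statistics.fmean(a) - grand) ** 2
            + (statistics.fmean(b) - grand) ** 2
        )
        # Within-group mean square: pooled variance, equal group sizes.
        msw = (statistics.variance(a) + statistics.variance(b)) / 2
        ms_between.append(msb)
        ms_within.append(msw)
    return statistics.fmean(ms_between), statistics.fmean(ms_within)


mean_square_results = {k: simulate_mean_squares(k) for k in (1, 30)}
for k, (msb, msw) in mean_square_results.items():
    print(f"{k:>2} covariates: between={msb:.2f} within={msw:.2f} ratio={msb / msw:.2f}")
```

In any single trial the F ratio is random, of course; the sketch only illustrates that adding hidden predictors does not load the numerator without also loading the denominator.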
The final point is that statistical inferences are probabilistic: either about statistics in the frequentist mode or about parameters in the Bayesian mode. Many strong predictors varying from patient to patient will tend to inflate the variance within groups; this will be reflected in due turn in wider confidence intervals for the estimated treatment effect. It is not enough to attack the estimate. Being a statistician means never having to say you are certain. It is not the estimate that has to be attacked to prove a statistician a liar, it is the certainty with which the estimate has been expressed. We don’t call a man a liar who claims that with probability one half you will get one head in two tosses of a coin just because you might get two tails.
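The claim about wider confidence intervals can be illustrated in the same spirit. This is a hedged sketch rather than a definitive analysis: it assumes the same toy model of outcome as a sum of independent standard normal predictors, a true treatment effect of zero, and a normal approximation to the t quantile (so coverage will sit a little below the nominal 95%). All names are hypothetical. The expectation is that many strong predictors widen the interval while leaving its coverage roughly unchanged, which is the sense in which the uncertainty is honestly expressed.

```python
import random
import statistics

# Normal approximation to the two-sided 95% quantile (an assumption of this sketch).
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def simulate_intervals(n_predictors, n_per_group=30, n_trials=1000, seed=2):
    """Return (coverage, mean width) of nominal 95% intervals when the true effect is 0."""
    rng = random.Random(seed)
    covered = 0
    widths = []
    for _ in range(n_trials):
        outcomes = [
            sum(rng.gauss(0, 1) for _ in range(n_predictors))
            for _ in range(2 * n_per_group)
        ]
        rng.shuffle(outcomes)  # random allocation to two equal groups
        a, b = outcomes[:n_per_group], outcomes[n_per_group:]
        diff = statistics.fmean(a) - statistics.fmean(b)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        half_width = Z_975 * (2 * pooled_var / n_per_group) ** 0.5
        covered += abs(diff) <= half_width  # does the interval contain 0?
        widths.append(2 * half_width)
    return covered / n_trials, statistics.fmean(widths)


interval_results = {k: simulate_intervals(k) for k in (1, 20)}
for k, (coverage, width) in interval_results.items():
    print(f"{k:>2} predictors: coverage={coverage:.3f} mean width={width:.2f}")
```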
I thank Senn for this post, first appearing 3 years ago. Readers might go back to check the discussion from
Stephen: In your Stat Issues in Drug Dev (34) book you mention that many Bayesians regard randomization as irrelevant, despite some exceptions. As a philosopher, naturally I’ve been surrounded by Bayesians or those who have been influenced by them to reject randomization. Even at a conference on personalized medicine last year, the philosophers were doing their number against the “gold standard” which surprised me a little because there’s a field where I thought it was agreed that a decade or more was lost for not having randomized microassays.
The arguments I’ve heard (for Bayesians defending randomization) have to do with controlling bias in priors or being more convincing in a group, as opposed to in private research. Clearly, its role in validating significance tests isn’t high on their list, so anyway, I was hoping you could clarify the Bayesian justification, at least among the “prominent exceptions” you have in mind.
Of course if it is necessary to blind an experiment, then randomisation is essential.
If this is not so, we can consider two different cases.
1) The Bayesian has to analyse data that have been collected by somebody else
2) The Bayesian has to analyse data that have been collected by themselves
In the first case I think that the value should be obvious. If randomisation has been employed and if the Bayesian knows this then a) it is not necessary for him or her to make the mind of the experimenter part of the model and b) even although randomisation is not always perfectly optimal it is very rarely bad whereas lots of non-random allocations are terrible. For example, it has been suggested by a Bayesian philosopher that just as good would be to label the two treatments in a trial A and B and get the patient to choose. But any true Bayesian would know that unless one has a perfect prior spike that the probability that patients will choose A is exactly 0.5, such a scheme is less efficient than randomisation.
So having to allow for possible stupid behaviour on the part of others is the price you pay for analysing non-randomised experiments.
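The efficiency claim about patient choice can be made concrete with a rough calculation. This is an illustrative sketch under assumptions of my own, not Senn's working: in large samples the variance of the estimated treatment difference is taken to be proportional to 1/(p(1-p)), where p is the probability that a patient ends up on A, so efficiency relative to p = 0.5 is 4p(1-p). If the Bayesian's uncertainty about p under patient choice is described by a Beta(a, b) prior (a hypothetical choice made for convenience), the expected relative efficiency is 4ab/((a+b)(a+b+1)), which is below 1 for any finite a and b.

```python
def expected_relative_efficiency(a, b):
    """Expected value of 4p(1-p) when p ~ Beta(a, b).

    Uses E[p(1-p)] = a*b / ((a+b) * (a+b+1)); the value 1 corresponds to
    allocation with probability exactly 0.5.
    """
    return 4 * a * b / ((a + b) * (a + b + 1))


for a, b in [(1, 1), (5, 5), (50, 50)]:
    print(f"Beta({a}, {b}): expected relative efficiency = "
          f"{expected_relative_efficiency(a, b):.3f}")
```

The tighter the prior concentrates around 0.5, the closer the value gets to 1, but it reaches 1 only in the limit of a spike at exactly 0.5.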
In the second case it is less obvious. The danger is however that your inferential superego has its noble modelling intentions undermined by its experimenting id. One might be in two minds as to whether this is really a problem but then one might really be in two minds, in which case it is.