**Stephen Senn**

Head, Methodology and Statistics Group,

Competence Center for Methodology and Statistics (CCMS),

Luxembourg

At a workshop on randomisation I attended recently, I was depressed to hear what I regard as hackneyed untruths treated as if they were important objections. One of these is the objection of indefinitely many confounders. The argument goes that although randomisation may make it probable that some confounders are reasonably balanced between the arms, since there are indefinitely many of these, the chance that at least some are badly imbalanced is so great as to make the procedure useless.

This argument is wrong for several related reasons. The first is that the total effect of these indefinitely many confounders is bounded. The objection is analogous to claiming that the infinite series ½ + ¼ + ⅛ + … does not sum to a limit because it has infinitely many terms. The fact is that the outcome one wishes to analyse places a limit on the possible influence of the covariates. Suppose that we were able to measure a number of covariates on a set of patients prior to randomisation (in fact this is usually not possible, but that does not matter here). Now construct principal components *C*₁, *C*₂, … based on these covariates. We suppose that each of these predicts, to a greater or lesser extent, the outcome, *Y* (say). In a linear model we could put coefficients *k*₁, *k*₂, … on these components. However, one is not free to postulate anything at all by way of values for these coefficients, since it has to be the case for any set of *m* such coefficients that

*k*₁²*V*(*C*₁) + *k*₂²*V*(*C*₂) + … + *k*ₘ²*V*(*C*ₘ) ≤ *V*(*Y*),

where *V*( ) indicates *variance of*. Thus variation in outcome bounds variation in prediction. This total variation in outcome has to be shared among the predictors, so the more predictors you postulate, the smaller *on average* the influence per predictor.
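The variance bound can be checked numerically. The sketch below is a hypothetical illustration (simulated covariates, a least-squares fit; none of the specifics come from the post itself): it constructs principal components, regresses an outcome on them, and confirms that the variance of the prediction, the sum of the *k*ᵢ²*V*(*C*ᵢ) terms, cannot exceed *V*(*Y*).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 20

# Hypothetical covariates measured on n patients prior to randomisation.
X = rng.normal(size=(n, p))

# Principal components C_1, ..., C_p of the centred covariates.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
C = Xc @ Vt.T                          # orthogonal component scores

# An outcome Y influenced by every component, plus independent noise.
true_k = rng.normal(scale=0.3, size=p)
Y = C @ true_k + rng.normal(size=n)

# Least-squares coefficients k_i from regressing (centred) Y on the components.
k = np.linalg.lstsq(C, Y - Y.mean(), rcond=None)[0]

# The bound: variation in prediction cannot exceed variation in outcome.
prediction_var = np.sum(k**2 * C.var(axis=0))
print(prediction_var <= np.var(Y))     # True
```

Because the components are orthogonal, the fitted variance decomposes exactly into the sum above, and the residual variance accounts for the rest of *V*(*Y*), so adding more components can only redistribute, never enlarge, the total.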

The second error is to ignore the fact that statistical inference does not proceed on the basis of signal alone but on the ratio of signal to noise. If there are indefinitely many predictors, there is no reason to suppose that their influence on the variation between treatment groups will be bigger than their influence on the variation within groups, and both of these are used to make the inference.
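A small simulation can illustrate the point (a sketch under assumed conditions not given in the post: many weak, unobserved confounders with bounded total influence, a treatment that does nothing, and an ordinary two-sample *t*-test). Because the confounders inflate the within-group variation as well as the between-group variation, the test still rejects at roughly its nominal rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n, p = 2000, 100, 200          # more confounders than patients

rejections = 0
for _ in range(n_sims):
    # Each patient's outcome depends on p unobserved confounders
    # whose total influence is bounded (coefficients shrink with p).
    confounders = rng.normal(size=(n, p))
    effects = rng.normal(scale=1 / np.sqrt(p), size=p)
    y = confounders @ effects + rng.normal(size=n)

    # Randomise patients to two equal arms; the treatment does nothing.
    arm = rng.permutation(np.repeat([0, 1], n // 2))
    _, pval = stats.ttest_ind(y[arm == 0], y[arm == 1])
    rejections += pval < 0.05

print(rejections / n_sims)             # close to the nominal 0.05
```

The confounders do perturb every simulated trial, but they perturb the denominator of the *t*-statistic in step with the numerator, which is precisely why the inference survives.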