I wanted to locate some uncluttered lounge space for one of the threads to emerge in comments from 6/14/13. Thanks to Stanley Young for permission to post this.
S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC
There is a relatively unknown problem with microarray experiments, in addition to the multiple testing problems. Samples should be randomized over important sources of variation; otherwise p-values may be flawed. Until relatively recently, the microarray samples were not sent through assay equipment in random order. Clinical trial statisticians at GSK insisted that the samples go through assay in random order. Rather amazingly the data became less messy and p-values became more orderly. The story is given here:
Essentially all the microarray data pre-2010 is unreliable. For another example, Mass spec data was analyzed Petrocoin. The samples were not randomized that claims with very small p-values failed to replicate. See K.A. Baggerly et al., “Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments,” Bioinformatics, 20:777-85, 2004. So often the problem is not with p-value technology, but with the design and conduct of the study.
Please check other comments on microarrays from 6/14/13.
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg
An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example is what John Worrall has to say:
Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence based medicine. Philosophy of Science 2002; 69: S316-S330: see page S324 )
It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.
The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within. Continue reading