As if I wasn’t skeptical enough about personalized predictions based on genomic signatures, Jeff Leek recently had a surprising post about “A surprisingly tricky issue when using genomic signatures for personalized medicine”. Leek (on his blog Simply Statistics) writes:
My student Prasad Patil has a really nice paper that just came out in Bioinformatics (preprint in case paywalled). The paper is about a surprisingly tricky normalization issue with genomic signatures. Genomic signatures are basically statistical/machine learning functions applied to the measurements for a set of genes to predict how long patients will survive, or how they will respond to therapy. The issue is that usually when building and applying these signatures, people normalize across samples in the training and testing set.
….it turns out that this one simple normalization problem can dramatically change the results of the predictions. In particular, we show that the predictions for the same patient, with the exact same data, can change dramatically if you just change the subpopulations of patients within the testing set.
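The dependence Leek describes is easy to see concretely. Below is a minimal sketch (not the paper's actual pipeline): quantile normalization, a standard cross-sample method, replaces each patient's values with rank-matched averages computed *across the test set*, so the very same patient's normalized profile changes when the surrounding cohort changes. The cohort sizes, gene count, and simulated distributions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_normalize(X):
    """Quantile-normalize a genes-by-samples matrix: each sample's values
    are replaced by the mean of the rank-matched sorted values across all
    samples in the batch."""
    ranks = X.argsort(axis=0).argsort(axis=0)        # within-sample ranks
    mean_sorted = np.sort(X, axis=0).mean(axis=1)    # reference distribution
    return mean_sorted[ranks]

# One fixed patient (column 0), embedded in two different test cohorts
# whose overall expression distributions differ.
patient = rng.normal(size=50)
cohort_a = rng.normal(loc=0.0, size=(50, 9))
cohort_b = rng.normal(loc=2.0, size=(50, 9))

norm_a = quantile_normalize(np.column_stack([patient, cohort_a]))[:, 0]
norm_b = quantile_normalize(np.column_stack([patient, cohort_b]))[:, 0]

# The patient's raw data are identical in both runs, but the normalized
# values are not -- so a downstream classifier can flip its prediction.
print(np.allclose(norm_a, norm_b))  # False
```

The patient's within-sample ranks are the same in both runs; only the reference distribution built from the other nine samples differs, and that alone shifts the normalized values.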
Here’s an extract from the paper, “Test set bias affects reproducibility of gene signatures”:
Test set bias is a failure of reproducibility of a genomic signature. In other words, the same patient, with the same data and classification algorithm, may be assigned to different clinical groups. A similar failing resulted in the cancellation of clinical trials that used an irreproducible genomic signature to make chemotherapy decisions (Letter (2011)).
This is a reference to the Anil Potti case:
Letter, T. C. (2011). Duke Accepts Potti Resignation; Retraction Process Initiated with Nature Medicine.
But far from the Potti case being a uniquely problematic example (see here and here), at least with respect to test set bias, this article makes it appear that test set bias is a threat to be expected much more generally. Going back to the abstract of the paper:
ABSTRACT Motivation: Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction.
Results: We demonstrate that results from existing gene signatures which rely on normalizing test data may be irreproducible when the patient population changes composition or size using a set of curated, publicly-available breast cancer microarray experiments. As an alternative, we examine the use of gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms…..
“The implications of a patient’s classification changing due to test set bias may be important clinically, financially, and legally. … a patient’s classification could affect a treatment or therapy decision. In other cases, an estimation of the patient’s probability of survival may be too optimistic or pessimistic. The fundamental issue is that the patient’s predicted quantity should be fully determined by the patient’s genomic information, and the bias we will explore here is induced completely due to technical steps.”…
“DISCUSSION We found that breast cancer tumor subtype predictions varied for the same patient when the data for that patient were processed using differing numbers of patient sets and patient sets had varying distributions of key characteristics (ER* status). This is undesirable behavior for a prediction algorithm, as the same patient should always be assigned the same prediction assuming their genomic data do not change (6)…
“This raises the question of how similar the test set needs to be to the training data for classifications to be trusted when the test data are normalized.”
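The rank-based alternative mentioned in the abstract sidesteps the problem because the features are computed entirely within one patient's own profile. A toy illustration, in the spirit of top-scoring-pair classifiers (one family of rank-based signatures): classify by comparing two genes within the same sample. The gene indices and the `tsp_predict` helper below are hypothetical, not taken from the paper.

```python
import numpy as np

def tsp_predict(sample, i, j):
    """Top-scoring-pair style rule: classify by whether gene i's expression
    exceeds gene j's *within the same sample*. The decision is invariant to
    any monotone per-sample rescaling, and to whichever other patients
    happen to be in the test set."""
    return int(sample[i] > sample[j])

rng = np.random.default_rng(1)
patient = rng.normal(size=50)  # illustrative 50-gene profile

# A monotone per-sample transform (shift + positive scale) cannot change
# the prediction, because it preserves the order of the two genes.
assert tsp_predict(patient, 3, 7) == tsp_predict(2.5 * patient + 1.0, 3, 7)
```

Because no cross-sample statistic enters the feature, adding or removing other patients from the batch cannot alter this patient's classification, which is the reproducibility property the paper is after.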
Returning to Leek’s post:
The basic problem is illustrated in this graphic.
This seems like a pretty esoteric statistical issue, but it turns out that this one simple normalization problem can dramatically change the results of the predictions. …
In this plot, Prasad made predictions for the exact same set of patients two times when the patient population varied in ER status composition. As many as 30% of the predictions were different for the same patient with the same data if you just varied who they were being predicted with.
This paper highlights how tricky statistical issues can slow down the process of translating ostensibly really useful genomic signatures into clinical practice and lends even more weight to the idea that precision medicine is a statistical field.
Prasad Patil, Pierre-Olivier Bachant-Winner, Benjamin Haibe-Kains, and Jeffrey T. Leek, “Test set bias affects reproducibility of gene signatures,” Bioinformatics, Advance Access published March 18, 2015, OUP.