“Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)

Posted on January 12, 2015 by Mayo


more Potti training/validation fireworks

So it turns out there was an internal whistleblower in the Potti scandal at Duke after all (despite denials by the Duke researchers involved). It was a medical student, Brad Perez. It’s in the Jan. 9, 2015 Cancer Letter*. Ever since my first post on Potti last May (part 1), I’ve received various e-mails and phone calls from people wishing to confide their inside scoops and first-hand experiences working with Potti (in a statistical capacity), but I was waiting for some published item. I believe there’s a court case still pending (anyone know?).

Now here we have a great example of something I am increasingly seeing: Challenges to the scientific credentials of data analysis are dismissed as mere differences in statistical philosophies or as understandable disagreements about stringency of data validation.[i] This is further enabled by conceptual fuzziness as to what counts as meaningful replication, validation, legitimate cross-validation.

If so, then statistical philosophy is of crucial practical importance.[ii]

Here’s the bulk of Perez’s memo (my emphasis in bold), followed by an even more remarkable reply from Potti and Nevins.

Bradford Perez Submits His Research Concerns

The Med Student’s Memo

This document was written by Bradford Perez, then a third-year medical student, in late March or early April 2008. Working in the laboratory of Anil Potti, Perez presented what biostatisticians describe as an excellent critique of the flawed methodology employed by Duke genomics researchers.

I want to address my concerns about how my research year has been in the lab of Dr. Anil Potti. As a student working in this laboratory, I have raised my serious issues with Dr. Potti and also with Dr. Nevins in order to clarify how I might be mistaken. So far, no sincere effort to address these concerns has been made and my concerns have been labeled a “difference of opinion.” I respectfully disagree. In raising these concerns, I have nothing to gain and much to lose.

In fact, in raising these concerns, I have given up the opportunity to be included as an author on at least 4 manuscripts. I have also given up a Merit Award for a poster presentation at this year’s annual ASCO meeting. I have also sacrificed 7 months of my own hard work and relationships that would likely have helped to further my career. Making this decision will make it more difficult for me to gain a residency position in radiation oncology. …

I joined the Potti lab in late August of last year and I cannot tell you how excited I was to have the opportunity to work in a lab that was making so much progress in oncology. The work in laboratory uses computer models to make predictions of individual cancer patient’s prognosis and sensitivity to currently available chemotherapies. It also works to better understand tumor biology by predicting likelihood of cancer pathway deregulation. Over the course of the last 7 months, I have worked with feverish effort to learn as much as possible regarding the application of genomic technology to clinical decision making in oncology. As soon as I joined the lab, we started laying the ground work for my own first author publication submitted to the Journal of Clinical Oncology and I found myself (as most students do) often having questions about the best way to proceed. The publication involved applying previously developed predictors to a large number of lung tumor samples from which RNA had been extracted and analyzed to measure gene expression. Our analysis for this project was centered on looking at differences in characteristics of tumor biology and chemosensitivity between males and females with lung cancer. I felt lucky to have a mentor who was there in the lab with me to teach me how to replicate previous success. I believed the daily advice on how to proceed was a blessing and it was helping me to move forward in my work at an amazingly fast rate. As we were finishing up the publication and began writing the manuscript, I discovered the lack of interest in including the details of our analysis. I wondered why it was so important not to include exactly how we performed our analysis. I trusted my mentor because I was constantly reminded that he had done this before and I didn’t know how things worked. We submitted our manuscript with a short, edited methods section and lack of any real description for how we performed our analysis. 
I felt relieved to be done with the project, but I found myself concerned regarding why there had been such a pushback to include the details of how we performed our analysis. An updated look at previous papers published before I joined the lab showed me that others were also concerned with the methods of our lab’s previous analyses. This in conjunction with my mentor’s desire to not include the details of our analysis was very concerning. I received my own paper back with comments from the editor and 4 reviewers. These reviewers shared some criticisms regarding our findings and were concerned about the lack of even the option to reproduce our findings since we had included none of the predictors, software, or instructions regarding how we performed this analysis. The implication in the paper was that the study was reproducible using publicly available datasets and previously published predictors even though this was not the case. While I still maintained respect for my mentor’s experience, I felt strongly that we needed to include all the details. Ultimately, I decided that I was not comfortable resubmitting the manuscript even with a completely transparent methods section because I believe that we have no way of knowing whether the predictors I was applying were meaningful. In addition to the red flags with regard to lack of transparency that I mentioned already, I would like to share some of the reasons that I find myself very uncomfortable with the work being done in the lab.

When I returned from the holidays after submitting my manuscript, I started work on a new project to develop a radiation sensitivity predictor using methods similar to those previously developed. I realized for the first time how hard it was to actually meet with success in developing my own prediction model. No preplanned method of separation into distinct phenotypes worked very well. After two weeks of fruitless efforts, my mentor encouraged me to turn things over to someone else in the lab and let them develop the predictor for me. I was gladly ready to hand off my frustration with the project but later learned methods of predictor development to be flawed. Fifty-nine cell line samples with mRNA expression data from NCI-60 with associated radiation sensitivity were split in half to designate sensitive and resistant phenotypes. Then in developing the model, only those samples which fit the model best in cross validation were included. Over half of the original samples were removed. It is very possible that using these methods two samples with very little if any difference in radiation sensitivity could be in separate phenotypic categories. This was an incredibly biased approach which does little more than give the appearance of a successful cross validation. While this predictor has not been published yet, it was another red flag to me that inappropriate methods of predictor development were being implemented.

After this troubling experience, I looked to other predictors which have been developed to learn if in any other circumstances samples were removed for no other reason than that they did not fit the model in cross validation. Other predictors of chemosensitivity were developed by removing samples which did not fit the cross validation results. At times, almost half of the original samples intended to be used in the model are removed. Once again, this is an incredibly biased approach which does little more than give the appearance of a successful cross validation. These predictors are then applied to unknown samples and statements are made about those unknowns despite the fact that in some cases no independent validation at all has been performed.
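[An illustrative aside, not part of Perez’s memo.] The bias Perez describes can be reproduced on pure noise. The sketch below is a minimal simulation under stated assumptions: the features are Gaussian noise with no relation to the labels, the classifier is a simple nearest-centroid rule scored by leave-one-out cross validation, and every name in it (nearest_centroid_loocv, prune_until_it_fits, simulate) is hypothetical. It is not the Duke lab’s software or method, only an instance of the general practice of dropping samples that fail to fit in cross validation.

```python
import random

def nearest_centroid_loocv(X, y):
    """Leave-one-out predictions from a nearest-centroid classifier."""
    preds = []
    for i, xi in enumerate(X):
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            # Centroid of this class, computed WITHOUT the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            dist = sum((a - b) ** 2 for a, b in zip(xi, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        preds.append(best_label)
    return preds

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def prune_until_it_fits(X, y, max_rounds=20):
    """Biased procedure: repeatedly drop the samples that CV misclassifies."""
    X, y = list(X), list(y)
    for _ in range(max_rounds):
        preds = nearest_centroid_loocv(X, y)
        keep = [i for i, (p, t) in enumerate(zip(preds, y)) if p == t]
        if len(keep) == len(y):
            break  # everything left now "validates"
        # Stop if a class would become too small for leave-one-out.
        if min(sum(y[i] == c for i in keep) for c in (0, 1)) < 3:
            break
        X, y = [X[i] for i in keep], [y[i] for i in keep]
    return X, y, accuracy(nearest_centroid_loocv(X, y), y)

def simulate(n=60, p=5, trials=20, seed=0):
    """Average honest vs. pruned CV accuracy over several noise datasets."""
    rng = random.Random(seed)
    honest, pruned, kept = [], [], []
    for _ in range(trials):
        # Noise features; labels are an arbitrary half/half split.
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [0] * (n // 2) + [1] * (n // 2)
        honest.append(accuracy(nearest_centroid_loocv(X, y), y))
        _, y_kept, acc = prune_until_it_fits(X, y)
        pruned.append(acc)
        kept.append(len(y_kept) / n)
    mean = lambda v: sum(v) / len(v)
    return mean(honest), mean(pruned), mean(kept)

honest_acc, pruned_acc, kept_frac = simulate()
print(f"honest LOOCV accuracy on noise:   {honest_acc:.2f}")
print(f"LOOCV accuracy after pruning:     {pruned_acc:.2f}")
print(f"fraction of samples retained:     {kept_frac:.2f}")
```

Since the labels carry no information, the honest cross-validated accuracy should hover near chance. The pruned figure looks far better only because the samples that contradicted the model were discarded before the final score was computed, which is why it says nothing about performance on new samples.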

A closer look at some of the other methods used in the development of the predictors is also concerning. Applying prior multiple T-tests to specifically filter data being used to develop a predictor is an inappropriate use of the technology as it biases the cross validation to be extremely successful when the T-tests are performed only once before development begins. This bias is so great, that accuracy exceeding 90% can be achieved with random samples. ….
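[An illustrative aside, not part of Perez’s memo.] The point about filtering with t-tests once, before cross validation begins, can also be checked on random data. The following is a rough sketch under assumptions of my own choosing: 40 samples, 1000 pure-noise features, the 10 features with the largest absolute pooled-variance t statistic, and a nearest-centroid classifier scored by leave-one-out cross validation. All function names are hypothetical. It contrasts selecting features once on all samples with redoing the selection inside each fold on the training samples only.

```python
import random

def t_scores(X, y):
    """Absolute two-sample t statistic (pooled variance) for each feature."""
    scores = []
    for col in zip(*X):
        g0 = [v for v, t in zip(col, y) if t == 0]
        g1 = [v for v, t in zip(col, y) if t == 1]
        m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
        ss = sum((v - m0) ** 2 for v in g0) + sum((v - m1) ** 2 for v in g1)
        pooled_var = ss / (len(g0) + len(g1) - 2)
        se = (pooled_var * (1 / len(g0) + 1 / len(g1))) ** 0.5
        scores.append(abs(m0 - m1) / se)
    return scores

def top_features(X, y, k):
    """Indices of the k features with the largest t statistic."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

def predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction using only the selected features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        dist = 0.0
        for f in feats:
            centroid_f = sum(r[f] for r in rows) / len(rows)
            dist += (x[f] - centroid_f) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y, k, select_inside_fold):
    """LOOCV accuracy; feature selection is either per-fold or done once."""
    fixed = None if select_inside_fold else top_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(X_tr, y_tr, k) if select_inside_fold else fixed
        correct += predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)

rng = random.Random(1)
n, p, k = 40, 1000, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]  # pure noise
y = [0] * (n // 2) + [1] * (n // 2)                          # arbitrary labels

biased_acc = loocv_accuracy(X, y, k, select_inside_fold=False)
honest_acc = loocv_accuracy(X, y, k, select_inside_fold=True)
print(f"t-test filter applied once, before CV: {biased_acc:.2f}")
print(f"t-test filter redone inside each fold: {honest_acc:.2f}")
```

When the filter sees the held-out sample before it is held out, that sample has already helped choose the features that will later "predict" it, so the first figure is inflated even though nothing real is there to find. Redoing the selection inside each fold removes that leak, and the accuracy falls back toward chance.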

My efforts in the lab have led me to have concerns about the robustness of these prediction models in different situations. Over time, different versions of software which apply these predictors have been developed. In using some of the different versions of software, I found that my results were drastically different despite the fact that I had been previously told that the different versions of the classifier code yielded almost exactly the same results. The results from the different versions are so drastically different that it is impossible for all versions to be accurate. Publications using different versions have been published and predictions are claimed to be accurate in all circumstances. If a predictor is being applied in a descriptive study or in a clinical [trial] for any reason, it should be confirmed that the version of software that is being used to apply that predictor yields accurate predictions in independent validation.

….Some other predictors which have been developed in the lab claim to predict likelihood of tumor biology deregulation. The publication which reports the development of these predictors was recently accepted for publication in JAMA. The cancer biology predictors were developed by taking gene lists from prominent papers in the literature and using them to generate signatures of tumor biology/microenvironment deregulation. The problem is in the methods used to generate those predictors. A dataset consisting of a conglomerate of cancer cell lines (which we refer to as IJC) was used for each predictor’s development. An in-house program, Filemerger, was used to bring the gene list of the IJC down to include only the relevant genes for a given predictor. At that point, samples were sorted using hierarchical clustering and then removed one by one and reclustered at each step until two distinct clusters of expression were shown. This step in and of itself biases the model to work successfully in cross validation although an argument could be made that this is acceptable because the gene list is already known to be relevant. The decision regarding how to identify one group of samples as properly regulated and the other as deregulated is where the methods become unclear. There is no way to know if the phenotypes were assigned appropriately, backwards, or if the two groups accurately represent the two phenotypes in question at all.….

After an earlier publication which claimed to make extremely accurate predictions of chemosensitivity (Potti et al., Nature Medicine, 2006), I think that it was assumed that it was easy to generate predictors. More recent events have shown that the methods were more complicated and perhaps different than first described. Given the number of errors that have already been found and the contradicting methods for this paper that have been reported, I think it would be worthwhile to attempt to replicate all the findings of that paper (including methods for development AND claimed validations) in an independent manner. More recently, when we’ve met with trouble in predictor development we’ve resorted to applying prior multiple T-tests or simply removing multiple samples from the initial set of phenotypes as we find that they don’t fit the cross validation model. These methods which bias the accuracy of the cross validation are not clearly (if at all) reported in publications and in most situations the accuracy of the cross validation is being used as at least one measure of the validity of a given model. ….

At this point, I believe that the situation is serious enough that all further analysis should be stopped to evaluate what is known about each predictor and it should be reconsidered which are appropriate to continue using and under what circumstances. By continuing to work in this manner, we are doing a great disservice to ourselves, to the field of genomic medicine, and to our patients. I would argue that at this point nothing should be taken for granted. All claims of predictor validations should be independently and blindly performed. ….

I have had concerns for a while; however I waited to be absolutely certain that they were grounded before bringing them forward. As I learn more and more about how analysis is performed in our lab, the stress of knowing these problems exist is overwhelming. Once again, I have nothing to gain by raising these concerns. In fact, I have already lost. …

Read the full memo by Perez here.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In their remarkable letter (in reply to Perez) below, Potti and Nevins admit that they had conveniently removed data points that disagreed with their model. In their view, however, the cherry-picked data that do support their model give grounds for ignoring the anomalies. Since the model checks out in the cases it checks out, it is reasonable to ignore those annoying anomalous cases that refuse to get in line with their model.[ii]

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Extracts from:
“Nevins and Potti Respond To Perez’s Questions and Worries” (the full letter is here)

Dear Brad,

We regret the fact that you have decided to terminate your fellowship in the group here and that your research experience did not turn out in a way that you found to be positive. We also appreciate your concerns about the nature of the work and the approaches taken to the problems. While we disagree with some of the measures you suggest should be taken to address the issues raised, we do recognize that there are some areas of the work that were less than perfect and need to be rectified.

…….. I suspect that we likely disagree with what constitutes validation…..

We recognize that you are concerned about some of the methods used to develop predictors. As we have discussed, the reality is that there are often challenges in generating a predictor that necessitates trying various methods to explore the potential. Clearly, some instances are very straightforward such as the pathway predictors since we have complete control of the characteristics of the training samples. But, other instances are not so clear and require various approaches to explore the potential of creating a useful signature including in some cases using information from initial cross validations to select samples. If that was all that was done in each instance, there is certainly a danger of overfitting and getting overly optimistic prediction results. We have tried in all instances to make use of independent samples for validation of which then puts the predictor to a real test. This has been done in most such cases but we do recognize that there are a few instances where there was no such opportunity. It was our judgment that since the methods used were essentially the same as in other cases that were validated, that it was then reasonable to move forward. You clearly disagree and we respect that view but we do believe that our approach is reasonable as a method of investigation.

……We don’t ask you to condone an approach that you disagree with but do hope that you can understand that others might have a different point of view that is not necessarily wrong.

Finally, we would like to once again say that we regret this circumstance. We wish that this would have worked out differently but at this point, it is important to move forward.

Sincerely yours,

Joseph Nevins

Anil Potti

A Timeline of The Duke Scandal

The Cancer Letter’s Previous Coverage

[i] I have recently gotten letters from people who say that any attempt to improve on statistical methodology or to critically evaluate–in a serious manner–people’s abuses of statistical concepts is an utter waste of time and tantamount to philosophical navel gazing. Why? Because everyone knows, according to them, that statistics is just so much window dressing and that political/economic expediency is what drives kicking data into line to reach their pre-ordained conclusions. On this view, criticizing Potti and Nevins falls under the navel-gazing umbrella. I wonder how these people would feel were they the ones who signed up for personalized trials based on Potti and Nevins’ model.

Now here’s my reply: If you want to come out as a social constructivist (it’s all a matter of social negotiation), data nihilist, dadaist, irrationalist, fine. But if you put up your shingle purporting to be a statistical advisor or reformer, as someone who deserves to criticize other people’s interpretations of tests, as one who might issue a ‘friend of the court’ brief to the Supreme Court on interpreting statistical significance tests–rather than reveal your data nihilism–then you’re being dishonest, misleading, and acting in a fraudulent manner. See my response to a comment by Deirdre McCloskey.

[ii]The model, to them, seems plausible; besides, they’re trying for a patent, and it’s only going to be the basis of your very own “personalized” cancer treatment!

*The publisher of the Cancer Letter is Paul Goldberg.


28 thoughts on ““Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)”

January 13, 2015

Steven McKinney

Kudos to Paul Goldberg and team at The Cancer Letter for once again finding and reporting on the internal shenanigans at Duke. These letters and emails establish that Duke personnel lied to the Institute of Medicine committee that reviewed this issue. Expect more reporting on such issues as the class-action suit brought by patients enrolled into the sham Duke metagene clinical “trials” continues.

This is why the concept of severe testing is so important. “Ground-breaking” studies, rushed into journals to get “the scoop”, are all too often not severely tested, and should rarely be taken seriously. Reproducible results should be far more important than they currently are, and studies reproducing apparently useful findings should be given high priority for publication by journal editors. The cherry-picking of data to yield better “validation” outcomes also shows why sharing of research data in published papers should be enforced.

Brad Perez made a hard but very smart decision – demand that his name be taken off of shoddy papers, and extricate himself from a poorly run lab practicing shoddy statistics. His reasoning and courageous decision should become text-book teaching in all basic science classes taught in high schools and undergraduate university programs. How to stand up for yourself in the presence of overbearing scientific “leaders” is essential knowledge for those wanting to embark on a scientific career.


January 13, 2015

Mayo

Steven: Well, it’s good that there are still some people–like Perez and you–who think statistical analysis had better be more than mere window dressing. Here there’s a method–cross validation–that there are correct and terribly wrong ways to use. Surely Potti and Nevins knew what they were doing was a violation, not an issue about which reasonable stat people can disagree? I’m really struck by their letter which makes them culpable for defending their method. Nevins had pretended he knew nothing about Potti’s finagling.
This helps me understand some of the calls I was getting. How did this escape the IOM?


January 16, 2015

Steven McKinney

This escaped the IOM because they asked Duke representatives if there had been any internal dissent, and the Duke representatives lied and said no. Unlike the NCI, the IOM review committee had no legal mandate, no authority to arrest or charge anyone or subpoena documents. So they asked an appropriate question, and received an inappropriate answer. It’s all there in the records now for us to see in hindsight.

From the Jan 9 2015 issue of The Cancer Letter: ” It’s not a crime to give deceptive testimony to IOM. ‘They are a private club,’ said NYU’s Caplan, who is an IOM member. ‘You can lie to them all you want. It’s like lying to The Cancer Letter. It’s probably bad form, but you are not going to go to jail.’ “


January 17, 2015

Mayo

Do you think this is Potti’s website?
http://anilpotti.com


January 13, 2015

Christian Hennig

Re footnote [i]: I don’t think that bashing social constructivism is justified here. I haven’t seen social constructivists defending Potti’s practices, and certainly social constructivism is not nihilism. Particularly, according to my understanding, as a social constructivist you’d want to see transparency in science in order to have a proper “negotiation” (as you’d call it), and should therefore support Perez’s position, as you do, and for very similar reasons. (The people from whom I learnt social constructivism actually used such kinds of statistical cheating as an argument against blind trust in scientific publications; you may disagree with their conclusions, but they’ll agree with you on much that goes wrong in science.)


January 13, 2015

Mayo

Christian: The social constructivists likely wriggle in and out of various descriptions, especially now that postmodernism is regarded as a huge embarrassment. But their view that it’s all a matter of social negotiation is a direct denial of the philosophy of severe testing. The fact that it’s open to any of them to decide to adopt that position if they feel like using it in negotiations in no way makes them objective. It’s true that social constructivist types distrust science but not because they are condemning those scientists who are too flexible or who are sellouts BY CONTRAST to what they ought to be. They think it’s preordained, or that there is no sounder way to go. To criticize all science/statistics as just so much social negotiation–not because it happens to be that way at some point in time and ought to be improved, but because you think it cannot be any other way–is not at all in sync with my position. It’s the opposite, and even more dangerous than self-proclaimed data nihilisms. Taking that view to its extreme, we should simply not bother collecting data, disagreement is all about interests, and whoever’s got the political clout. The self-proclaimed nihilists and anarchists are, in a sense, more honest, and certainly less capable of bamboozling others. The critics like Ziliak and McCloskey cannot tell us how to do a better job at inference and knowledge, since they think it’s all a sham. The idea of getting clear on the meaning of a statistical concept they are raking in money and fame by bashing is a big joke to them. As it would be, if it’s all a schtick. But by pretending to hold a critical reformer’s position, they mislead people into thinking they too just want to promote more stringently warranted claims, better statistical tests, etc.

The idea of supporting Perez because that happens to be a useful negotiating standpoint today, now that Potti and Nevins have been outed, is, to me, disgusting and the lowest of the low.

Going back to my shifting meanings claim, I grant that there can also be constructivists who merely are holding the trivially true fact: all human concepts, theories, practices are “constructed” by humans. Within this verbal constructivism, one could still mount a severe critical view, but then why waste time labeling yourself in order to proclaim a truism? I guess at one time it was enlightening to explore how we construct standards, theories, methods, etc.


January 14, 2015

Christian Hennig

Mayo: Can you give some references to social constructivists, where they make the kind of statement you are going on about here?


January 14, 2015

Mayo

Christian: Oh please, the majority of “science studies”, STS, folks adhere to a variation on this, least radical, umbrella, though of course it’s part of their philosophy to wriggle in and out of umbrellas. For a start on your interest in getting names (not mine), try Steve Fuller (http://www.talkreason.org/articles/Fuller.cfm), social epistemologists, feminist epistemology, social epistemology, Feyerabend, Derrida and French postmodernists, etc. I have no interest in spending time getting you names.


January 15, 2015

Christian Hennig

I know Feyerabend quite well and a number of others (Hacking by the way wrote that Kuhn is much more of a social constructivist than Feyerabend), but hadn’t heard of Fuller yet. Anyway, I didn’t ask this to get general recommendations on social constructivism, but rather to see some specific bits written by social constructivists that illustrate your point of view of them. The Fuller link is not really Fuller but “somebody who doesn’t like Fuller writing about Fuller”. Anyway, I can try to find something on my own.

Let me just say that I think that “it’s all social negotiation” alone doesn’t really have implications on how science is to be done. What we do here is social negotiation, and within social negotiation we exchange our ideas about what science is good for and how it should proceed (or not). That it’s negotiation itself doesn’t tell us which ideas are good and which are bad; and I don’t see any contradiction whatsoever between being a constructivist and acknowledging the meaning and use of severe testing in science (the constructivists, though, may have a more pluralist range of techniques and arguments they can appreciate).


January 15, 2015

Mayo

Christian:
“Hacking by the way wrote that Kuhn is much more of a social constructivist than Feyerabend”.
I don’t know whether he was using “social constructivist” to mean a position less extreme than “anarchist”. If so, it would make sense, as Kuhn was only a soft relativist. Yet Fuller’s position, whatever you want to call it, is akin to Feyerabend–you choose your “way of knowing”, be it crystal gazing, or voodoo, and no school or institution can say which is a more reliable form of knowledge or which your child should be taught. Surprised you haven’t heard of Fuller–he was a colleague, but in STS, at Virginia Tech.

“The Fuller link is not really Fuller”–True, but don’t you know who the author is? Remember the “science wars”?

” I don’t see any contradiction whatsoever between being a constructivist and acknowledging the meaning and use of severe testing in science (the constructivists, though, may have a more pluralist range of techniques and arguments they can appreciate).”

Well, I’m not very impressed with social constructivists capable of “acknowledging the meaning and use of severe testing”, especially as they acknowledge the meaning and use of voodoo, fantasy science, data nihilism (a la Stapel). Now I also acknowledge, with the social constructivists, that there are idiosyncratic factors of personality, economics, axes to grind, self-interests, greedy ambition, politics, sexism, arrogance and what have you, that enter into the scientists’ life and which can influence their scientific lives. Either they are irrelevant to the knowledge job of science or they ought to be. That’s why we need methods to find things out DESPITE those potential biases. The social constructivists hold that those idiosyncratic factors of personality and politics are all that really matter, and even claim they understand what science is about better than the scientist by attending to such matters. Well, they pick up aspects of the sociology of science–but they see no difference between that and anything normative.
My view is that this was at the heart of the break between science studies and philosophy of science–for the most part.
So, yeah, I guess you can say it’s just my ideology or my private personality problem to declare that throwing out the data that fail to fit Potti’s pet model is bad science, no evidence, fraud, and to insist on objective scrutiny: his model “passes” a highly insevere test (one with very little if any capacity to find its own flaws). I’m with Feynman who said scientists should bend over backwards to find ways they can be wrong. If social constructivists, relativists and the like are allowed to apply their ways to clinical trials, then they ought to bend over forwards and be booted out!


January 16, 2015

Christian Hennig

“The social constructivists hold that those idiosyncratic factors of personality and politics are all that really matter, and even claim they understand what science is about better than the scientist by attending to such matters.”
My understanding of this always was (from those social constructivists I know, obviously), that they see it as their job to look at those factors and to find out and expose what their influence on science is, which I have always found useful. However, regarding what matters in science and how to distinguish good and bad science is the business of scientists (and people who care about science) and not of social constructivists. Social constructivism is not normative about science, I agree, but that’s not implying that they state positively that there shouldn’t be any norms negotiated by scientists (Feyerabend, by the way, emphasizes that his slogan “anything goes” does not summarize what he thinks how science should be done, but rather what he thinks are the implications of the attempts by others to come up with a set of rules how science is to be done). So there is no contradiction between discussing such norms and being a social constructivist (one may hold that social constructivism has nothing much to contribute to such a discussion). Neither is there a contradiction between being a social constructivist and holding that intelligent design is nonsense according to scientific norms and values, to which this specific social constructivist subscribes. (Just as an example to invoke the Fuller debate; personally I don’t have much interest in the intelligent design theory and am certainly not an expert.)

Now I grant one possibility, which is that I just tend to take from the social constructivists what I like and to ignore what I don’t like in order to build my own view, and that I shouldn’t defend them too much as a group because I may accidentally end up defending things that I really don’t like because they conflict with my values (which is why I asked for references). Certainly I’d not defend any constructivist claiming that the kind of methodological discussion and discussions about norms of science we have in this blog are pointless, or that exposing the kind of practices by Potti etc. really doesn’t matter; except that I’m not aware of any of them who explicitly claimed this (OK, Feyerabend can be seen as coming dangerously close but see Hacking).
Anyway, I perceive “taking from them what fits into my personal world view, ethics and values” as constructive…


January 16, 2015

phaneron0

> However, regarding what matters in science and how to distinguish good and bad science is the business of scientists (and people who care about science) and not of social constructivists.

That is what I would think of as soft relativism, we can’t get outside our conceptualisations but we have to keep trying to represent reality less wrongly and hope we would succeed if we kept trying.

I do think folks can slip easily from soft relativism into hard relativism (Richard Rorty apparently, writing something like norms of science are no more than therapeutic in the same publication where he earlier claimed to only be a soft relativist) and labels don’t fully reflect this.

But for Hacking, I did have a conversation with him when I was at the University of Toronto (199?) where he argued that one should not worry about whether they have the most appropriate perspective when they address a question but rather just jump in and see what they could bring to bear on the question. It was not that all perspectives were equal but rather it was hard to rule one out that would signal some discordance with reality. Might be my misinterpretation, but I have found the advice helpful.

Keith

Reply

January 17, 2015

Steven McKinney

“But for Hacking, I did have a conversation with him when I was at the University of Toronto (199?) where he argued that one should not worry about whether they have the most appropriate perspective when they address a question but rather just jump in and see what they could bring to bear on the question. It was not that all perspectives were equal but rather it was hard to rule one out that would signal some discordance with reality. Might be my misinterpretation, but I have found the advice helpful.”

I submit that if you stand at the edge of a cliff and adopt a social constructivist attitude, you will soon find yourself wedged between a rock and a hard place, if you live. It’s one thing to argue that in areas of social mores and religion that science and statistics have little value, but in the natural realm there are laws which have severe consequences should you misjudge them. The effective use of the scientific method and statistics in natural arenas could save your life, and that includes a clinical trial.

The poor use of statistics in choosing medical treatments in these Duke clinical trials had life and death consequences which are now being hashed out in a court of law.

So there are some arenas in which one should worry about whether one has an appropriate perspective on a question before one jumps in, and statisticians working with geneticists at Duke should have worried more. Before statisticians claimed that a certain Bayesian analysis was not subject to the issue of overfitting to a data set, they should have tested the claim severely before they sat idly by while a clinician, based on their analytical methodology, chose drugs to foist upon unsuspecting patients. Brad Perez worried – what happened to him is just amazing.

Reply

January 17, 2015

phaneron0

Steven:

In the context of the Hacking conversation, it was implicit that some folks with the appropriate background would be involved – or more likely it was hoped that would be the case. That is the issue – we never know for sure what the critical background might be or who has it.

Now if there is documentation that Potti understood the issue of overfitting and purposely misled others in that regard – that would be reason, I believe, for criminal charges.

I know that when I was at Duke (2007/8) there was at least one member of the Stats department who argued overfitting was not a problem. There was a survey of clinical trialists in the US (early 2000) and 50% thought switching the primary outcome was not enough of a big deal to even report they had done that. This was before An-Wen Chan’s paper empirically documenting how bad it was in practice, after which those views started to change.

Not everyone understands what we think we understand!

Keith
January 17, 2015

Mayo

What was An-Wen Chan’s paper?

January 17, 2015

Mayo

Yes, Rorty’s another one: a flash in the pan.

Reply

January 14, 2015

Keith O'Rourke

> but they’ll agree with you on much that goes wrong in science.
OK, but would they agree on anything that goes right in science beyond being in accord with some fashion?

We cannot step outside our conceptual schemes to perceive reality but we can’t ignore reality because of this.

Mayo:
> now that postmodernism is regarded as a huge embarrassment.
Just recently? Or does this recur every 20 or 40 years (one or two academic generations)?

Now, I don’t think it is that unusual for clinical researchers to not actually understand practices that make results unreliable nor for senior research managers to countenance resignations and firings of research students who quarrel with their supervisors.

For instance, I spent about 10 years trying, with zero impact, to convince 6 out of 8 teaching hospital research groups to consider adopting double data entry or other data quality measures, as the 2 other teaching hospitals had (all in the same city).

They argued it was poor use of scarce research funds and that the data errors should cancel out. I even had case studies where their data sets that we had access to were re-entered from charts and the corrected data changed the conclusions dramatically. Here they simply quoted other experts that had voiced the same opinions about it not being really necessary.
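The claim that data errors "should cancel out" can be probed with a toy simulation. What follows is only a minimal sketch under invented assumptions, not a reconstruction of any of the hospital data sets: the outcome depends linearly on one covariate with a true slope of 1, and 5% of the covariate values are entered with a slipped decimal point (multiplied by 10). The error model, the sample size and all names are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)
n = 5000

# Hypothetical "chart" values and an outcome with true slope 1.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x_true]

# Assumed error model: 5% of entries get a slipped decimal point.
x_entered = [10 * xi if rng.random() < 0.05 else xi for xi in x_true]

clean_slope = slope(x_true, y)       # estimated from the correct values
entered_slope = slope(x_entered, y)  # estimated from the mis-entered values
print(clean_slope, entered_slope)
```

In this setup the entry errors are symmetric around zero, yet the slope estimated from the mis-entered covariate comes out well below the true slope of 1 rather than averaging back to it.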

Reply

January 14, 2015

Mayo

Keith: I don’t recall saying that it is “unusual for clinical researchers to not actually understand practices that make results unreliable” or to disagree on methodology. But I have to say I’m astonished that, faced with an earnest student saying “but you’ve thrown out half the data”, they could seriously write, “well you know we disagree with you, it doesn’t mean we’re wrong, etc”.

On the 20 or 40 year cycle for radical relativistic movements, I can’t say–I haven’t been around that long. Clearly, here, at least in the case of post-modernism there is a specific negation of logic/science/reason as having any special purchase beyond a kind of “way of knowing,” “lifestyle choice”, political power play, what have you.

Reply

January 15, 2015

Keith O'Rourke

> I’m astonished that, faced with an earnest student saying

I’m not, as I was involved in a similar case where the 3rd year medical student was fired (and therefore had to repeat 3rd year, the second time choosing not to do research) and the person who fired them later received a prestigious award noting their excellent work mentoring young clinical researchers. The student had contacted my director and me to help sort out the disagreement but had been too forthright with the investigator, thereby providing them with the _excuse_ to fire them. The student was clearly right and the investigator wrong (some data was being thrown out) but at least with some reflection I don’t think the investigator knew they were wrong. They may have suspected they were a little bit wrong but unlikely clearly wrong.

Here is a perhaps enlightening case. Alvan Feinstein was mentor to many clinical researchers and clinical research mentors http://en.wikipedia.org/wiki/Alvan_Feinstein. I worked with a number of his graduates. They had this one particularly interesting research project they had been working on for a few years. The control group from a very large cardiovascular study was randomly split up into many fake two-group RCTs (where the null was exactly true) and the comprehensive set of well understood covariates was modelled to show that (this was their hypothesis) the type one error could be reduced to zero. They kept on being disappointed at it always being around .05! (After numerous meetings I think I convinced one of his graduates that the type one error rate was chosen/based on whatever residual variation survived the covariate adjustments. Not sure I was successful.)
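The behaviour described here can be reproduced in a small simulation. This is a minimal sketch, not the Feinstein group's analysis: it assumes synthetic data in place of the cardiovascular control group, a single strongly predictive covariate, and a two-sided test at the nominal 5% level using the normal cutoff 1.96. All function names are made up for illustration.

```python
import math
import random

def residualize(v, x):
    """Residuals of v after least-squares regression on x with an intercept."""
    n = len(v)
    mx = sum(x) / n
    mv = sum(v) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]

def adjusted_t(y, g, x):
    """t statistic for the group effect in the model y ~ 1 + g + x."""
    ry = residualize(y, x)  # outcome with the covariate partialled out
    rg = residualize(g, x)  # group indicator with the covariate partialled out
    sgg = sum(r * r for r in rg)
    b = sum(a * c for a, c in zip(rg, ry)) / sgg
    rss = sum((a - b * c) ** 2 for a, c in zip(ry, rg))
    se = math.sqrt(rss / (len(y) - 3) / sgg)
    return b / se

def null_rejection_rate(n_trials=2000, n=200, seed=1):
    """Share of fake two-group trials that reject although no effect exists."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # depends on x only
        g = [1.0] * (n // 2) + [0.0] * (n - n // 2)
        rng.shuffle(g)  # random assignment, so the null is exactly true
        if abs(adjusted_t(y, g, x)) > 1.96:
            rejections += 1
    return rejections / n_trials

rate = null_rejection_rate()
print(rate)
```

Adjusting for the covariate shrinks the residual variation, but the standard error shrinks with it, so the rejection rate under the null should stay close to the chosen 0.05 instead of dropping toward zero.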

It is easy to forget how hard it is for folks to get how science and statistics actually work (e.g. what percentage of folks think there should be no advantage to switching when they first hear the Monty Hall problem? How many experience discomfort in struggling to understand the opposite?)
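For readers who still feel that discomfort, the Monty Hall problem is easy to simulate. This is a minimal sketch assuming the standard rules: three doors, the host always opens a door that hides no car and is not the contestant's pick, and he chooses at random when two such doors are available.

```python
import random

def monty_hall(n_games=100_000, seed=0):
    """Return the win rates for (staying, switching)."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a door that is neither the pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        # Switching means taking the one remaining closed door.
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / n_games, switch_wins / n_games

stay_rate, switch_rate = monty_hall()
print(stay_rate, switch_rate)
```

Staying wins about one game in three and switching about two in three, since switching wins exactly when the first pick was wrong.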

People learn to cope, hide their lack of understanding and muddle through (just like most of us).

Reply

January 15, 2015

Mayo

Keith: In this case, they were actively assigning the results of this mess to patients! To people! At best, I spoze some may argue that your assignment to chemo is a crapshoot anyway. But they were actively promoting their model for a patent and to revolutionize cancer therapy. I wonder how they expected others to actually apply it? (Step 2, now throw out what doesn’t fit?)

Reply

January 15, 2015

phaneron0

Mayo: I am just relating my past experience working with various clinical researchers. There were some good ones that often were able to prevent the mess from determining patient care – but not always. Not sure why we should be surprised, at most a one term course in stats (often taught by someone who has had a few good stats courses) in medical school and then working with other clinical researchers that have little access to experienced statisticians. It has (is) improved(ing) somewhat.

But here is an example you might like. A statistician, after 2 – 3 years working with a clinical research group, after getting a biostats PhD from Harvard, analyses a small study with almost surely very small power. It’s statistically significant. They advise the primary investigator that the power was small but, given the result was significant, the true effect must be humongous, and the primary investigator starts going around the university talking about it. My director heard about this and asked me to attend this group’s next research rounds (but did not tell me about their conclusion, which he suspected was questionable). The statistician actually used the term humongous in their presentation.
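Why that inference runs backwards can be shown with a short simulation. This is a minimal sketch with invented numbers, not the study in question: it assumes a modest true effect of 0.2 standard deviations, 20 patients per arm, known unit variance and a two-sided z-test at the 5% level.

```python
import math
import random

TRUE_EFFECT = 0.2  # assumed true difference in means, in SD units

def significant_estimates(n_per_arm=20, n_sims=20000, seed=2):
    """Simulate small two-arm studies and keep only the significant estimates."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_arm)  # standard error with known unit variance
    kept = []
    for _ in range(n_sims):
        treat = sum(rng.gauss(TRUE_EFFECT, 1) for _ in range(n_per_arm)) / n_per_arm
        ctrl = sum(rng.gauss(0.0, 1) for _ in range(n_per_arm)) / n_per_arm
        diff = treat - ctrl
        if abs(diff) / se > 1.96:
            kept.append(diff)
    return kept, n_sims

kept, n_sims = significant_estimates()
power = len(kept) / n_sims
# Average size of a significant estimate relative to the true effect.
exaggeration = (sum(abs(d) for d in kept) / len(kept)) / TRUE_EFFECT
print(power, exaggeration)
```

With power this low, an estimate can only reach significance by landing far from the truth: every significant estimate here is more than three times the true effect. A significant result from an underpowered study is therefore a reason to suspect exaggeration, not evidence that the true effect is huge.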

Keith

Reply

January 15, 2015

Christian Hennig

Keith: “OK, but would they agree on anything that goes right in science beyond being in accord with some fashion?

We cannot step outside our conceptual schemes to perceive reality but we can’t ignore reality because of this. ”

One problem with social constructivism is that as a label this is used in very different ways, partly by people to characterise their own position but partly by others to ridicule it. In principle, if you are a constructivist, you have to construct your own constructivism, and one constructivist’s constructivism may be another constructivist’s nonsense.

I always thought that it was a misunderstanding of (social) constructivism that this implies to “ignore reality”. It makes sense to speak of personal and social realities in constructivism, and I’d think that most constructivists, if they are honest, realize that the idea of a “reality outside” which is shared with others is part of their personal reality. Add to this the idea that the aim of science is to find out about that “reality outside” in a way people can agree about, they should be as well equipped to take part in a serious and constructive manner in scientific discussions as anybody else. They may be more modest about what science can achieve and more prone to hold some skepticism toward even the best bits of science, but this doesn’t seem to me to be a general obstacle against doing good science and criticizing bad science.

The following is my up to now best attempt to make such a point:
Hennig C (2010) Mathematical Models and Reality: A Constructivist Perspective. Foundations of Science 15, 29-48.

Reply

January 15, 2015

Keith O'Rourke

> one constructivist’s constructivism
If your constructed constructivism includes “the idea that the aim of science is to find out about that “reality outside” in a way people can agree about” then I likely won’t have a problem with it.

But, as I am sure you well understand, we all have to decide what to spend our time reading and if we have to read each constructed constructivism for each individual …

Reply

January 16, 2015

Christian Hennig

Keith: You can of course read and not read whatever you want. My point is just that it’s wrong to portray social constructivism as implying that reality should be ignored somehow.

Reply

January 14, 2015

Mayo

Readers may find it illuminating to look at the Appendix of Steven McKinney’s letter to the IOM near the end of my first Potti post:

https://errorstatistics.com/2014/05/31/what-have-we-learned-from-the-anil-potti-training-and-test-data-fireworks-part-1/

Referring to a paper by West, Nevins et.al, McKinney writes:

“In the same paragraph, the authors state ‘Note, that if we draw a decision line at a probability of 0.5 we obtain a perfect classification of all 27 tumors. However the analysis uses the true class assignments z1 … z27 of all the tumors. Hence, although the plot demonstrates a good fit of the model to the data it does not give us reliable indications for a good predictive performance. One might suspect that the method just “stores” the given class assignments in the parameter, . Indeed this would be the case if one uses binary regression for n samples and n predictors without the additional restrains introduced by the priors. That this suspicion is unjustified with respect to the Bayesian method can be demonstrated by out-of-sample predictions.'”

He continues: “I believe this is the key flaw in the reasoning behind this statistical analytical method. The authors state without proof (via theoretical derivation or simulation study) that this Bayesian method is somehow immune to the issue of overfitting a model to a set of data. This is the aspect of this analytical paradigm that truly needs a sound statistical evaluation, so that a determination as to the true predictive capacity of this method can be scientifically demonstrated.”

I hope to encourage people to carry out this demonstration.
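As a starting point, here is a minimal sketch of the unregularized case that the quoted passage itself concedes: n samples, n predictors, no priors. It is not the Bayesian method of West, Nevins et al., and it says nothing about whether their priors fix the problem. It only shows the baseline any such evaluation would have to beat. The data are pure noise, the labels are coin flips, a plain linear fit stands in for binary regression, and n = 27 is chosen only to echo the 27 tumors in the quote.

```python
import random

def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / m[i][i]
    return w

rng = random.Random(3)
n = 27  # samples and predictors

# Noise predictors and coin-flip class labels: there is nothing to learn.
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
z = [rng.choice([-1.0, 1.0]) for _ in range(n)]
w = solve(X, z)  # the fit simply "stores" the labels in the parameters

def predict(row):
    return 1.0 if sum(wi * xi for wi, xi in zip(w, row)) > 0 else -1.0

in_sample = sum(predict(r) == zi for r, zi in zip(X, z)) / n

# Fresh noise cases with fresh coin-flip labels.
n_new = 2000
hits = 0
for _ in range(n_new):
    row = [rng.gauss(0, 1) for _ in range(n)]
    label = rng.choice([-1.0, 1.0])
    hits += (predict(row) == label)
out_of_sample = hits / n_new
print(in_sample, out_of_sample)
```

The fit classifies all 27 training cases perfectly while doing no better than a coin flip on new cases. A sound evaluation of the Bayesian method would run the same kind of null simulation through the actual model and check that out-of-sample accuracy, with no samples discarded after the fact, stays at chance.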

Reply
January 14, 2015

Mayo

Interested readers will find a wealth of info in this document:

Evolution of Translational Omics: Lessons Learned and the Path Forward

I think we can expect another installment from the Cancer Letter this week. Stay tuned.

Reply
January 16, 2015

Mayo

See Cancer Letter update today:
http://www.cancerletter.com/articles/20150116_1
If no one goes to jail, it would seem that even bank and securities fraud is dealt with more stringently than the treatment of fraud in cancer research.

Reply
January 16, 2015

Mayo

Steven McKinney, on another blog, gave a partial list of Potti-related retractions by statisticians:
http://www.chimici.info/chemistryworld/in-the-pipeline_1165.html

NEJM retracted paper “A Genomic Strategy to Refine Prognosis in Early-Stage Non–Small-Cell Lung Cancer” – statistician: Mike West

PLoS ONE retracted paper “An Integrated Approach to the Prediction of Chemotherapeutic Response in Patients with Breast Cancer” – statistician: William Barry

PNAS retracted paper “A genomic approach to colon cancer risk stratification yields biologic insights into therapeutic opportunities” – statistician: William Barry

JAMA retracted paper “Gene Expression Signatures, Clinicopathological Features, and Individualized Therapy in Breast Cancer” – statistician: William Barry

JCO retracted paper “An Integrated Genomic-Based Approach to Individualized Treatment of Patients With Advanced-Stage Ovarian Cancer” – statistician: Mike West

Human Cancer Biology retracted paper “Characterizing the Clinical Relevance of an Embryonic Stem Cell Phenotype in Lung Adenocarcinoma” – statistician: William Barry

“Other bioinformaticians and genomic scientists who should have had some reasonable statistical training appeared on many of the retracted papers as well. Statistical discipline at Duke for the groups involved with this research effort was indeed lacking.
So 6 of 11 retracted papers (most of them) do show a statistician.
Duke claimed they would investigate their institutional and research practices and clean things up, though I haven’t seen much effort or results. The IOM report contains an “Omics-based test development process” (Figure 4-1) that will be used by responsible reviewers at the NCI and the FDA to better vet proposed methodologies in future, so that universities that won’t clean up their act will have a tougher time pulling off another such fiasco.”

He didn’t mention Potti-Nevins papers. Surely they were heading the statistical analysis, weren’t they?

Reply


© Deborah G. Mayo, Error Statistics Philosophy, 2011-2018 All Rights Reserved.
