S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)

 Stanley Young’s guest post arose in connection with Kepler’s Nov. 13 post, my Nov. 9 post, and the associated comments.

S. Stanley Young, PhD, Assistant Director for Bioinformatics, National Institute of Statistical Sciences, Research Triangle Park, NC

Much is made by some of the experimental biologists that their art is oh so sophisticated that mere mortals do not have a chance [to successfully replicate]. Bunk. Agriculture replicates all the time. That is why food is so cheap. The world is growing much more on fewer acres now than it did 10 years ago. Materials science is doing remarkable things using mixtures of materials. Take a look at just about any sports equipment. These two areas and many more use statistical methods: design of experiments, randomization, blind reading of results, etc., and these methods replicate, quite well, thank you. Read about W. Edwards Deming. Experimental biology experiments are typically run by small teams in what is in effect a cottage industry. Herr Professor is usually not in the lab. He/she is busy writing grants. A “hands” guy is in the lab. A computer guy does the numbers. No one is checking other workers’ work. It is a cottage industry to produce papers.

There is a famous failure to replicate that appeared in Science. A pair of non-estrogens was reported to have a strong estrogenic effect. Six labs wrote in to Science saying they could not replicate the effect. I think the back story is as follows. The hands guy tested a very large number of pairs of chemicals. The most extreme pair looked unusual. Lab boss said, write it up. Every assay has some variability, so they reported extreme variability as real. Failure to replicate in six labs. Science editors say, what gives? Lab boss goes to hands guy and says run the pair again. No effect. Lab boss accuses hands guy of data fabrication. They did not replicate their own finding before rushing to publish. I asked the lab for the full data set, but they refused to provide the data. The EPA is still chasing this will-o’-the-wisp, environmental estrogens. False positive results with compelling stories can live a very long time. See [i].
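
The back story, as told here, is essentially selection on noise: screen enough pairs and the most extreme reading will look striking even when nothing real is going on, and a repeat assay will not reproduce it. Here is a minimal simulation sketch of that point (the 1,000 pairs and unit assay noise are hypothetical numbers, not the actual data from the Science episode):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 1000

# Screen many chemical pairs; in this sketch none has a real effect,
# so every reading is pure assay noise (standard normal units).
screen = rng.normal(loc=0.0, scale=1.0, size=n_pairs)

best = int(np.argmax(screen))
print(f"Most extreme pair in the screen: {screen[best]:+.2f}")

# Re-assay that same pair: with no real effect, the repeat reading is
# just a fresh draw of noise, and the "finding" does not come back.
rerun = rng.normal(loc=0.0, scale=1.0)
print(f"Same pair on replication:       {rerun:+.2f}")
```

The screen’s winner typically sits three or so noise standard deviations out simply because it was picked as the maximum of 1,000 draws; the replication value is an ordinary draw centered on zero.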

Begley and Ellis visited labs. They saw how the work was done. There are instances where something was tried over and over and when it worked “as expected”, it was a wrap. Write the paper and move on. I listened to a young researcher say that she tried for 6 months to replicate the results of a paper. Informal conversations with scientists support very poor replication.

One can say that the jury is out, as there have been few serious attempts at systematic replication. Systematic replication efforts are now getting started. I say less than 50% of experimental biology claims will replicate.

[i] Hormone Hysterics. Tulane University researchers published a 1996 study claiming that combinations of manmade chemicals (pesticides and PCBs) disrupted normal hormonal processes, causing everything from cancer to infertility to attention deficit disorder.

Media, regulators and environmentalists hailed the study as “astonishing.” Indeed it was, as it turned out to be fraud, according to an October 2001 report by federal investigators. Though the study was retracted from publication, the law it spawned wasn’t and continues to be enforced by the EPA.



20 thoughts on “S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)”

  1. Stan: Thanks for letting me post these informal comments. I agree with you that there’s little mystery about how to do good science. What I’m not understanding is whether you’re saying these guys have little choice because “Herr professor” must be busy writing grants, or whether you’re saying it’s all part of “a cottage industry to produce papers”. Or is it all combined? Notably, the problem, as you describe it here, appears to have little to do with the typical statistical scapegoat, and largely to do with researchers ignoring experimental design/collection and statistical inference protocols.

    • The incentive is to publish papers. Period. If a journal will accept it, then publish it. It is up to funding agencies and journal editors to understand what is going on and fix any problems.

      • But aren’t they trying to get approval or whatever for the outputs? I’m kind of confused about why this stuff would get published if so half-baked.

        • Mark

          ANYTHING can get published. There are now something like 50,000 health journals. It must be mostly noise, otherwise we’d be seeing many more advances than we are.

          • Mark: so you think it’s the journals? I was reading that with open access journals, and with authors paying to publish, journals have to keep publishing to stay afloat. Something like that. But that would seem easily fixable.

            • Mark

              No, actually, going back to your cartoon the other day, I think it’s the entire publish-or-perish culture. As Stan Young points out, it’s hard to blame the “workers” (i.e., the researchers) when they’re just playing within the rules that they’re given. My primary job is to support junior faculty on career development awards (Stan just came to talk to our group a few weeks ago, in fact). It is pretty much driven into the heads of these poor young faculty members that the sheer number of pubs they have is key to getting new grants and promotion and tenure. So, they write like crazy (some of them) and shop the manuscripts around from journal to journal until they get a hit. It’s simply insane, and I’d say, unfortunately, the large bulk of it isn’t great or even very good science (I read A LOT of these manuscripts).

              But, that aside, I know that Stan Young tends to focus on multiplicity issues. Personally, I think that the far bigger problem in non-randomized health research is bias, primarily unmeasured confounding, which pretty much nullifies any reasonable hypothesis testing anyway. There seems to be the assumption that controlling for “known confounders” is necessarily a good thing, that it can only improve your inference. That’s simply incorrect: controlling for known confounders can actually increase net bias if there are unmeasured confounders (and there always are).

              If you want evidence of this, just look at the umpteen headline-grabbing articles from the Harvard nutritional Epi group led by Walter Willett, all based on analyses of the SAME large cohort studies. The presence of unmeasured confounding is readily identifiable from Table 1 in any of these articles:
              http://www.ncbi.nlm.nih.gov/pubmed/24095894
              http://www.ncbi.nlm.nih.gov/pubmed/23990623
              http://www.ncbi.nlm.nih.gov/pubmed/23828881
              http://www.ncbi.nlm.nih.gov/pubmed/22412075

              And there are so many more beyond this, each with different junior faculty or postdocs from Willett’s group as first author. All most likely noise, but all received media attention.

              • Mark: Can you explain, relatively non-technically, how controlling for known confounders increases or can increase net bias? Thanks.

                • Mark

                  Sure, it’s simple, really. Confounding, like any systematic bias, can be either positive or negative. Thus, controlling for a known confounder in the positive direction in the presence of an unknown confounder in the negative direction increases your net confounding bias. This is really easy to demonstrate via simulations (a minimal sketch follows below). An overly simplified analogy that I often use is one of an accountant who sometimes makes errors… If your accountant catches all of his positive errors but none of his negative errors, then the accuracy of your final balance is worse than if he hadn’t caught any errors.

                  • Mark

                    Sorry, what I wrote wasn’t strictly correct… Should have been controlling for a known confounder in the positive direction in the presence of an unknown confounder of equal or greater magnitude in the negative direction…
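
                    A minimal simulation sketch of the point above, with made-up coefficients (they are not taken from any study discussed here): the true effect of the exposure x on the outcome y is zero, a measured confounder c1 biases the crude estimate upward, and an unmeasured confounder c2 of somewhat larger magnitude biases it downward, so adjusting for c1 alone leaves the estimate further from the truth than not adjusting at all.

                    ```python
                    import numpy as np

                    rng = np.random.default_rng(0)
                    n = 200_000

                    c1 = rng.normal(size=n)  # measured confounder (confounds in the positive direction)
                    c2 = rng.normal(size=n)  # unmeasured confounder (confounds in the negative direction)
                    x = 0.8 * c1 - 0.8 * c2 + rng.normal(size=n)            # exposure
                    y = 0.0 * x + 0.8 * c1 + 1.0 * c2 + rng.normal(size=n)  # true effect of x is zero

                    def coef_on_x(*covariates):
                        """OLS coefficient on x after adjusting for the given covariates."""
                        design = np.column_stack((np.ones(n), x) + covariates)
                        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
                        return beta[1]

                    print(f"crude estimate        : {coef_on_x():+.3f}")        # small net bias: the two biases partly cancel
                    print(f"adjusted for c1 only  : {coef_on_x(c1):+.3f}")      # larger bias: only the downward confounding remains
                    print(f"adjusted for c1 and c2: {coef_on_x(c1, c2):+.3f}")  # ~0, but impossible when c2 is unmeasured
                    ```

                    With these arbitrary numbers the crude estimate comes out near -0.07 and the c1-adjusted estimate near -0.49, against a true value of 0; flip the relative magnitudes of the two confounders and the comparison flips, which is the sense in which adjustment “can” rather than “must” increase net bias.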

        • john byrd

          I would throw into the root cause basket another development in recent years. Many journals demand a one- to two-week turnaround on peer reviews. For most of us, that is next to impossible given other duties. So, you must choose among giving a “quickie” review, stopping everything else you are doing to focus on the review, or declining it. I decline if given too short a suspense. How many reviewers take the first course of action? This problem is insidious…

  2. john byrd

    While these problems might be serious and real in some quarters, it seems fair to say they do not serve as a blanket condemnation of all efforts in all disciplines. I think, as said before, the Econ articles are giving a distorted picture to the public. This is not good for anyone, not even Creationists and Postmodernists.

    • John: I’m not concerned so much that science may be misrepresented. The Econ article, and dozens just like it, are–to a large degree–scapegoating frequentist error statistics; they put forward a cockamamie “false positive” computation, and blur frequentist and Bayesian goals. The axes that S. Young and Kepler have to grind (despite their differences) at least tell us what’s really going wrong.

      • john byrd

        Right. I find that ironic, as the error (frequency) statistics require careful attention to process and testing assumptions, etc. In my view, it is approaches that shrug off those concerns that are encouraging junk science. They seem to get a free pass from the media. A good example is Bayesian statistics where the priors are subjective.

        • John: You and I both. I don’t think it’s an accidental correlation that the problems we keep hearing about, traced to poor experimental design and abuse of the stipulations required to obtain legitimate error probabilities, arise in a climate where Bayesian statistics purports to work without having to struggle with such things. That said, I’m inclined to think the largest force is economic: the crash, changes in publishing, technology, heightened competition. I remember that when Stan Young had that post about how ignoring randomization has discredited a decade of genomic work, the articles discussing it described statisticians claiming that these guys never consult them about experimental design. But I’m really an outsider here.

          • original_guest

            I appreciate that some people have strong impulses towards taking potshots at Bayesianism, but Bayes has little to do with the problems described; almost all analyses published in health journals are not Bayesian.

            A good Bayesian analysis of a poor experiment, like a good frequentist analysis, is going to conclude that the data tell us essentially nothing that wasn’t known before. This is likely not publishable. Sure, one can do crappy analysis (either Bayes or not) and claim one has made the proverbial silk purse, and this is more likely to get published. But the problem lies with the incentive model for publication – and the capacity to do more poor experiments, instead of fewer better ones.

            To fix this, you need a bit of game theory, and consensus on what the goals should be; e.g., I don’t think it’s automatic that controlling the Type I error rate at 5% should be the sine qua non.

            • OG: Well I think that’s the gist of Stan Young’s idea, although it might be more Deming than game theory. My own view, in thinking of incentives (carrot or stick), is that the stick is more effective.

  3. E. Berk

    It might be objected that agriculture doesn’t involve new explorations to the same degree as biology; it’s developed enough to expect a high degree of replication.

  4. Nathan Schachtman

    Mayo,

    As a lawyer involved in cases that are often dominated by contested issues of medical causation, I appreciate Dr Young’s comments on a deeply personal level. Perhaps the science that is marshaled in litigation is particularly suspect, but I can attest that the major clinical journals publish some articles that are so deeply flawed that “they are not even wrong.” Examples upon request. Many of these studies can be seen to be flawed from just reading the papers themselves, but occasionally I have sought out the underlying data. And that is where the games really begin.

    In litigation over welding fume exposure and claims of parkinsonism, I subpoenaed underlying data from a researcher who had been working closely with plaintiffs’ counsel. He contested the subpoena, but I prevailed for the most part. Unfortunately, however, the trial court agreed to the researcher’s form of protective order, which essentially prevented me and my expert witnesses from publicly disclosing the obtained data and materials (questionnaires, and the like). We were permitted to use the obtained data only in court, and the plaintiffs’ counsel wisely stopped relying upon this article. (The cases are now almost entirely resolved or dismissed.) The protective order went further and stipulated that any of my expert witnesses who looked at the protected materials could not serve on peer review or grant review committees for this researcher. In other words, the researcher in question protected himself from scrutiny in the scientific community, and protected his future grants (which he did obtain), by blocking participation by anyone who had seen the underlying data from the study that was used to make the case for federal funding of related research.

    With respect to the EPA study, or EPA-funded study, that Dr Young mentions, the Freedom of Information Act may provide the needed tool to liberate the underlying data. Again, I can say that I have obtained, using the Act, protocols, interim reports, and final reports that so diverged from later published studies that I could have no confidence in the studies, the integrity of the researchers, or the ability of the National Institutes of Health to police the process. I don’t wish to impugn the integrity of researchers generally. I have not done any systematic review or adequate sampling of grant applications and subsequent publications. I can say, however, that when researchers game the system, whether intentionally, negligently, or otherwise, the system seems easy enough to fool.

  5. Nathan: It’s great to have this meeting ground between the astute statistical and legal perspectives represented here.

    One thing: Did the researcher, whose lawyer prohibited x,y,z from serving on his/her future peer review or grant review committees, win the case? (you say resolved or dismissed). In any event, I suppose this isn’t much different from being allowed to list people you don’t want to have peer review your grant (e.g., by NSF). I doubt the list can be very long.

  6. Nathan Schachtman

    Mayo,

    There were about 24 cases tried in this litigation. The defense won about 21 with the jury. Of the 3 or so that were lost, one was reversed on appeal, one judgment for plaintiff was affirmed, and one judgment was still on appeal when settled. A couple of years ago, the plaintiffs started to push cases that were unequivocal diagnoses of Parkinson's disease, where the epidemiology is particularly strong for the defense. Mortimer, Borenstein, and Nelson published a meta-analysis that showed a summary point estimate below 1.0, p < 0.001. At this point, the remaining cases settled on confidential terms, but I will suggest they were extremely favorable terms for the defense. This was a litigation bankrolled by lawyer Dickie Scruggs, who is now resident in a federal penitentiary.

    The study involved in my subpoena request was used by plaintiffs' lawyers and their expert witnesses. The researcher was not involved, and I cannot say he won or lost with respect to the litigation. I will say he won a large, multimillion dollar grant from NIEHS to study parkinsonism in welders.

    As for suggesting peer reviewers (and also suggesting the exclusion of specific persons), I realize that the practice is common. Manuscript authors "suggest"; they cannot control the choice of peer reviewers. I hope, perhaps naively, that editors would overrule exclusions of thought leaders when the objections were not based upon solid grounds. The difference of course is that in my case, the court ordered the recusal of anyone I had shown the data to, on grounds simply that they had seen the "dirty underwear."

    In my view, the point is not that the list of recused scientists is long, but that it exists at all, and that a scientist can condition sharing data on court-imposed orders of silence. Yes; I achieved a litigation objective of removing a study from the litigation arsenal of plaintiffs' counsel, but the study in question is still uncritically cited in review articles, regulatory initiatives, and in grant proposals to build upon the earlier, dubious research.

    NAS

