S. Stanley Young: Scientific Integrity and Transparency

Stanley Young recently shared his summary testimony with me, and has agreed to my posting it.

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

One-page Summary Young
Testimony to the Committee on Science, Space and Technology, 5 March 2013
Scientific Integrity and Transparency
S. Stanley Young, PhD, FASA, FAAAS

Integrity and transparency are two sides of the same coin. Transparency leads to integrity. Transparency means that study protocol, statistical analysis code and data sets used in papers supporting regulation by the EPA should be publicly available as quickly as possible and not just going forward. Some might think that peer review is enough to ensure the validity of claims made in scientific papers. Peer review only says that the work meets the common standards of the discipline and on the face of it, the claims are plausible, Feinstein, Science, 1988. Peer review is not enough.

The current state of science is arguably very poor. For medical observational studies over 80% of initial claims failed to replicate, Ioannidis, JAMA, 2005, Young and Karr, Significance, 2011. Environmental epidemiology studies are likely no better. Recent evidence indicates that experimental studies also fail to replicate close to 90% of the time, Begley and Ellis, Nature, 2012. Scientific fraud is common in retracted science papers, Fang et al., PNAS, 2012. So the evidence is that science claims usually fail to replicate and that fraud is being committed. Again, the current state of science is arguably poor. Promoting transparency is key to solving problems of validity and integrity.

Three things make sense. Going back in time, key regulations and the papers used to support them should be identified. The EPA should secure copies of data used in those papers and make the data public. For example, a number of papers on air pollution and mortality use the ACS CPS II database. Data used in the Harvard Six Cities study is not public. Over 100 key air pollution data sets were identified and a request for data was sent to 50 randomly selected authors. Most of these papers were funded by government grants. These authors provided no data sets, 0 for 50. Without data sets, there is essentially no opportunity for scientific oversight and there is opportunity for fraud.

The EPA should secure a copy of the ACS CPS II, Harvard and other key data sets and make them public. Where data sets are not available, claims in those papers are essentially “trust me” science. The EPA should not be relying on “trust me” science. The recent OSTP memorandum, Holdren, 22 February 2013, on data access should be supported with legislation: Regulatory agencies cannot cite or use papers to support rules or regulations without the underlying data available. Basically, EPA-funded authors should follow the guidelines for “reproducible research”.

Going forward, the EPA should fund collection and analysis of data separately. One group should have the incentive to provide a high-quality data set. The analysis group can then provide a high-quality analysis of the data, knowing that others have an opportunity to do the same, oversight. If data building and analysis are funded together, there is a natural tendency for authors not to share the data until the last ounce of information is extracted. Without public access to the data, transparency, initial flawed analysis or even outright fraud cannot reliably be detected or corrected. The public, Congress, and the EPA should all want an efficient science process to support sound regulations. It can take years to overturn a claim that is wrong. Making data available by implementing these steps would be a big step toward improving the science process at the EPA and promoting scientific integrity.

Young Graham Laws
Version 3

Use of Science Transparency Act

Any federal agency proposing rule-making or legislation shall specifically name each document used to support the proposed rule-making or legislation and provide all data used in said document for viewing by the public.

Federal Study Transparency Act

If federal funds are provided for a study, all data relating to the reporting of results of said study must be provided for scrutiny by the public at the time of publication.

These are drafts subject to revision and strengthening.

Categories: evidence-based policy, Statistics

10 thoughts on “S. Stanley Young: Scientific Integrity and Transparency”

  1. Stanley: thanks for this. I hope you will fill us in a bit here. We’ve been talking FDA and the like, who has time to follow the EPA? (just kidding).

  2. General remark: I heartily endorse the courageous speaking out by Young, Senn and a (smallish) handful of others, and want to support the fraud forensics movement wherever it crops up. However, the basic underlying premise—correct in my judgment—conflicts, or is in tension with, a popular philosophical position that I call “ethics in evidence”.


    In various published papers over the years, I pick apart arguments for the view that scientists ought to bias their construal of data in order to further policies or outcomes considered to promote “the public good”. Yet, it’s a politically correct position that often underlies the obstacles and selective reporting that “the good guys” must face.

  3. Nathan Schachtman

    There has been so much emphasis upon financial bias and conflicts of interest that many people ignore political enthusiasms and positional conflicts as a potent source of bias. As a lawyer, I encounter papers in scientific journals written by scientists who are clearly advocates. Perhaps to a large extent, papers with no clear conclusion or finding don’t sell, and scientists would rather ride a hobby horse than report bland, neutral, inconclusive data.

    I have used subpoenas and the Freedom of Information Act to obtain protocols and full underlying datasets. I can attest to the vehemence with which university counsel oppose access to underlying data, and the willful indolence which NIH/NIEHS bureaucrats exhibit in responding to FOIA requests. One subpoena contest led to an author’s publishing an article in Neurology, in which he called me out for serving him with a subpoena (which was largely upheld and enforced, and which resulted in information that kept the study from being used in court, but I can’t tell you why because of a court-ordered gag rule). I wrote a letter to the editor, which got the attention of folks on the Committee for Science, Technology and the Law, at the NAS. After this author and I debated transparency, the Committee backed off making any pronouncements about scientist abuse by lawyers. (The scientist in question is from St Louis, and I told him he should live up to being from the “Show Me” state.)

    Questions for Dr. Young:

    How is your proposal different from the Data Quality Act? How do you see the DQA failing to provide the relief you suggest? Is the full version of your testimony available?


  4. An important aspect of current rules is that you have to ask for the data and the author then can respond or not. Basically there is no enforcement. The current efforts are aimed at making the data available in a repository on publication. Night and day.

  5. Stan, this is really disturbing. Do you have a reference for this claim: “Over 100 key air pollution data sets were identified and a request for data was sent to 50 randomly selected authors….These authors provided no data sets, 0 for 50.”?

    • Stan Young

      Our idea was to request collaboration with authors that had published papers in the area of air pollution and to build a large data set. We wanted to do a meta-analysis using the data in the individual papers. As none of the authors agreed to make their data available, the project was abandoned. As I remember, we mostly did not get a response from the authors. Two or three responded to say no. One response was quite pointed: if you work for industry, there is no way we will share data with you, or words to that effect. In another aspect of the project we identified 21 journals that publish papers in the area of air pollution. The editors were asked if they required authors to make data sets available. Nineteen had no requirement. Let me add, even if a journal requires authors to promise to make data available, very often they do not. The American Psychological Association has about 50 journals and they require that authors sign a pledge to make data sets used in a publication available. About two thirds of the time authors, for one reason or another, do not provide data sets. In the case of a randomized clinical trial being used to approve a drug, the data is submitted to a trusted 3rd party. In the case of an observational study on air pollution it appears that we simply have to trust that the author reached a valid conclusion.

      Deborah: Feel free to send on to Stephen and/or post my reply. Stan

      • Stan: Your comment is posted, and I’m sure Stephen will see it. (He’s in a different time zone.) Thank you so much for the intriguing insights; we have been overlooking the EPA on the blog…

      • Thanks for this helpful and detailed reply Stan. It is highly relevant to the recent work I have been doing* on journal standards in connection with the alltrials petition http://www.alltrials.net/about/ .
        * As a sideline. This is not my day job!

  6. Stan: I hope the “experimental studies” that “also fail to replicate close to 90% of the time” are not experimentally controlled studies.

  7. Stan: I’m guessing objections, scientific and other, have been raised regarding your novel idea of separating the “collection and analysis of data separately”. I hope that you can tell us something about the current status of the proposal at some point.
