Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results
I generally find National Academy of Sciences (NAS) manifestos highly informative. I only gave a quick reading to around 3/4 of this one. I thank Hilda Bastian for tweeting the link. Before giving my impressions, I’m interested to hear what readers think, whenever you get around to having a look. Here’s an excerpt from the intro*:
Questions about the reproducibility of scientific research have been raised in numerous settings and have gained visibility through several high-profile journal and popular press articles. Quantitative issues contributing to reproducibility challenges have been considered (including improper data management and analysis, inadequate statistical expertise, and incomplete data, among others), but there is no clear consensus on how best to approach or to minimize these problems…
A lack of reproducibility of scientific results has created some distrust in scientific findings among the general public, scientists, funding agencies, and industries. For example, the pharmaceutical and biotechnology industries depend on the validity of published findings from academic investigators prior to initiating programs to develop new diagnostic and therapeutic agents that benefit cancer patients. But that validity has come into question recently as investigators from companies have noted poor reproducibility of published results from academic laboratories, which limits the ability to transfer findings from the laboratory to the clinic (Mobley et al., 2013).
While studies fail for a variety of reasons, many factors contribute to the lack of perfect reproducibility, including insufficient training in experimental design, misaligned incentives for publication and the implications for university tenure, intentional manipulation, poor data management and analysis, and inadequate instances of statistical inference. The workshop summarized in this report was designed not to address the social and experimental challenges but instead to focus on the latter issues of improper data management and analysis, inadequate statistical expertise, incomplete data, and difficulties applying sound statistical inference to the available data.
As part of its core support of the Committee on Applied and Theoretical Statistics (CATS), the National Science Foundation (NSF) Division of Mathematical Sciences requested that CATS hold a workshop on a topic of particular importance to the mathematical and statistical community. CATS selected the topic of statistical challenges in assessing and fostering the reproducibility of scientific results.
On February 26-27, 2015, the National Academies of Sciences, Engineering, and Medicine convened a workshop of experts from diverse communities to examine this topic. Many efforts have emerged over recent years to draw attention to and improve reproducibility of scientific work. This workshop uniquely focused on the statistical perspective of three issues: the extent of reproducibility, the causes of reproducibility failures, and the potential remedies for these failures. …
The workshop, sponsored by NSF, was held at the National Academy of Sciences building in Washington, D.C. Approximately 75 people, including speakers, members of the planning committee and CATS, invited guests, and members of the public, participated in the 2-day workshop. The workshop was also webcast live to nearly 300 online participants. This report has been prepared by the workshop rapporteur as a factual summary of what occurred at the workshop. The planning committee’s role was limited to organizing and convening the workshop. The views contained in the report are those of individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies of Sciences, Engineering, and Medicine.
In addition to the summary provided here, materials related to the workshop can be found online at the website of the Board on Mathematical Sciences and Their Applications (http://www.nas.edu/bmsa), including the agenda, speaker presentations, archived webcasts of the presentations and discussions, and other background materials.
You can read the full report here.
Share your thoughts.
*By the way, my favorite quote from Fisher (against relying on isolated significant results) was stated in full by two of the speakers (let me know if you find more than 2).
First, I’d like it if Stephen Senn would look at how one of the contributors computes replication probabilities on p. 49, and let us know what he thinks.
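For anyone who wants to play with the arithmetic while waiting for Senn’s verdict: a common back-of-the-envelope “replication probability” (which may or may not be the calculation on p. 49) treats the observed effect as if it were the true effect and asks how often an exact replication would again cross the significance threshold. A minimal sketch in Python, under exactly that assumption:

```python
from scipy.stats import norm

def replication_probability(z_obs, alpha=0.05):
    """Probability that an exact replication is again significant at
    two-sided level alpha, *assuming* the observed effect equals the
    true effect -- a strong assumption, and a standard target of
    criticism of such calculations."""
    z_crit = norm.ppf(1 - alpha / 2)
    # Under that assumption, the replication's z-statistic ~ N(z_obs, 1).
    return norm.sf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)

# A result sitting exactly at p = 0.05 (z ~ 1.96) "replicates"
# only about half the time, even on its own optimistic assumption:
print(round(replication_probability(1.96), 2))  # ~0.50
```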
On p. 50, I noticed a reference to Young and Karr (2011); Stan Young is a contributor to this blog.
p. 77: There are many instances in which analyses are done prematurely against the advice of statisticians, and researchers shift outcomes or fail to define outcomes adequately at the onset so as to look for an outcome that produces a significant result. The participant noted that it is hard to resist the pressure because statisticians usually work for the investigator and an investigator can look for other statisticians whose recommended adjustments are …
This reminds me of the ratings agencies in the movie The Big Short: each was under pressure to puff up the ratings of junk securities, or else the junk’s owners would just take their business to other, more lenient ratings agencies.
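To see why the outcome-shifting described on p. 77 matters, here is a quick simulation of my own (not from the report): with ten candidate outcomes, no real effects anywhere, and the freedom to report whichever outcome comes out significant, the chance of a “finding” is roughly 40%, not the nominal 5%.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, n_outcomes, n_per_arm = 2000, 10, 50

hits = 0
for _ in range(n_sims):
    # No true treatment effect on any of the 10 outcomes.
    treat = rng.normal(size=(n_per_arm, n_outcomes))
    control = rng.normal(size=(n_per_arm, n_outcomes))
    pvals = [ttest_ind(treat[:, j], control[:, j]).pvalue
             for j in range(n_outcomes)]
    hits += min(pvals) < 0.05  # report the "best" outcome

print(f"Chance of at least one 'significant' outcome: {hits / n_sims:.2f}")
# Roughly 1 - 0.95**10 = 0.40, not 0.05.
```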
p. 79: Baggerly gives the best idea I’ve seen so far in this report: the Popperian requirement that researchers stipulate in advance “what results would indicate that the treatment resulted in no significant differences.”
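One concrete way to cash out Baggerly’s suggestion (my own illustration, not his) is an equivalence test with a margin declared before the data are seen: the analysis plan states up front which results will be reported as “no meaningful difference.” A sketch using two one-sided t-tests (TOST), with the margin and alpha standing in as hypothetical pre-registered values:

```python
import numpy as np
from scipy import stats

# Hypothetical values fixed in a pre-registered analysis plan:
EQUIVALENCE_MARGIN = 0.2  # mean differences inside +/-0.2 count as "no difference"
ALPHA = 0.05

def tost_no_difference(x, y, margin=EQUIVALENCE_MARGIN, alpha=ALPHA):
    """Two one-sided tests: declare 'no meaningful difference' only if the
    mean difference is significantly above -margin AND significantly below
    +margin. Uses an unpooled SE with a rough df; fine for a sketch."""
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
    df = len(x) + len(y) - 2
    p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    return max(p_lower, p_upper) < alpha

rng = np.random.default_rng(1)
x, y = rng.normal(0, 1, 200), rng.normal(0, 1, 200)
print(tost_no_difference(x, y))  # can "no difference" be declared at the pre-set margin?
```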
Compare the topics/treatments of this workshop with my recent post on Goldacre’s complaints: people/journals saying there’s nothing wrong with what they’re doing.
>“The Popperian requirement that researchers stipulate in advance ‘what results would indicate that the treatment resulted in no significant differences.’”
The Popperian requirement would be to stipulate in advance what results would indicate that the evidence is inconsistent with the researcher’s theory. That is not the same as “significant differences,” which are usually the opposite of what the theory predicts; and whether the evidence squares with the theory is what researchers and their audience actually care about.