Scientific Misconduct and Scientific Expertise
1st Barcelona HPS workshop
November 11, 2016
Departament de Filosofia & Centre d’Història de la Ciència (CEHIC), Universitat Autònoma de Barcelona (UAB)
Location: CEHIC, Mòdul de Recerca C, Seminari L3-05, c/ de Can Magrans s/n, Campus de la UAB, 08193 Bellaterra (Barcelona)
Organized by Thomas Sturm & Agustí Nieto-Galan
Current science is full of uncertainties and risks that weaken the authority of experts. Moreover, sometimes scientists themselves act in ways that weaken their standing: they manipulate data, exaggerate research results, do not give credit where it is due, violate the norms for the acquisition of academic titles, or are unduly influenced by commercial and political interests. Such actions, of which there are numerous examples in past and present times, are widely conceived of as violating standards of good scientific practice. At the same time, while codes of scientific conduct have been developed in different fields, institutions, and countries, there is no universally agreed canon of them, nor is it clear that there should be one. The workshop aims to bring together historians and philosophers of science in order to discuss questions such as the following: What exactly is scientific misconduct? Under which circumstances are researchers more or less liable to misconduct? How far do cases of misconduct undermine scientific authority? How have standards or mechanisms to avoid misconduct, and to regain scientific authority, been developed? How should they be developed?
All welcome – but since space is limited, please register in advance. Write to: Thomas.Sturm@uab.cat
09:30 Welcome (Thomas Sturm & Agustí Nieto-Galan)
09:45 José Ramón Bertomeu-Sánchez (IHMC, Universitat de València): Managing Uncertainty in the Academy and the Courtroom: Normal Arsenic and Nineteenth-Century Toxicology
10:30 Carl Hoefer (ICREA & Philosophy, University of Barcelona): Comments on Bertomeu-Sánchez
10:45 Discussion (Chair: Agustí Nieto-Galan)
11:30 Coffee break
12:00 David Teira (UNED, Madrid): Does Replication help with Experimental Biases in Clinical Trials?
12:45 Javier Moscoso (CSIC, Madrid): Comment on Teira
13:00 Discussion (Chair: Thomas Sturm)
13:45-15:00 Lunch
15:00 Torsten Wilholt (Philosophy, Leibniz University Hannover): Bias, Fraud and Interests in Science
15:45 Oliver Hochadel (IMF, CSIC, Barcelona): Comments on Wilholt
16:00 Discussion (Chair: Silvia de Bianchi)
16:45-17:15 Agustí Nieto-Galan & Thomas Sturm: Concluding reflections
ABSTRACTS
José Ramón Bertomeu-Sánchez: Managing Uncertainty in the Academy and the Courtroom: Normal Arsenic and Nineteenth-Century Toxicology
This paper explores how the enhanced sensitivity of chemical tests sometimes produced unforeseen and puzzling problems in nineteenth-century toxicology. It focuses on the earliest uses of the Marsh test for arsenic and the controversy surrounding “normal arsenic”, i.e., the existence of traces of arsenic in healthy human bodies. The paper follows the circulation of the Marsh test in French toxicology and its appearance in the academy, the laboratory and the courtroom. The new chemical tests could detect very small quantities of poison, but their high sensitivity also offered new opportunities for imaginative defense attorneys to undermine the credibility of expert witnesses. In this context, toxicologists had to dispel the uncertainty associated with the new method, and to find arguments to refute the many possible criticisms (of which “normal arsenic” was one). Meanwhile, new descriptions of animal experiments, autopsies and cases of poisoning produced a steady flow of empirical data, sometimes supporting but, in many cases, questioning previous conclusions about the reliability of chemical tests. This particularly challenging scenario provides many clues about the complex interaction between science and law in the nineteenth century, particularly on how expert authority, credibility and trustworthiness were constructed, and frequently challenged, in the courtroom.
David Teira: Does Replication help with Experimental Biases in Clinical Trials?
This is an analysis of the role of replicability in correcting biases in the design and conduct of clinical trials. We take as biases those confounding factors that a community of experimenters acknowledges and for which there are agreed debiasing methods. When these methods are implemented in a trial, we will speak of unintended biases, if they occur. Replication helps in detecting and correcting them. Intended biases occur when the relevant debiasing method is not implemented. Their effect may be stable and replication, on its own, will not detect them. Interested outcomes are treatment variables that not every stakeholder considers clinically relevant. Again, they may be perfectly replicable. Intended biases, unintended biases and interested outcomes are often conflated in the so-called replicability crisis: our analysis shows that fostering replicability, on its own, will not sort out the crisis.
Torsten Wilholt: Bias, Fraud and Interests in Science
Cases of fraud and misconduct are the most extreme manifestations of the adverse effects that conflicts of interests can have on science. Fabrication of data and falsification of results may sometimes be difficult to detect, but they are easy to describe as epistemological failures. But arguably, detrimental effects of researchers’ interests can also take more subtle forms. There are numerous ways by which researchers can influence the balance between the sensitivity and the specificity of their investigation. Is it possible to mark out some such trade-offs as cases of detrimental bias? I shall argue that it is, and that the key to understanding bias in science lies in relating it to the phenomenon of epistemic trust. Like fraud, bias exerts its negative epistemic effects by undermining the trust amongst scientists as well as the trust invested in science by the public. I will point out how this analysis can help us to draw the fine lines that separate unexceptionable from biased research and the latter from actual fraud.
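As a rough numerical sketch of the sensitivity/specificity trade-off mentioned in the abstract above (the distributions and thresholds are invented for illustration, not taken from the talk): shifting a detection threshold on the same noisy measurements raises one rate while lowering the other.

# Illustrative only: two overlapping populations and three candidate thresholds.
import numpy as np

rng = np.random.default_rng(42)
healthy = rng.normal(loc=0.0, scale=1.0, size=10_000)   # true negatives
diseased = rng.normal(loc=1.5, scale=1.0, size=10_000)  # true positives

for threshold in (0.5, 1.0, 1.5):
    sensitivity = np.mean(diseased > threshold)   # true positive rate
    specificity = np.mean(healthy <= threshold)   # true negative rate
    print(f"threshold {threshold:.1f}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")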
I’m very glad to see philosophers getting in on this.
Researchers are happy to report a p-value less than 0.05 and borrow credibility from the long history of statistical technology that is proven to work when properly applied, e.g. in agriculture and industry. Many, if not most, researchers know about multiple testing and multiple modeling, and they know that a single p-value taken from a vast sea of p-values produced by mucking through the data is not reliable. They know what they are doing.
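As a quick illustration of that point (the test counts and sample sizes below are arbitrary; this is a sketch, not anyone's actual analysis), running a batch of tests on pure noise makes the smallest p-value look “significant” most of the time:

# Simulate many two-sample t-tests where every null hypothesis is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group, n_sims = 50, 30, 1000

runs_with_a_hit = 0
for _ in range(n_sims):
    a = rng.normal(size=(n_tests, n_per_group))
    b = rng.normal(size=(n_tests, n_per_group))
    pvals = stats.ttest_ind(a, b, axis=1).pvalue
    if pvals.min() < 0.05:  # cherry-pick the best-looking result
        runs_with_a_hit += 1

# Roughly 1 - 0.95**50, i.e. about 0.92: almost every run offers a "finding".
print(f"Share of runs with at least one p < 0.05: {runs_with_a_hit / n_sims:.2f}")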
It is long past time for serious scientists and statisticians to call them out, both to the editor of the journal where the result appears and to the funding agency that paid for the work.
Editors should withdraw papers that are prima facie unreliable due to multiple testing and/or multiple modeling.
Funding agencies have been AWOL.
There’s also a potential legal liability issue lurking in the background, especially for predatory publishers: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2669118
Stan: I agree.
Good morning.
I have a lot of doubts regarding multiple testing procedures.
Although most researchers know about multiple testing, I find that how to handle it is unclear.
As far as I know, the controversy on how to work through multiple testing has not been settled.
So who’s going to be the judge when allegations of scientific misconduct or fraud are raised? Is p-hacking that obvious to spot?
I have read some serious researchers (like Rothman) write that it is unnecessary to adjust; others have gone Bayesian because they cannot find a convincing solution to this problem.
So is it about researchers or is it about coming to terms with the method?
I’m not condoning p-hacking, but most statistics books (that I’ve read) don’t explain clearly how to manage multiple comparisons, and the controversy is not settled (as far as I know). So blaming it all on researchers, and even accusing them of fraud and scientific misconduct, is, as we would say in my country, “buscar el muerto río arriba” (searching for the body upstream).
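For instance (with made-up p-values, just to illustrate the disagreement), two standard corrections applied to the same set of results need not agree on what survives adjustment:

# Hypothetical p-values; here Bonferroni keeps only the smallest, Benjamini-Hochberg keeps three.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.020, 0.041, 0.049, 0.120, 0.350]

bonf_reject, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
bh_reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for p, b, h in zip(pvals, bonf_reject, bh_reject):
    print(f"p = {p:.3f}   Bonferroni: {'reject' if b else 'keep'}   Benjamini-Hochberg: {'reject' if h else 'keep'}")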
Regards.
Martin: I’m sure Stan Young can answer your specific questions on multiple testing adjustments. You might be interested to know that such adjustments are treated as best (or even required) practice in evidence rulebooks in the law and by the FDA.
The myth that Bayesian methods are free of multiple testing, other selection effects, and optional stopping is just that: a myth. You will hear: “we can prove our methods aren’t altered,” which means only that, if you obey the likelihood principle, the harms to error probabilities won’t show up. If you condition on the data, in other words, outcomes other than the one observed won’t register. That doesn’t mean we (error statisticians) can’t show that the resulting posterior, Bayes factor or likelihood ratio can be strongly in favor of a given hypothesis even though it’s false. Even Bayesians are uncomfortable with this fact, and look for ways to block biasing selection effects. So when you hear Bayesians claim to hold the magic (as LeCam put it) to draw inferences from any aspect of the data they happen to notice post-data, you should worry that it might be “voodoo statistics”.
That doesn’t mean multiple testing always alters error probabilities pejoratively. It need not, for instance, when searching for the cause of a known effect.
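To see the optional-stopping point concretely, here is a toy simulation (an illustration only, not from the workshop or anyone's study): test after every new observation and stop at the first p < 0.05, and the type I error rate climbs well above the nominal 5%, even though the likelihood at the stopping point is the same whether or not you peeked.

# One-sample t-tests under a true null, peeking after every observation from n = 10 onward.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, max_n = 1000, 100

rejections = 0
for _ in range(n_sims):
    data = rng.normal(loc=0.0, scale=1.0, size=max_n)  # the null hypothesis is true
    for n in range(10, max_n + 1):
        if stats.ttest_1samp(data[:n], popmean=0.0).pvalue < 0.05:
            rejections += 1
            break  # stop at the first "significant" result

# A fixed-sample test would reject about 5% of the time; this is typically several times that.
print(f"Type I error rate with optional stopping: {rejections / n_sims:.2f}")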
Thank you, Dr. Mayo, I’ll look into it.