Objectivity #2: The “Dirty Hands” Argument for Ethics in Evidence

Some argue that generating and interpreting data for purposes of risk assessment invariably introduces ethical (and other value) considerations that not only go beyond, but may even conflict with, the “accepted canons of objective scientific reporting.”  This thesis, which we may call the thesis of ethics in evidence and inference, is thought by some to show that an ethical interpretation of evidence may warrant violating canons of scientific objectivity, and even that a scientist must choose between the norms of morality and those of objectivity.

The reasoning is that since scientists’ hands must invariably get “dirty” with policy and other values, they should opt to interpret evidence in a way that promotes ethically sound values, or that maximizes public benefit (in some sense).

I call this the “dirty hands” argument, alluding to a term used by philosopher Carl Cranor (1994).1

I cannot say how far its proponents would endorse taking the argument.2 It seems, however, that if this thesis is accepted, the objective reporting of scientific uncertainties in evidence may itself come to be regarded as “unethical.”  This consequence is worrisome: it would conflict with the generally accepted imperative for an ethical interpretation of scientific evidence.

Nevertheless, the “dirty hands” argument as advanced has apparently plausible premises, one or more of which would need to be denied to avoid the conclusion that otherwise follows deductively. It goes roughly as follows:

  1. Whether observed data are taken as evidence of a risk depends on a methodological decision as to when to reject the null hypothesis of no risk H0 (and infer that the data are evidence of a risk).
  2. Thus, in interpreting data to feed into policy decisions with potentially serious risks to the public, the scientist is actually engaged in matters of policy (what is generally framed as an issue of evidence and science is actually an issue of policy values, ethics, and politics).
  3. The public funds scientific research, and the scientist is responsible for promoting the public good, so scientists should interpret risk evidence so as to maximize public benefit.
  4. Therefore, a responsible (ethical) interpretation of scientific data on risks is one that maximizes public benefit, and one that does not do so is irresponsible or unethical.
  5. Public benefit is maximized by minimizing the chance of failing to find a risk. This leads to the conclusion in 6:
  6. CONCLUSION: In situations of risk assessment, the ethical interpreter of evidence will maximize the chance of inferring there is a risk, even if this means inferring a risk when there is none with high probability (or at least a probability much higher than is normally countenanced).

The argument about ethics in evidence is often put in terms of balancing Type I and Type II errors.

Type I error: test T finds evidence of an increased risk (H0 is rejected), when in fact the risk is absent (false positive).

Type II error: test T does not find evidence of an increased risk (H0 is accepted), when in fact an increased risk δ is present (false negative).

The traditional balance of Type I and Type II error probabilities, wherein Type I errors are minimized, is, some argue, unethical. Rather than minimize Type I errors, it might be claimed, an “ethical” tester should minimize Type II errors.
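The trade-off at issue can be made concrete with a small numerical sketch (a minimal illustration assuming a one-sided z-test at a fixed sample size, with a hypothetical standardized risk of one standard error; none of these numbers come from the post). Driving the Type II error probability β down is possible only by letting the Type I error probability α grow:

```python
from statistics import NormalDist

nd = NormalDist()

def type2_error(alpha, delta):
    """beta for a one-sided z-test of H0: 'no risk', when the true
    standardized risk is delta standard errors above zero."""
    z_crit = nd.inv_cdf(1 - alpha)   # rejection cutoff fixed by alpha
    return nd.cdf(z_crit - delta)    # P(fail to reject | risk = delta)

delta = 1.0  # hypothetical true risk, one standard error above zero
for alpha in (0.01, 0.05, 0.20, 0.50):
    print(f"alpha = {alpha:.2f}  beta = {type2_error(alpha, delta):.3f}")
```

Pushing β toward zero at a fixed sample size just means rejecting the “no risk” null ever more readily, regardless of the data.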

I claim that at least three of the premises, while plausible-sounding, are false.  What do you think?

(1) Cranor (to my knowledge) was among the first to articulate the argument in philosophy, in relation to statistical significance tests (it is echoed by more recent philosophers of evidence-based policy):

Scientists should adopt more health protective evidentiary standards, even when they are not consistent with the most demanding inferential standards of the field.  That is, scientists may be forced to choose between the evidentiary ideals of their fields and the moral value of protecting the public from exposure to toxins, frequently they cannot realize both (Cranor 1994, pp. 169-70).

Kristin Shrader-Frechette has advanced analogous arguments in numerous risk research contexts.

(2) I should note that Cranor is aware that properly scrutinizing statistical tests can advance matters here.

Cranor, C. (1994), “Public Health Research and Uncertainty”, in K. Shrader-Frechette, Ethics of Scientific Research, Rowman and Littlefield, pp. 169–186.

Shrader-Frechette, K. (1994), Ethics of Scientific Research, Rowman and Littlefield.


17 thoughts on “Objectivity #2: The “Dirty Hands” Argument for Ethics in Evidence”

  1. Andrew Gelman

    I really hate the Type 1, Type 2 error framework here. In many important cases, the question is not whether a risk is zero but rather how large the risk is. Or, more generally, the magnitudes of different effects.

    • True, and these discussions exacerbate the problem because there is a blur between the formal statistical Type I and Type II errors and some more informal report subsequent to the data analysis. I deliberately wanted to bring this out, since people should be on the lookout for this loose use of statistical errors. Of course, if one were to minimize the Type II error probability in the case of a “no risk” null, one would always reject the null.

      I don’t think a lot of people realize, by the way, that Neyman explicitly intended the Type I error to be the first in importance, and that in cases of risk he advocated that the null hypothesis assert the existence of a risk.

  2. Michael Grosskopf

    It seems like it should be the responsibility of the policy makers to decide how to interpret the risk to maximize the benefit to society. Just minimizing the chance of failing to find a risk is not necessarily the way to maximize public benefit, as there is some trade off.

    The scientist’s responsibility is to clearly, honestly communicate the magnitude and probability of the risks given the available data and models. The policy maker is then responsible for making the decision of how to maximize the benefit to the public given the data presented by the scientist. The scientist is only unethical if they misrepresent the data, if they do not give due diligence to the data and miss something, or if they claim greater certainty in their results or expertise than is legitimate.


    • Yes, I agree that the scientist’s role is to report the extent of risks indicated by the data, and to describe gaps and limits to what is known. The idea of identifying the scientific report with some policy decision is wrongheaded. Also wrongheaded, and dangerous, in my view, is suggesting that policy benefits/losses etc. be intertwined with the interpretation of evidence. Z & M make fun of my position because it doesn’t introduce losses and costs “all the way down”, even at the stage of just interpreting the data! But their view only provides a justification for the companies that want to intermingle their losses, were a risk to be found, into the assessment of whether there is evidence of risks and of how large they are.

      I say that all premises of the “dirty hands” argument are false, save the first.

      • Guest

        Knowing the benefits/losses of possible decisions tells you what aspects of the data to put in your report.

        If the set of possible decisions is limited, the resulting report can be small and simple – in the very simplest settings something like a point estimate and standard error might do. But as that set of possible decisions grows (including decisions that take into account results of model-checks) the report that we need grows rapidly.

        Particularly in complex settings – where decisions involve the values of high- or infinite-dimensional parameters – there is no tractable summary of the data that tells policy-makers what to do, for every one of a large number of benefit/loss tradeoffs. We must therefore limit the tradeoffs our report addresses. One way is to go back to the earlier approach, and address only a small number of “default” tradeoffs. Another is to work directly with the policy-makers, addressing the tradeoff they care about – “all the way down”, if you like.

  3. Whose losses do we use? What ethical theory applies? Should we value alleviation of starvation more than the risks of some GM crops? Economic losses vs. risks to vulnerable groups? Introducing losses at the stage of interpreting the data is a recipe for a diminished role for science, or for science in support of the policy-makers with the most clout.

    • Guest

      It is simply not possible to address every loss in a report that reduces the data in some way, because finite-dimensional sufficient statistics do not exist outside of a specific class of models, and there’s no reason to believe that class captures the truth in all cases.

      If our report doesn’t reduce the data, we might as well just provide the data, and let others do the inference. But to do that, they’ll have to introduce losses “all the way down”, and we’re back to where we came in.

      Or, we pick some losses, based on judgement about which ones we think are plausible. Either way, judgement comes into it somewhere.

  4. Drew Tyre

    I tend to agree with AG now, but in the past I’ve explicitly considered trading off the two errors. The relative cost of each type of error is not the same, and depending on what those costs are, the value of p that maximizes public benefit may change.

    That’s interesting about Neyman – I’ve recently come across applications of that idea as “Equivalence testing”, where the null is that two samples differ by at least X.

    • But the problem is that the report on the evidence (the statistical inference proper) should be distinct from subsequent policy recommendations. Collapsing the two is a mistake. In any event, the inferences also need to report the capabilities of the tests to detect or overlook risks of various degrees and types (at least that is what I recommend).
      Then others can critique them or decide on policy.

  5. Corey

    I’d claim that all of the premises are false. Premises 1 and 2 confuse the “precision guesswork” aspect of (statistical) evidential assessment with the entirely separate question of what policy to adopt given the information at our disposal. It is only the latter that depends on ethics and values. Given the understanding that these are separate concerns, premise 3 misstates scientists’ responsibility to the public (which is to provide the best information possible about the true state of the world), and premise 4 is a category error. Premise 5 seems especially boneheaded, since it could only be true if, for all potential risks, there were no significant costs to acting as if a risk is practically significant even when it isn’t.

    • Corey: I agree, and I like your way of putting the explanations. I was prepared to allow the truth of premise #1, despite its being a bit guilty of too directly identifying the statistical report with a policy action.

      • Corey

        I think premise 1 falls afoul of what in error-statistical terms is probably best phrased as applying a Neyman-Pearson-style “optimal course of action” perspective on what ought to be viewed as a Fisherian “logic of inductive inference” problem. But I come to that position by starting from my own Bayesian point of view in which inference and decision are naturally sharply separated — the posterior distribution summarizes all available information and minimizing posterior expected loss gives the optimal policy.

  6. ?????
    No comprendo

    • Corey

      Neyman-Pearson says choose an alpha (the methodological decision of premise 1) and act accordingly. Fisher just says report the p-value and does not, as far as I know, build a mathematical theory about the implications of the choice of threshold. The former mingles inference and subsequent behavior; the latter aims only to provide a summary of the evidence.
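The contrast can be put in a few lines of code (the observed z-value here is purely hypothetical, chosen for illustration):

```python
from statistics import NormalDist

nd = NormalDist()
z_obs = 2.1  # hypothetical observed standardized test statistic

# Fisher: report the p-value as a summary of the evidence against H0
p_value = 1 - nd.cdf(z_obs)

# Neyman-Pearson: fix alpha in advance, then act on the accept/reject rule
alpha = 0.05
decision = "reject H0" if z_obs > nd.inv_cdf(1 - alpha) else "accept H0"

print(f"p = {p_value:.4f}; at alpha = {alpha}: {decision}")
```

The first line of output is a continuous evidential summary; the second is a binary action tied to the pre-chosen α.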

  7. Drjohnbyrd

    I find all of the premises, save for the first, to be wrongheaded. Most scientific disciplines have professional organizations with ethics guidelines for practitioners. I am not aware of any that do not demand rigorous treatment of data and honest reporting as the standard. Applied labs that are accredited typically must establish similar guidelines for ethical behavior. All that said, I have heard of these arguments, and I presume they can be traced ultimately back to some political agenda somewhere. To me, they are the opposite side of the same coin as the well-publicized attempts during the previous President’s terms to filter scientific findings (from federal scientists) for political purposes. Or, like the current complaints against some large forensic labs in this country that results were filtered to support the District Attorney’s case. I do not want to get political, just to point out that this argument about “ethics” is possibly more insidious than it appears. Most scientific work, pure research or applied, is funded by the public. The funding comes with the expectation that scientists can be counted on to do their best to provide accurate, relevant findings. There is an implicit belief, by those who decide to fund research or operations, that their trust will not be misplaced. The day we decide to become paternalistic, and possibly dishonest, we will breach that trust. Good intentions are no excuse.

    Policy makers rarely understand the science. If there is any place where we could really make the world better in this realm, it is by communicating more effectively: providing accurate findings and honest assessments that they can really understand, and testimony to committees that does not make their eyes glaze over and their attention drift. This is hard to do, but it is one place where we can help in an appropriate, ethical manner.

