I claim that all but the first of the “dirty hands” argument’s five premises are flawed. Even the first premise too directly identifies a policy decision with a statistical report. But the key flaws begin with premise 2. Although risk policies may be based on a statistical report of evidence, it does not follow that the considerations suitable for judging risk policies are the ones suitable for judging the statistical report. They are not. The latter, of course, should not be reduced to some kind of unthinking accept/reject report. If responsible, it must clearly and completely report the nature and extent of (risk-related) effects that are and are not indicated by the data, making plain how the methodological choices made in the generation, modeling, and interpretation of data raise or lower the chances of finding evidence of specific risks. These choices may be called risk assessment policy (RAP) choices.
Granted, values do arise in data interpretation, but the relevant ones reflect the value of responsibly reporting the evidence of risk. Some ethicists argue that scientists should favor public and environmental values over those of polluters, developers, and others with power. Maybe they should, but that is beside the point here. Even if one were to grant this (and it would be a matter of ethics), it would still be irresponsible (on scientific grounds) to interpret what the data indicate about the risk in the light of advancing a preferred policy, even assuming that the vulnerable parties would prefer that policy. The job of the scientist is to unearth what is and is not known about the substance, practice, or technology.
The critics are right that in issuing “clean bills of health” there is a concern that the probability of a type II error may be too high. But the solution is not to try to minimize it. Rather, we should use this information to argue:
If the Prob(test T accepts H0; increased risk d is present) is very high, then accepting H0 with test T is poor evidence that an increased risk d is absent.
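For concreteness, here is a minimal sketch in Python of the probability in question, assuming a one-sided test of H0: no increased risk (delta = 0) against delta > 0 with a normal test statistic and known sigma; the sample size, sigma, and the value of d are illustrative numbers of my own, not from any actual study:

    # Illustrative only: one-sided z-test of H0: delta = 0 vs. H1: delta > 0,
    # with known sigma. All numbers here are assumptions for the sketch.
    from scipy.stats import norm

    alpha, n, sigma = 0.05, 25, 10.0      # test size, sample size, known std. dev.
    z_crit = norm.ppf(1 - alpha)          # reject H0 when z >= z_crit (about 1.645)

    def prob_accept_H0(d):
        """Prob(test T accepts H0; increased risk d is present): the type II error rate at d."""
        shift = d * n**0.5 / sigma        # mean of the z statistic when delta = d
        return norm.cdf(z_crit - shift)

    print(round(prob_accept_H0(2.0), 2))  # about 0.74: very probable that H0 is "accepted"
                                          # even if the increased risk is as large as d = 2

Since that probability is high in this setup, a non-rejection with this test would be poor evidence that an increased risk of 2 is absent.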
Although H0 “passed” test T, the test it passed was not severe—it is very probable that H0 would pass this test, even if the increased risk is actually as large as d. Therefore, a failure to reject H0 with test T does not license inferring that the increased risk is less than d. We could also use the negative result in order to find a value for the increased risk—call it d*—such that so negative a result is very improbable if the increased risk were as high as d*. Then the negative result allows inferring that d < d*. (We are back to “rule M” from the formaldehyde hearings.) Even getting this right, however, only takes one to the level of the statistical report and not to subsequent risk policy decisions.
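Continuing the same illustrative setup (z_obs = 1.0 and p_small = 0.05 are assumed numbers, not anything from the hearings), here is a sketch of the d* calculation just described:

    # Rule M sketch: find d* such that so negative a result (p >= p_obs, i.e.,
    # z <= z_obs) would be very improbable if the increased risk were as high as d*.
    from scipy.stats import norm

    n, sigma = 25, 10.0
    z_obs = 1.0                      # an observed, statistically insignificant z (p_obs about 0.16)
    p_small = 0.05                   # quantifies "very improbable"

    # Pr(z <= z_obs; delta = d) = Phi(z_obs - d*sqrt(n)/sigma); set it equal to p_small and solve for d:
    d_star = (z_obs - norm.ppf(p_small)) * sigma / n**0.5
    print(round(d_star, 2))          # about 5.29: the negative result warrants delta < 5.29,
                                     # not "no increased risk at all"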
NOTE: This is not the same as what some are calling “observed power”—Oy! I’ll have to come back to this later; I am seeing this curious animal on some blogs on the top 50 list!
Taken seriously, far from promoting the public good, the ethics in evidence argument would be tantamount to saying that the evidence does not matter much: what matters, for an ethical interpretation of data, are the preferred policy consequences, the preference being decided on one or another ethical ground.
I do not think proponents of the ethics in evidence position would wish to accept this logical consequence of their position. Not only does it have the untoward consequence of discounting or diminishing the role of evidence, it should also be kept in mind that ethical grounds can shift and be used for conflicting ends (e.g., preventing starvation and avoiding risks of GM foods). Most importantly, a position that can imply that evidence does not matter much is going to be (and has been) regarded as anti-science, greatly diminishing the voice of those who rightly wish to press for more responsible science. If it is all or largely a matter of political and social values, then more and better evidence cannot help. What better excuse for those happy not to have to provide better evidence!
I agree with pretty much everything up to now in the “Objectivity” postings, but it would be helpful, at least for me, if you could define what you mean by “objectivity”. I can imagine various meanings of this term, some of which seem justified here whereas others don’t. The trouble is that those that work (for example, if something like “transparency”/“reproducibility” is meant) have much more modest implications than what people seem to expect from “objectivity” in the public discussion.
Is the following translation of rule M into math correct?
Definitions:
p_obs == the observed p-value, alpha == the chosen type I error rate, p == the p-value considered as a random variable, Pr(A | B; param_value) == the probability of the event A conditional on event B under the alternative hypothesis with parameter equal to param_value, p_small == a number between 0 and 1 whose value quantifies the phrase “very improbable”.
Rule M (perhaps):
If p_obs > alpha, infer “delta < min{delta_star : Pr(p > p_obs | p_obs > alpha; delta_star) < p_0}.”
It is hard to read; it would be safest to consult calculations as in Mayo and Spanos “Error Statistics”, the formaldehyde paper cited in an earlier post, ch. 11 of EGEK, or FEV(ii) on p. 256 (Mayo and Cox 2010). (For help, contact whaler: jemille6@vt.edu.) However, I should note that this is not a conditional probability: it is a probability (of an event, e.g., reaching a given p-value) CALCULATED UNDER THE ASSUMPTION of some parameter value or other. One may do the computation for several parameter values, just like power.
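For concreteness, here is a minimal sketch of doing the computation for several parameter values, just like power; the one-sided normal setup and the numbers (n = 25, sigma = 10, z_obs = 1.0) are illustrative assumptions of my own, not from the cited papers:

    # Illustrative only: probability of so insignificant a result, calculated
    # under several assumed values of the increased risk delta (one-sided z-test).
    from scipy.stats import norm

    n, sigma, z_obs = 25, 10.0, 1.0                    # assumed numbers for the sketch
    for d in (1.0, 2.0, 3.0, 4.0, 5.3):
        prob = norm.cdf(z_obs - d * n**0.5 / sigma)    # Pr(p >= p_obs; delta = d)
        print(f"delta = {d}: {prob:.3f}")              # falls from about 0.69 at d = 1 to about 0.05 at d = 5.3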
I’m reading ch 11 of EGEK now, and I would like to discuss a passage with you that is not related to the topic of this post. What is your preferred venue for such a discussion: here, by email, or some other means? Or perhaps you would rather defer the discussion altogether…
Ah, I see. I interpreted “The only difference is that M takes account of the actual insignificant p-value, and so is more informative” to mean that probabilities were to be calculated conditional on the event that the p-value is insignificant. (This should still be valid in the error-statistical paradigm, since rule M applies if and only if this event occurs.) I tried to write the expression in a way that made clear that I wasn’t conditioning on the parameter value; stuff after the semi-colon specifies the distributional assumptions for the stuff before it. I’m reading the formaldehyde paper now.
OK, but please recall I said that paper was deliberately informal, and given your queries, I’m guessing you won’t find it as apt as the other papers. It really is not conditional on the event of significance, at least as I think most people understand this. It refers to just an ordinary sampling distribution, in this case of the P-value as a statistic. It is the reasoning that has to serve as a guide for the computations. There’s nothing exotic about it. I plan to write tomorrow a.m. on (what I take to be) the objectivity lesson.
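If it helps, here is a small simulation sketch of that point: the probability comes from the ordinary sampling distribution of the P-value under an assumed parameter value, with no conditioning on the event of insignificance. The numbers simply continue the illustrative one-sided normal setup used above; they are assumptions, not anything from the papers.

    # Simulate the sampling distribution of the p-value assuming delta = d_star,
    # then read off Pr(p >= p_obs; d_star) directly; no conditioning is involved.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, sigma, z_obs, d_star = 25, 10.0, 1.0, 5.29      # illustrative values
    p_obs = 1 - norm.cdf(z_obs)

    z = rng.normal(loc=d_star * n**0.5 / sigma, scale=1.0, size=100_000)
    p = 1 - norm.cdf(z)                                # the p-value as a statistic
    print(round((p >= p_obs).mean(), 3))               # about 0.05: so negative a result is
                                                       # very improbable if delta were d_star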
Gah. My line breaks disappeared and now my definitions are unreadable. Let’s try again:
p_obs == the observed p-value,
alpha == the chosen type I error rate,
p == the p-value considered as a random variable,
Pr(A | B; param_value) == the probability of the event A conditional on event B under the alternative hypothesis with parameter equal to param_value,
p_small == a number between 0 and 1 whose value quantifies the phrase “very improbable”.
And the formula has errors in it too…
Rule M (perhaps, second attempt):
If p_obs > alpha, infer “delta < min{delta_star : Pr(p > p_obs | p > alpha; delta_star) < p_small}.”