We constantly hear that procedures of inference are inescapably subjective because of the latitude of human judgment as it bears on the collection, modeling, and interpretation of data. But this is seriously equivocal: Being the product of a human subject is hardly the same as being subjective, at least not in the sense we are speaking of—that is, as a threat to objective knowledge. Are all these arguments about the allegedly inevitable subjectivity of statistical methodology rooted in equivocations? I argue that they are!
Insofar as humans conduct science and draw inferences, it is obvious that human judgments and human measurements are involved. True enough, but too trivial an observation to help us distinguish among the different ways judgments should enter, and how, nevertheless, to avoid introducing bias and unwarranted inferences. The issue is not that a human is doing the measuring, but whether we can reliably use the thing being measured to find out about the world.
Remember the dirty-hands argument? In the early days of this blog (e.g., October 13, 16), I deliberately took up this argument as it arises in evidence-based policy because it offered a certain clarity that I knew we would need to come back to in considering general “arguments from discretion”. To abbreviate:
- Numerous human judgments go into specifying experiments, tests, and models.
- Because there is latitude and discretion in these specifications, they are “subjective.”
- Whether data are taken as evidence for a statistical hypothesis or model depends on these subjective methodological choices.
- Therefore, statistical inference and modeling are invariably subjective, if only in part.
We can spot the fallacy in the argument much as we did in the dirty-hands argument about evidence-based policy. It is true, for example, that by employing a very insensitive test for detecting a positive discrepancy d’ from a 0 null, the test has low probability of finding statistical significance even if a discrepancy as large as d’ exists. But that doesn’t prevent us from determining, objectively, that an insignificant difference from that test fails to warrant inferring evidence of a discrepancy less than d’.
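To make the point about insensitive tests concrete, here is a minimal sketch (with illustrative numbers of my own choosing, not from the post) of how low power against a discrepancy d’ follows directly from the test specification, so anyone can compute it and critique the test:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 vs H1: mu > 0
    when the true discrepancy is delta (known sigma, sample size n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Reject when the standardized sample mean exceeds z_alpha;
    # power is the probability of that event computed under mu = delta.
    return 1 - NormalDist().cdf(z_alpha - delta * sqrt(n) / sigma)

# An "insensitive" test (small n) has low power against delta = 0.5 ...
low = power_one_sided_z(delta=0.5, sigma=1.0, n=10)    # roughly 0.47
# ... while a larger sample makes the same discrepancy easy to detect:
high = power_one_sided_z(delta=0.5, sigma=1.0, n=100)  # above 0.99
```

Whatever the motives behind choosing n = 10, the resulting power is an objective feature of the specification, open to anyone's critical scrutiny.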
Test specifications may well be a matter of personal interest and bias, but, given the choices made, whether or not an inference is warranted is not a matter of personal interest and desire. Setting up a test with low power against d’ might be a product of your desire not to find an effect for economic reasons, of insufficient funds to collect a larger sample, or of the inadvertent choice of a bureaucrat. Or ethical concerns may have entered. But none of this precludes our critical evaluation of what the resulting data do and do not indicate (about the question of interest). The critical task need not itself be a matter of economics, ethics, or what have you. Critical scrutiny of evidence reflects an interest all right—an interest in not being misled, an interest in finding out what the case is, and others of an epistemic nature.
Objectivity in statistical inference, and in science more generally, is a matter of being able to critically evaluate the warrant of any claim. This, in turn, is a matter of evaluating the extent to which we have avoided or controlled those specific flaws that could render the claim incorrect. If the inferential account cannot discern any flaws, performs the task poorly, or denies there can ever be errors, then it fails as an objective method of obtaining knowledge.
Consider a parallel with the problem of objectively interpreting observations: observations are always relative to the particular instrument or observation scheme employed. But we are often aware not only of the fact that observation schemes influence what we observe but also of how they influence observations and how much noise they are likely to produce so as to subtract them out. Hence, objective learning from observation is not a matter of getting free of arbitrary choices of instrument, but a matter of critically evaluating the extent of their influence to get at the underlying phenomenon.
For a similar analogy, the fact that my weight shows up as k pounds reflects the convention (in the United States) of using the pound as a unit of measurement on a particular type of scale. But given the convention of using this scale, whether or not my weight shows up as k pounds is a matter of how much I weigh!*
Likewise, the result of a statistical test is only partly determined by the specification of the tests (e.g., when a result counts as statistically significant); it is also determined by the underlying scientific phenomenon, at least as modeled. What enables objective learning to take place is the possibility of devising means for recognizing and effectively “subtracting out” the influence of test specifications, in order to learn about the underlying phenomenon, as modeled.
Focusing just on statistical inference, we can distinguish between an objective statistical inference and an objective statistical method of inference. A specific statistical inference is objectively warranted if it has passed a severe test; a statistical method is objective by being able to evaluate and control (at least approximately) the error probabilities needed for a severity appraisal. This also requires the method to communicate the information needed to conduct the error statistical evaluation (or report it as problematic).
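As a hedged illustration of the severity appraisal (for a one-sided Normal test with known sigma, and made-up numbers), a non-significant result can warrant some claims of the form mu ≤ mu1 with high severity and others only weakly:

```python
from math import sqrt
from statistics import NormalDist

def severity_mu_le(mu1, xbar_obs, sigma, n):
    """Severity with which a non-significant result from a one-sided
    Normal test (H0: mu <= 0 vs H1: mu > 0) warrants the claim mu <= mu1:
    the probability of observing a sample mean larger than the one we got,
    computed under the supposition that mu = mu1."""
    return 1 - NormalDist().cdf((xbar_obs - mu1) * sqrt(n) / sigma)

# Observed mean 0.1 with sigma = 1, n = 100: z = 1.0, not significant at 5%.
sev_high = severity_mu_le(mu1=0.5, xbar_obs=0.1, sigma=1.0, n=100)   # near 1
sev_low = severity_mu_le(mu1=0.15, xbar_obs=0.1, sigma=1.0, n=100)   # ~ 0.69
```

The same non-significant result warrants “mu ≤ 0.5” with very high severity but “mu ≤ 0.15” only weakly; the appraisal is fixed by the data and the sampling distribution, not by anyone's desires.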
It should be kept in mind that we are after the dual aims of severity and informativeness. Merely stating tautologies is to state objectively true claims, but uninformative ones. It is vital to have a notion of objectivity, and we should stop feeling obliged to say: well, there are objective and subjective elements in all methods; we cannot avoid dirty hands in discretionary choices of specification; so all inference methods do about as well when it comes to the criteria of objectivity. They do not.
*Which, in turn, is a matter of my having overeaten in London.
To the above discussion, let me add an “apparent” crucial difference between the Bayesian and frequentist perspectives as it relates to the specification of the likelihood function L(θ;z₀) and the associated statistical model Mθ(z).
According to Kadane (2011):
“… likelihoods are just as subjective as priors, and there is no reason to expect scientists to agree on them in the context of an applied problem.” (p. 445)
From the frequentist perspective, likelihoods are defined by the probabilistic assumptions comprising the statistical model Mθ(z) in question, e.g., for the Linear Regression model in table 1; see Intro to Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1), posted on February 22, 2012 by Mayo.
In light of that, there is nothing subjective or arbitrary about the choice of a statistical model Mθ(z) or the associated likelihood function, and its choice is not based on any agreement amongst scientists. The validity of these assumptions is independently testable vis-à-vis the data z₀, using thorough Mis-Specification (M-S) testing.
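To illustrate how model assumptions can be put to independent test against the data z₀, here is one generic M-S-style check (a Jarque-Bera-type skewness/kurtosis statistic; this is my own simplified sketch, not the diagnostic battery from the cited post):

```python
import random

def jarque_bera(z):
    """Jarque-Bera statistic: compares sample skewness and excess kurtosis
    with the values (0 and 0) implied by the Normality assumption.
    Approximately chi^2(2) under the null of Normality."""
    n = len(z)
    m = sum(z) / n
    m2 = sum((x - m) ** 2 for x in z) / n
    m3 = sum((x - m) ** 3 for x in z) / n
    m4 = sum((x - m) ** 4 for x in z) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)

random.seed(1)
normal_data = [random.gauss(0, 1) for _ in range(500)]
skewed_data = [random.expovariate(1.0) for _ in range(500)]
# The 5% critical value of chi^2(2) is about 5.99:
jb_ok = jarque_bera(normal_data)   # typically small for Normal data
jb_bad = jarque_bera(skewed_data)  # large: Normality is rejected
```

The verdict depends on the data, not on the analyst's prior opinions about the model.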
Kadane, J. B. (2011), Principles of Uncertainty, Chapman & Hall, NY.
Aris: Thanks for the Kadane reference. I was thinking specifically of Box and, at times at least (e.g., his paper in the RMM volume), Gelman. But I was also trying to point out something that is overlooked, and perhaps it is hard to explain clearly.
In an attempt to do so, I wrote “The issue is not that a human is doing the measuring, but whether we can reliably use the thing being measured to find out about the world.”
So let us imagine there was a perfect way to measure a person’s real and true degrees of belief in a hypothesis (maybe with some neuropsych development), while with frequentist statistical models, we grope our way and at most obtain statistically adequate representations of aspects of the data generating mechanism producing the relevant phenomenon. In the former, the measurement is 100% reliable, but the question that remains is the relevance of the thing being measured for finding out about the world. People seem utterly to overlook this.
I hope you are writing your comments in sight of the Parthenon or the like. Mayo
I like this entry a lot, although as you know I have my doubts about the concept of objectivity. I think that you explained your view here clearly, and this is good food for thought.
Now let’s say we are in a situation in which M-S tests on the data will rule out neither a Normal nor a t_5 distribution (more precisely, a location-scale model based on a t_5), say, because the amount of data available just doesn’t allow us to distinguish between these two (as it often doesn’t).
Would you accept that it is a subjective choice whether further analyses are based on a Normal or on a t_5 distribution (despite the fact that, after choosing one of them, what follows may be called “objective”)? If not, please explain.
Of course, in many cases inferences based on a Normal and a t_5 distribution will yield the same results in terms of interpretation, but one can always find distributions (albeit often quite messy ones) for which this is not the case and which are still not in detectable disagreement with the data.
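A quick simulation in the spirit of this comment (my own crude illustration, with an arbitrary cut-off, not a formal M-S test): with n = 50, samples drawn from a t_5 often pass a simple kurtosis-based Normality check, so the data frequently cannot tell the two models apart:

```python
import random
from math import sqrt

random.seed(7)

def t5():
    """One draw from a Student's t with 5 degrees of freedom:
    a standard Normal over the square root of an independent chi^2(5)/5."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(5))
    return z / sqrt(chi2 / 5)

def excess_kurtosis(x):
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

# A t_5 has theoretical excess kurtosis 6, a Normal has 0; yet with
# n = 50 the sample kurtosis is noisy, so a crude cut-off at |2| lets
# many t_5 samples through as "Normal-compatible".
passes = sum(
    1
    for _ in range(200)
    if abs(excess_kurtosis([t5() for _ in range(50)])) < 2.0
)
```

Many of the 200 simulated t_5 samples pass the check, which is exactly the indistinguishability the comment describes.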