We constantly hear that procedures of inference are inescapably subjective because of the latitude of human judgment as it bears on the collection, modeling, and interpretation of data. But this is seriously equivocal: Being the product of a human subject is hardly the same as being subjective, at least not in the sense we are speaking of—that is, as a threat to objective knowledge. Are all these arguments about the allegedly inevitable subjectivity of statistical methodology rooted in equivocations? I argue that they are! [This post combines this one and this one, as part of our monthly “3 years ago” memory lane.]
Insofar as humans conduct science and draw inferences, it is obvious that human judgments and human measurements are involved. True enough, but too trivial an observation to help us distinguish among the different ways judgments should enter, and how, nevertheless, to avoid introducing bias and unwarranted inferences. The issue is not that a human is doing the measuring, but whether we can reliably use the thing being measured to find out about the world.
Remember the dirty-hands argument? In the early days of this blog (e.g., October 13, 16), I deliberately took up this argument as it arises in evidence-based policy because it offered a certain clarity that I knew we would need to come back to in considering general “arguments from discretion”. To abbreviate:
- Numerous human judgments go into specifying experiments, tests, and models.
- Because there is latitude and discretion in these specifications, they are “subjective.”
- Whether data are taken as evidence for a statistical hypothesis or model depends on these subjective methodological choices.
- Therefore, statistical inference and modeling is invariably subjective, if only in part.
We can spot the fallacy in the argument much as we did in the dirty hands argument about evidence-based policy. It is true, for example, that by employing a very insensitive test for detecting a positive discrepancy d’ from a 0 null, that the test has low probability of finding statistical significance even if a discrepancy as large as d’ exists. But that doesn’t prevent us from determining, objectively, that an insignificant difference from that test fails to warrant inferring evidence of a discrepancy less than d’.
Test specifications may well be a matter of personal interest and bias, but, given the choices made, whether or not an inference is warranted is not a matter of personal interest and bias. Setting up a test with low power against d’ might be a product of your desire not to find an effect for economic reasons, of insufficient funds to collect a larger sample, or of the inadvertent choice of a bureaucrat. Or ethical concerns may have entered. But none of this precludes our critical evaluation of what the resulting data do and do not indicate (about the question of interest). The critical task need not itself be a matter of economics, ethics, or what have you. Critical scrutiny of evidence reflects an interest all right—an interest in not being misled, an interest in finding out what the case is, and others of an epistemic nature.
Objectivity in statistical inference, and in science more generally, is a matter of being able to critically evaluate the warrant of any claim. This, in turn, is a matter of evaluating the extent to which we have avoided or controlled those specific flaws that could render the claim incorrect. If the inferential account cannot discern any flaws, performs the task poorly, or denies there can ever be errors, then it fails as an objective method of obtaining knowledge.
Consider a parallel with the problem of objectively interpreting observations: observations are always relative to the particular instrument or observation scheme employed. But we are often aware not only of the fact that observation schemes influence what we observe but also of how they influence observations and how much noise they are likely to produce so as to subtract them out. Hence, objective learning from observation is not a matter of getting free of arbitrary choices of instrument, but a matter of critically evaluating the extent of their influence to get at the underlying phenomenon.
For a similar analogy, the fact that my weight shows up as k pounds reflects the convention (in the United States) of using the pound as a unit of measurement on a particular type of scale. But given the convention of using this scale, whether or not my weight shows up as k pounds is a matter of how much I weigh!*
Likewise, the result of a statistical test is only partly determined by the specification of the tests (e.g., when a result counts as statistically significant); it is also determined by the underlying scientific phenomenon, at least as modeled. What enables objective learning to take place is the possibility of devising means for recognizing and effectively “subtracting out” the influence of test specifications, in order to learn about the underlying phenomenon, as modeled.
Focusing just on statistical inference, we can distinguish between an objective statistical inference, and an objective statistical method of inference. A specific statistical inference is objectively warranted, if it has passed a severe test; a statistical method is objective by being able to evaluate and control (at least approximately) the error probabilities needed for a severity appraisal. This also requires the method to communicate the information needed to conduct the error statistical evaluation (or report it as problematic).
It should be kept in mind that we are after the dual aims of severity and informativeness. Merely stating tautologies is to state objectively true claims, but they are not informative. But, it is vital to have a notion of objectivity, and we should stop feeling that we have to say, well there are objective and subjective elements in all methods; we cannot avoid dirty hands in discretionary choices of specification, so all inference methods do about as well when it comes to the criteria of objectivity. They do not.
*Which, in turn, is a matter of my having overeaten in London.
(1) If discretionary judgments are thought to introduce subjectivity in inference, a classic strategy thought to achieve objectivity is to extricate such choices, replacing them with purely formal a priori computations or agreed-upon conventions (see March 14). If leeway for discretion introduces subjectivity, then cutting off discretion must yield objectivity! Or so some argue. Such strategies may be found, to varying degrees, across the different approaches to statistical inference. The inductive logics of the type developed by Carnap promised to be an objective guide for measuring degrees of confirmation in hypotheses, despite much-discussed problems, paradoxes, and conflicting choices of confirmation logics. In Carnapian inductive logics, initial assignments of probability are based on a choice of language and on intuitive, logical principles. The consequent logical probabilities can then be updated (given the statements of evidence) with Bayes’s Theorem. The fact that the resulting degrees of confirmation are at the same time analytical and a priori—giving them an air of objectivity–reveals the central weakness of such confirmation theories as “guides for life”, e.g., —as guides, say, for empirical frequencies or for finding things out in the real world. Something very similar happens with the varieties of “objective’” Bayesian accounts, both in statistics and in formal Bayesian epistemology in philosophy (a topic to which I will return; if interested, see my RMM contribution). A related way of trying to remove latitude for discretion might be to define objectivity in terms of the consensus of a specified group, perhaps of experts, or of agents with “diverse” backgrounds. Once again, such a convention may enable agreement yet fail to have the desired link-up with the real world. It would be necessary to show why consensus reached by the particular choice of group (another area for discretion) achieves the learning goals of interest.
Likewise, routine and automatic choices in statistics can be justified as promoting a specified goal, but it is the onus of anyone supporting the account in question to show this.
(2) The second reaction is to acknowledge and even to embrace subjective and personal factors. For Savage (1964: 178) the fact that a subjective (which I am not here distinguishing from a “personalistic”) account restores the role of opinion in statistics was a cause of celebration. I am not sure if current-day subjective Bayesians concur—but I would like to hear from them. Underlying this second reaction, there is often a deep confusion between our limits in achieving the goal of adequately capturing a given data generating mechanism, and making the goal itself be to capture our subjective degrees of belief in (or about) the data generating mechanism. The former may be captured by severity assessments (or something similar), but these are not posterior probabilities (even if one grants the latter could be). Most importantly for the current issue, assessing the existing limitations and inadequacies of inferences is not the same as making our goal be to quantitatively model (our or someone else’s) degrees of belief! Yet these continue to be run together, making it easy to suppose that acknowledging the former limitation is tantamount to accepting the latter. As I noted in a March 14 comment to A. Spanos, “let us imagine there was a perfect way to measure a person’s real and true degrees of belief in a hypothesis (maybe with some neuropsychology development), while with frequentist statistical models, we grope our way and at most obtain statistically adequate representations of aspects of the data generating mechanism producing the relevant phenomenon. In the former [we are imagining], the measurement is 100% reliable, but the question that remains is the relevance of the thing being measured for finding out about the world. People seem utterly to overlook this” (at least when they blithely repeat variations on “arguments from discretion”, see March 14 post). Henry Kyburg (1992) put it in terms of error: the subjectivist precludes objectivity because they he or she cannot be in error:
This is almost a touchstone of objectivity: the possibility of error. There is no way I can be in error in my prior distribution for µ—unless I make a logical error. . . . It is that very fact that makes this prior distribution perniciously subjective. It represents an assumption that has consequences, but cannot be corrected by criticism or further evidence. (p. 147)
(3) The third way to deal with the challenges of objectivity in inference is to deliberately develop checks of error, and to insist that our statistical methods be self-correcting. Rather than expressing opinions, we want to avoid being misled by beliefs and opinions—mine and yours—building on the recognition that checks of error enable us to acquire reliable knowledge about the world. This third way is to discern what enabled us to reject the “dirty hands” argument: we can critically evaluate discretionary choices, and design methods to determine objectively what is and is not indicated. It may well mean that the interpretation of the data itself is a report of the obstacles to inference! Far from being a hodgepodge of assumptions and decisions, objectivity in inference can and should involve a systematic self-critical scrutiny all along the inferential path. Each stage of inquiry and each question within that stage involve potential errors and biases. By making these explicit we can learn despite background judgments. Nowadays, the reigning mood may be toward some sort of third way; but we must be careful. Merely rejecting the dirty-hands conclusion (as in my March 14 post) is not yet to show that any particular method achieves such objective scrutiny in given cases. Nor does it suffice to declare that “of course we subject our assumptions to stringent checks”, and “we will modify our models should we find misfits with the data”. We have seen in our posts on m-s tests, for instance, the dangers of “error fixing” strategies (M-S post 1, 2, 3, 4). The method for checking must itself be justified by showing it has the needed properties for pinpointing flaws reliably. It is not obvious that popular “third-way” gambits meet the error statistical requirements for objectivity in statistics that I have discussed in many previous posts and papers (the ability to evaluate and control relevant error probabilities). At least, it remains an open question as to whether they do. _____________
Carnap, R. (1962). Logical Foundations of Probability. Chicago: University of Chicago Press.
Kyburg, H. E., Jr. (1992). “The Scope of Bayesian Reasoning,” in D. Hull, M. Forbes, and K. Okruhlik (eds.), PSA 1992, Vol. II, East Lansing, MI: 139-52.
Savage, L. J. (1964). “The Foundations of Statistics Reconsidered,” pp. 173-188 in H. E. Kyburg and and H.E. Smokler (eds.), Studies in Subjective Probability, Wiley, New York: 173-88.