We constantly hear that procedures of inference are inescapably subjective because of the latitude of human judgment as it bears on the collection, modeling, and interpretation of data. But this is seriously equivocal: Being the product of a human subject is hardly the same as being subjective, at least not in the sense we are speaking of—that is, as a threat to objective knowledge. Are all these arguments about the allegedly inevitable subjectivity of statistical methodology rooted in equivocations? I argue that they are! *[This post combines this one and this one, as part of our monthly “3 years ago” memory lane.]*

“Argument from Discretion” (dirty hands)

Insofar as humans conduct science and draw inferences, it is obvious that human judgments and human measurements are involved. True enough, but too trivial an observation to help us distinguish among the different ways judgments should enter, and how, nevertheless, to avoid introducing bias and unwarranted inferences. The issue is not that a human is doing the measuring, but whether we can reliably use the thing being measured to find out about the world.

Remember the dirty-hands argument? In the early days of this blog (e.g., October 13, 16), I deliberately took up this argument as it arises in evidence-based policy because it offered a certain clarity that I knew we would need to come back to in considering general “arguments from discretion”. To abbreviate:

- Numerous human judgments go into specifying experiments, tests, and models.
- Because there is latitude and discretion in these specifications, they are “subjective.”
- Whether data are taken as evidence for a statistical hypothesis or model depends on these subjective methodological choices.
- Therefore, statistical inference and modeling is invariably subjective, if only in part.

We can spot the fallacy in the argument much as we did in the dirty hands argument about evidence-based policy. It is true, for example, that a very insensitive test for detecting a positive discrepancy d’ from a 0 null has a low probability of finding statistical significance even if a discrepancy as large as d’ exists. But that doesn’t prevent us from determining, objectively, that an insignificant difference from that test fails to warrant inferring evidence of a discrepancy less than d’.
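A minimal sketch of the insensitive-test point, under assumptions that are not in the post itself: a one-sided z-test of a 0 null with known standard deviation, and illustrative values for d’, the standard deviation, and the sample sizes. The function name `power_one_sided_z` is hypothetical. It computes the probability that the test finds statistical significance when the discrepancy really is d’.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(d_prime, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu <= 0 when the true mean is d_prime.

    Assumes a one-sided z-test with known sigma (an illustrative choice).
    """
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)      # rejection threshold for the z statistic
    shift = d_prime * sqrt(n) / sigma  # mean of the z statistic when mu = d_prime
    return 1 - z.cdf(cutoff - shift)


# Same discrepancy d' = 0.2, two very different test specifications.
low_power = power_one_sided_z(d_prime=0.2, sigma=1.0, n=10)
high_power = power_one_sided_z(d_prime=0.2, sigma=1.0, n=400)

print(f"power with n = 10:  {low_power:.3f}")
print(f"power with n = 400: {high_power:.3f}")
```

Whoever chose n = 10, and for whatever reason, the resulting power is a fact about the test that anyone can compute and criticize.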

Test specifications may well be a matter of personal interest and bias, but, given the choices made, whether or not an inference is warranted is not a matter of personal interest and bias. Setting up a test with low power against d’ might be a product of your desire not to find an effect for economic reasons, of insufficient funds to collect a larger sample, or of the inadvertent choice of a bureaucrat. Or ethical concerns may have entered. But none of this precludes our critical evaluation of what the resulting data do and do not indicate (about the question of interest). The critical task need not itself be a matter of economics, ethics, or what have you. Critical scrutiny of evidence reflects an interest all right—an interest in not being misled, an interest in finding out what the case is, and others of an epistemic nature.

Objectivity in statistical inference, and in science more generally, is a matter of being able to critically evaluate the warrant of any claim. This, in turn, is a matter of evaluating the extent to which we have avoided or controlled those specific flaws that could render the claim incorrect. If the inferential account cannot discern any flaws, performs the task poorly, or denies there can ever be errors, then it fails as an objective method of obtaining knowledge.

Consider a parallel with the problem of objectively interpreting observations: observations are always relative to the particular instrument or observation scheme employed. But we are often aware not only of the fact that observation schemes influence what we observe but also of how they influence observations and how much noise they are likely to produce so as to subtract them out. Hence, objective learning from observation is not a matter of getting free of arbitrary choices of instrument, but a matter of critically evaluating the extent of their influence to get at the underlying phenomenon.

For a similar analogy, the fact that my weight shows up as k pounds reflects the convention (in the United States) of using the pound as a unit of measurement on a particular type of scale. But given the convention of using this scale, whether or not my weight shows up as k pounds *is* a matter of how much I weigh!*

Likewise, the result of a statistical test is only partly determined by the specification of the tests (e.g., when a result counts as statistically significant); it is also determined by the underlying scientific phenomenon, at least as modeled. What enables objective learning to take place is the possibility of devising means for recognizing and effectively “subtracting out” the influence of test specifications, in order to learn about the underlying phenomenon, as modeled.

Focusing just on statistical inference, we can distinguish between an objective statistical inference, and an objective statistical method of inference. *A specific statistical inference is objectively warranted, if it has passed a severe test; a statistical method is objective by being able to evaluate and control (at least approximately) the error probabilities needed for a severity appraisal*. This also requires the method to communicate the information needed to conduct the error statistical evaluation (or report it as problematic).
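A rough sketch of a severity appraisal for a statistically insignificant result. The post does not give a formula; the one used here is the form standard in the error-statistical literature for a one-sided Normal test of a 0 null with known standard deviation, where the severity for the claim "the discrepancy is less than mu1" is the probability of a sample mean larger than the one observed, computed under mu = mu1. The function name and all numerical values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def severity_of_upper_bound(xbar_obs, mu1, sigma, n):
    """Severity for inferring 'mu < mu1' from an insignificant result.

    Computed as P(sample mean > xbar_obs; mu = mu1), assuming a one-sided
    Normal test with known sigma (an assumption made for illustration).
    """
    z = NormalDist()
    return 1 - z.cdf((xbar_obs - mu1) * sqrt(n) / sigma)


# The same observed mean (insignificant at the 0.05 level in both cases)
# from an insensitive test and from a sensitive one.
sev_small_n = severity_of_upper_bound(xbar_obs=0.05, mu1=0.2, sigma=1.0, n=10)
sev_large_n = severity_of_upper_bound(xbar_obs=0.05, mu1=0.2, sigma=1.0, n=400)

print(f"severity for 'mu < 0.2' with n = 10:  {sev_small_n:.3f}")
print(f"severity for 'mu < 0.2' with n = 400: {sev_large_n:.3f}")
```

With the small sample the claim "mu < 0.2" passes with only middling severity, while with the large sample it passes with high severity. The choice of sample size was discretionary; the appraisal of what the data warrant, given that choice, is not.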

It should be kept in mind that we are after the dual aims of severity and informativeness. Merely stating tautologies is to state objectively true claims, but they are not informative. But it is vital to have a notion of objectivity, and we should stop feeling that we have to say, well there are objective and subjective elements in all methods; we cannot avoid dirty hands in discretionary choices of specification, so all inference methods do about as well when it comes to the criteria of objectivity. They do not.

*Which, in turn, is a matter of my having overeaten in London.

__________________

3 Reactions to the Challenge of Objectivity

(1) If discretionary judgments are thought to introduce subjectivity in inference, a classic strategy thought to achieve objectivity is to extricate such choices, replacing them with purely formal *a priori* computations or agreed-upon conventions (see March 14). *If leeway for discretion introduces subjectivity, then cutting off discretion must yield objectivity!* Or so some argue. Such strategies may be found, to varying degrees, across the different approaches to statistical inference. The inductive logics of the type developed by Carnap promised to be an objective guide for measuring degrees of confirmation in hypotheses, despite much-discussed problems, paradoxes, and conflicting choices of confirmation logics. In Carnapian inductive logics, initial assignments of probability are based on a choice of language and on intuitive, logical principles. The consequent *logical probabilities* can then be updated (given the statements of evidence) with Bayes’s Theorem. The fact that the resulting degrees of confirmation are at the same time analytical and a priori (giving them an air of objectivity) reveals the central weakness of such confirmation theories as “guides for life”, that is, as guides, say, for empirical frequencies or for finding things out in the real world. Something very similar happens with the varieties of “objective” Bayesian accounts, both in statistics and in formal Bayesian epistemology in philosophy (a topic to which I will return; if interested, see my RMM contribution). A related way of trying to remove latitude for discretion might be to define objectivity in terms of the consensus of a specified group, perhaps of experts, or of agents with “diverse” backgrounds. Once again, such a convention may enable agreement yet fail to have the desired link-up with the real world. It would be necessary to show why consensus reached by the particular choice of group (another area for discretion) achieves the learning goals of interest.

Likewise, routine and automatic choices in statistics can be justified as promoting a specified goal, but it is the onus of anyone supporting the account in question to show this.

(2) The second reaction is to acknowledge and even to embrace subjective and personal factors. For Savage (1964: 178) the fact that a subjective (which I am not here distinguishing from a “personalistic”) account restores the role of opinion in statistics was a cause of celebration. I am not sure if current-day subjective Bayesians concur—but I would like to hear from them. Underlying this second reaction, there is often a deep confusion between our limits in achieving the goal of adequately capturing a given data generating mechanism, and making the goal *itself* *be* to capture our subjective degrees of belief in (or about) the data generating mechanism. The former may be captured by severity assessments (or something similar), but these are not posterior probabilities (even if one grants the latter could be). Most importantly for the current issue, assessing the existing limitations and inadequacies of inferences is *not* the same as making our goal *be* to quantitatively model (our or someone else’s) degrees of belief! Yet these continue to be run together, making it easy to suppose that acknowledging the former limitation is tantamount to accepting the latter. As I noted in a March 14 comment to A. Spanos, “let us imagine there was a perfect way to measure a person’s real and true degrees of belief in a hypothesis (maybe with some neuropsychology development), while with frequentist statistical models, we grope our way and at most obtain statistically adequate representations of aspects of the data generating mechanism producing the relevant phenomenon. In the former [we are imagining], the measurement is 100% reliable, but the question that remains is the relevance of the thing being measured for finding out about the world. People seem utterly to overlook this” (at least when they blithely repeat variations on “arguments from discretion”, see March 14 post). 
Henry Kyburg (1992) put it in terms of error: the subjectivist precludes objectivity because he or she cannot be in error:

This is almost a touchstone of objectivity: the possibility of error. There is no way I can be in error in my prior distribution for µ—unless I make a logical error. . . . It is that very fact that makes this prior distribution perniciously subjective. It represents an assumption that has consequences, but cannot be corrected by criticism or further evidence. (p. 147)

(3) The third way to deal with the challenges of objectivity in inference is to deliberately develop checks of error, and to insist that our statistical methods be self-correcting. Rather than expressing opinions, we want to avoid being misled by beliefs and opinions—mine and yours—building on the recognition that checks of error enable us to acquire reliable knowledge about the world. This third way is to discern what enabled us to reject the “dirty hands” argument: we can critically evaluate discretionary choices, and design methods to determine objectively what is and is not indicated. It may well mean that the interpretation of the data itself is a report of the obstacles to inference! Far from being a hodgepodge of assumptions and decisions, objectivity in inference can and should involve a systematic self-critical scrutiny all along the inferential path. Each stage of inquiry and each question within that stage involve potential errors and biases. By making these explicit we can learn despite background judgments. Nowadays, the reigning mood may be toward some sort of third way; but we must be careful. Merely rejecting the dirty-hands conclusion (as in my March 14 post) is not yet to show that any particular method achieves such objective scrutiny in given cases. Nor does it suffice to declare that “of course we subject our assumptions to stringent checks”, and “we will modify our models should we find misfits with the data”. We have seen in our posts on m-s tests, for instance, the dangers of “error fixing” strategies (M-S post 1, 2, 3, 4). The method for checking must itself be justified by showing it has the needed properties for pinpointing flaws reliably. It is not obvious that popular “third-way” gambits meet the error statistical requirements for objectivity in statistics that I have discussed in many previous posts and papers (the ability to evaluate and control relevant error probabilities). 
At least, it remains an open question as to whether they do.

_____________

Carnap, R. (1962). *Logical Foundations of Probability*. Chicago: University of Chicago Press.

Kyburg, H. E., Jr. (1992). “The Scope of Bayesian Reasoning,” in D. Hull, M. Forbes, and K. Okruhlik (eds.), *PSA 1992*, Vol. II, East Lansing, MI: 139-52.

Savage, L. J. (1964). “The Foundations of Statistics Reconsidered,” in H. E. Kyburg and H. E. Smokler (eds.), *Studies in Subjective Probability*, New York: Wiley, 173-88.

As you may have expected, I find this extremely interesting.

I’m all for critical scrutiny of the evidence; I’m happy with most if not all that you advertise in the name of objectivity.

Still, your text to me suggests a certain over-optimism regarding how far we can get “subtracting out” the effects of discretionary choices.

I think that it is a legitimate use (one of the many possible and partly contradictory ones that one can find in the literature) of the term “objective” to say that it is objective what, for example, the result of a certain t-test implies and does not imply, including an acknowledgement that this is based on the assumption of i.i.d. normality, which itself has to be open to critical scrutiny. However, there is no way, using methodology in a so-called “objective” way, to distinguish normality and independence from every possible alternative that could potentially lead to substantially different conclusions regarding the underlying reality. At some point we always need to accept working with such a model, having tested it in some ways and not being able to test it in others. Where does this stand in your coordinate system of using the terminologies “objective” and “subjective”?

There are certain possibilities even to analyse this, what potential problems it involves and how some of these (but not them all) can be dealt with, but often such possibilities (as in the robust statistics literature) come with the need of making further discretionary choices, tuning constants and the like.

Personally, although I believe that you use the term “objective” in a consistent and legitimate way, I do believe that advertising objectivity in over-optimistic ways is very problematic in science and that there is a big problem with people trying to *appear* objective by hiding decisions from scrutiny some of which may be arbitrary, some of which may be sensible in their specific context although not generalisable, and some others of which may be absolutely required to get the statistical machinery (frequentist or Bayesian or whatever) going in the first place.

I also believe that the wording “getting your hands dirty” conveys a bad message, namely that either people should clean their hands (i.e., appear not to make decisions at all), or that once the hands are dirty, everything is possible – and arbitrary. As far as I understand you, you object to this idea as I do, but wouldn’t it then be helpful to have more positive and constructive ideas about how and why to make such decisions, and to encourage scientists to be open and honest about them? But this would require an acknowledgement of the benefits and necessities of such decisions and the rationales behind them, rather than talking exclusively about the “objective” side of things, only grudgingly accepting that discretionary decisions exist but should be “subtracted out” as far as we can, wouldn’t it?

What was my text to you? These are reblogs from the “objectivity” series (of 5 or more) from 3 years ago. Interestingly, I didn’t find there’s much I’d change, except that more needs to be said about the epistemology and metaphysics behind many existing positions on objectivity. The “dirty hands” analogy is one used by others in the risk assessment context to claim “we all have dirty hands”, thus we all are biased, thus scientists should bias their reports in favor of the common good (this was the ethics in evidence post, and this argument is more prevalent than ever.) I am arguing AGAINST the dirty hands allegation, so I am agreeing with you that discretionary choices don’t or needn’t dirty our hands.

The “dirty hands” post:

https://errorstatistics.com/2011/10/13/objectivity-2-the-dirty-hands-argument-for-ethics-in-evidence/

Mayo:

We are all busy with too many opportunities to read too many things.

But Christian’s recent paper with Andrew addresses many issues of objectivity very nicely.

http://www.stat.columbia.edu/~gelman/research/unpublished/objectivity10.pdf

In that paper they reference Hasok Chang who seems very consistent with Peirce whom he seems to have gotten mostly from Amy McLaughlin (have to wait until I am at a library to access her work) and John Dewey – so you might be especially interested.

Keith: Yes, I’ve written very lengthy and detailed comments on that paper. At some point, once cleaned up and provided the authors concur, I can post them. It just so happened to be a topic that coincided with my (end of the month) three-year monthly memory lane routine.

I also know Chang well–a fellow new experimentalist who was one of the people I invited to my Lakatos Prize dinner. (I guess you’re allowed to invite some very tiny number like 3). He had published, in Nature, an early and very favorable review of EGEK (1996).

Two questions about objectivity.

First, Bayes has the property that you can subdivide the data/evidence in any way you want, and process it in any order you want, and you’ll get the same final answer. Frequentist methods do not have that property. It’s possible for two different Frequentists, for example, to get different answers just because they did the same group of tests/severity analyses in different orders. Isn’t it important that methods be “objective” in this way, or do you think it’s acceptable that irrelevant choices by the analyst can change the import of the same data/evidence?

A follow-on question: if being “objective” in this way is important, why can’t we use this as a (mathematical) requirement to limit acceptable procedures? Do you have any idea what formalism you’ll mathematically be led to if you start imposing these kinds of objectivity requirements?

Second, you say this about logical probabilities:

“The fact that the resulting degrees of confirmation are at the same time analytical and a priori—giving them an air of objectivity–reveals the central weakness of such confirmation theories as “guides for life”, e.g., —as guides, say, for empirical frequencies or for finding things out in the real world.”

For logical probabilities the idea is that P(A|B) represents a model of the uncertainty in A from partial evidence B. The form of P(A|B) then follows logically and objectively from B. “A” could be parameters predicted or unknown or could be a frequency. For example, A could be the percentage of heads in the next 1000 flips of a coin. So where in the world did you get the idea that P(A|B) couldn’t be a “guide for life” because it’s “logical” and “objective”?

BFL: Yes, Bayesians say they can toss things around, but many also say the prior is supposed to be before the data and not based on the data–warning against double counting. So their position is a bit unclear–doubtless it shifts for different Bayesians.

If you’re alluding, at the start, to the fact that error statisticians violate the strong likelihood principle and that selection effects, stopping rules, multiple testing etc. alter error probabilities, then I say “guilty as charged”. That’s not irrelevant info for us.

I mentioned the general key question raised of logical probabilities based on formal (first order) languages. A research programme essentially abandoned, but surely some still pursue it. One needs to set out all properties, possibly relations (I don’t think they’ve advanced to functions), individual entities, and then choose a “uniformity factor” lambda, and then a way to give initial assignments to states, or structures, or what have you.

What you describe is in sync with having statements be statistical models, or in any event empirical and not purely formal, syntactical, context-free claims.