Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes:
I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something they don’t like. For example, Larry Wasserman writes (as reported by Deborah Mayo):
“I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs.”
I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference.
To quickly record some of my own frustrations*: First, I would disagree with Wasserman’s characterization of frequentist inference, but as is clear from Larry’s comments on my reaction to him, I think he concurs that he was just giving a broad contrast. Please see the note below for a remark from my post: Comments on Wasserman’s “what is Bayesian/frequentist inference?” Also relevant is a Gelman post on the Bayesian name (linked at the end of this post).
Second, Gelman’s “no more subjective than…” evokes remarks I’ve made before. For example, in “What should philosophers of science do…” I wrote:
Arguments given for some very popular slogans (mostly by non-philosophers) are too readily taken on faith as canon by others, and are repeated as gospel. Examples are easily found: all models are false, no models are falsifiable, everything is subjective, or equally subjective and objective, and the only properly epistemological use of probability is to supply posterior probabilities for quantifying actual or rational degrees of belief. Then there is the cluster of “howlers” allegedly committed by frequentist error statistical methods, repeated verbatim (discussed on this blog).
I’ve written a lot about objectivity on this blog, e.g., here, here and here (and in real life), but what’s the point if people just rehearse the “everything is a mixture…” line, without making deeply important distinctions? I really think that, next to the “all models are false” slogan, the most confusion has been engendered by the “no methods are objective” slogan. However much we may aim at objective constraints, it is often urged, we can never have “clean hands” free of the influence of beliefs and interests, and we invariably sully methods of inquiry by the entry of background beliefs and personal judgments in their specification and interpretation.
There are indeed numerous choices in collecting, analyzing, modeling, and drawing inferences from data, and in determining what inferences they warrant about scientific claims of interest. But why suppose that this introduces subjectivity into an account, or worse, means that all accounts are in the same boat as regards subjective factors? They most certainly are not. An account of inference shows itself to be objective precisely in how it steps up to the plate in handling these choices and methodological decisions.
While it is obvious that human judgments and human measurements are involved, that is too trivial an observation to help us distinguish among the very different ways judgments should enter, and how threats of bias and unwarranted inferences may nevertheless be avoided. The issue is not that a human is doing the measuring; the issue is whether what is being measured is something we can reliably use to find things out, i.e., solve some problem of inquiry. This last sentence needs unpacking; there are three distinct points.
(1) Relevance: The process should be relevant to learning what is being measured. Having an uncontroversial way to measure something is not enough to make it relevant to solving a knowledge-based problem of inquiry.
(2) Reliably capable: The process should not routinely or often declare a problem solved when it is not (or solved incorrectly)–whatever the nature of the problem. The process should be capable of controlling reports of erroneous solutions to problems with some reliability.
(3) Able to learn from error: If the problem is not solved (or poorly solved) at a given stage, the method will enable pinpointing the reason why, or set the stage for finding this out.
I think it is point (1) that is overlooked by many, and I would like to call your attention to it. It is common for “conventional” or, as many of them prefer, “O-Bayesians” (e.g., Bernardo) to declare that their methods are just as objective as frequentist methods because they (only!) assume the statistical model and the data. For example, here. Point (2), of course, gets at severity, and (3) directly ties to what I have called solving Duhemian problems. The issue, as I see it, is not a matter of dubbing all of a domain objective or not, but of clarifying the criteria needed to discern the truth about specific criticisms. For starters, why should the fact that inquiry involves discretionary choices show that methods fall down on these jobs of objective inquiry?
We have discussed the dirty hands argument a few times on this blog (see, for example, Objectivity (#4) and the “Argument From Discretion”). There are a number of ways in which the argument takes root; all of them, I claim, are fallacious. To try to give these arguments a run for their money, I’ve tried to see why they look so plausible. One route is to view the reasoning as follows:
- A variety of human judgments go into specifying experiments, tests, and models.
- Because there is latitude and discretion in these specifications, they are “subjective”.
- Whether data are taken as evidence for a statistical hypothesis or model depends on these subjective methodological choices.
Therefore, statistical inference and modeling are invariably subjective, if only in part.
To avoid loaded terms, call the methodological choices “discretionary”. Because of the discretionary choices in inquiry, we invariably get our hands dirty. Therefore, our conclusions cannot be pristine or objective.
The discretionary choice of a very insensitive test for detecting a positive discrepancy d’ from a 0 null, for example, results in a test with low probability of finding statistical significance even if a discrepancy as large as d’ exists. But that does not prevent me from determining, objectively, that an insignificant difference in that test fails to warrant inferring that the discrepancy is less than d’. The inference would pass with low severity. We call this identifying a fallacy of negative or insignificant results. But notice: it’s error statistical reasoning that enters to correct an application of an error statistical method. These methods are self-correcting!
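To make the numbers concrete, here is a minimal sketch of the severity reasoning just described. Everything in it is illustrative and assumed for the example, not drawn from any case in the exchange: a one-sided Normal test of H0: μ = 0 vs. μ > 0 with σ known, a small sample giving low power against the discrepancy d’ = 0.2, and a non-significant observed mean of 0.4.

```python
# Illustrative only: one-sided Normal test of H0: mu = 0 vs H1: mu > 0, sigma known.
# The numbers (sigma = 1, n = 10, d_prime = 0.2, xbar = 0.4) are made up for the sketch.
from scipy.stats import norm

sigma, n, alpha = 1.0, 10, 0.05
se = sigma / n ** 0.5                      # standard error of the sample mean
cutoff = norm.ppf(1 - alpha) * se          # reject H0 when xbar > cutoff (~0.52)

d_prime = 0.2                              # discrepancy of interest
power = 1 - norm.cdf((cutoff - d_prime) / se)
print(f"power against mu = {d_prime}: {power:.2f}")   # ~0.16: an insensitive test

xbar = 0.4                                 # observed mean, non-significant (0.4 < 0.52)
# Severity for the claim "mu <= d_prime": the probability the test would have
# produced a result larger than the one observed, were mu as large as d_prime.
sev = 1 - norm.cdf((xbar - d_prime) / se)
print(f"severity for 'mu <= {d_prime}': {sev:.2f}")   # ~0.26: low severity
```

With the same standardized non-significant result from a far larger sample, the severity for “μ ≤ 0.2” would be high; what licenses or blocks the inference is the error-probing capacity of the tool actually used, not anyone’s degrees of belief.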
In this connection, see the blogpost (in which Gelman also figures): P-values can’t be trusted except when used to argue that P-values can’t be trusted!
Accounts that boast great flexibility and latitude do not enjoy this self-critical feature. Ironically, critics often make use of error-statistical reasoning in making out their criticisms of error statistical methods, while at the same time endorsing methods whose great flexibility and latitude frees them from error statistical constraints!
And what really frustrates me is the confusion (subliminal perhaps) between an account that recognizes how biases can color inference, and one that allows biases to enter into the inference. This calls to mind the discussion that sprang up in relation to Nate Silver. Whether or not he meant it, what he said, and said more than once, is that we should be Bayesian because it lets us explicitly introduce our biases into the data analysis! See, for example: “(8/6) What did Nate Silver just say? Blogging the JSM” and “(8/9) 11th bullet, multiple choice question, and last thoughts on the JSM”.
We take a different tack. Your analysis might be a product of your desire not to find an effect, of insufficient funds to collect a larger sample, of ethics, or of the inadvertent choice of a bureaucrat. But my critical evaluation of what the resulting data do and do not indicate need not itself be a matter of desires, economics, ethics, or what have you. If I were not skeptical enough already, knowledge of a researcher’s self-interest in a result may well motivate me to scrutinize his claims all the more, but that reflects a distinct interest: an interest in not being misled, an interest in finding out what the case is, and others of an epistemic nature.
Note: there’s a big difference here (typically overlooked) between using “background beliefs” to alter an inference, and “using” them as a motivation to scrutinize an inference. I had a long exchange with Gelman once on this issue. He was criticizing frequentists for ignoring background, and I don’t think I got through to him that we (error statisticians) use background to scrutinize (and improve) all stages of inquiry. Without recognizing this, he perhaps inadvertently adds fuel to the fire against the frequentist error statistician’s use of background to promote objective scrutiny. See especially, “How should prior information enter in statistical inference”, “background knowledge: not to quantify but to avoid being misled by subjective beliefs”, and one of my deconstructions of Gelman: “Last part (3) of the deconstruction: beauty and background knowledge”.
There are parallels between learning from statistical experiments and learning from observations in general. The problem in objectively interpreting observations is that observations are always relative to the particular instrument or observation scheme employed. But we are often aware not only of the fact that observation schemes influence what we observe, but also of how: how much noise are they likely to produce, and how might we subtract it out? That’s the core strength of the error statistical approach.
The result of a statistical test is only partly determined by the specification of the categories (e.g., when a result counts as statistically significant); it is also determined by the underlying scientific phenomenon, as modeled. What enables objective learning to take place is the possibility of devising means for taking account of the influence of test specifications. Frequentist error probabilities enable us to do this by letting us evaluate and control the capabilities of our tools to find flaws in attempted solutions to problems. That’s the basis for ensuring that before inferring a claim H, we have not only “sincerely tried to find flaws” (as Popper put it), but have probed for them successfully. Any statistical account which cannot make use of error probabilities associated with its methods is one that forfeits this critical self-control.
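As a toy illustration of what evaluating a tool’s capabilities can look like in practice (again only a hedged sketch, carried out in the same assumed Normal-test setup as the snippet above), a chosen test specification can be audited by simulation: how often would it erroneously declare significance under the null, and how often would it detect discrepancies of various sizes?

```python
# Sketch: auditing the error probabilities of a test specification by simulation.
# The setup (sigma = 1, n = 10, cutoff ~= 0.52 for a 0.05-level test) is assumed,
# carried over from the illustrative example above.
import numpy as np

rng = np.random.default_rng(0)
sigma, n, cutoff, reps = 1.0, 10, 0.52, 100_000

def rejection_rate(mu):
    """Fraction of simulated samples whose mean exceeds the cutoff, when the true mean is mu."""
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    return (xbars > cutoff).mean()

print("type I error rate (mu = 0):", rejection_rate(0.0))    # ~0.05, controlled by design
for mu in (0.2, 0.5, 1.0):
    print(f"power against mu = {mu}:", rejection_rate(mu))    # capability grows with the discrepancy
```

The point of such an audit is not the long-run rates for their own sake, but that knowing these capabilities is what lets us say, in the case at hand, which interpretations of the data the test was actually capable of probing.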
Note: In the post on Wasserman mentioned above, I wrote: “But I do have serious concerns that in his understandable desire (1) to be even-handed (hammers and screwdrivers are for different purposes, both perfectly kosher tools), as well as (2) to give a succinct sum-up of methods, Wasserman may encourage misrepresenting positions. Speaking only for “frequentist” sampling theorists, I would urge moving away from the recommended quick sum-up of “the goal” of frequentist inference: “Construct procedures with frequency guarantees”. If by this Wasserman means that the direct aim is to have tools with “good long run properties”, that rarely err in some long run series of applications, then I think it is misleading. In the context of scientific inference or learning, such a long-run goal, while necessary, is not at all sufficient; moreover, I claim that satisfying this goal is actually just a byproduct of deeper inferential goals (controlling and evaluating how severely given methods are capable of revealing/avoiding erroneous statistical interpretations of data in the case at hand). (So I deny that it is even the main goal to which frequentist methods direct themselves.) Even arch behaviorist Neyman used power post-data to ascertain how well corroborated various hypotheses were—never mind long-run repeated applications (see one of my Neyman’s Nursery posts).”
 See also Gelman’s post, “What is a Bayesian”: http://andrewgelman.com/2012/07/31/what-is-a-bayesian/
*I’m posting this quickly for timeliness; I’m bound to make corrections. If significant, I’ll call it draft (ii).