At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions’ of articles”. I began with Andrew Gelman’s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes). Please share comments, especially in relation to current-day ESP research.
Let me begin with Gelman’s last section: “A Bayesian wants everybody else to be a non-Bayesian.” Surely that calls for philosophical deconstruction, if anything does. It seems at the very least an exceptional view. Whether it’s widely held I can’t say (please advise). But suppose it’s true: Bayesians are publicly calling on everybody to use Bayesian methods, even though, deep down, they really, really hope everybody else won’t blend everything together before they can use the valid parts from the data—and they really, really hope that everybody else will provide the full panoply of information about what happened in other experiments, and what background theories are well corroborated, and about the precision of the instruments relied upon, and about other experiments that appear to conflict with the current one and with each other, etc., etc. Suppose that Bayesians actually would prefer, and are relieved to find, that, despite their exhortations, “everybody else” doesn’t report their posterior probabilities (whichever version of Bayesianism they are using) because then they can introduce their own background and figure out what is and is not warranted (in whatever sense seems appropriate).
At first glance, I am tempted to say that I don’t think Gelman really believes this statement himself if it were taken literally. Since he calls himself a Bayesian, at least of a special sort, then if he is wearing his Bayesian hat when he advocates others be non-Bayesian, then the practice of advocating others be non-Bayesian would itself be a Bayesian practice (not a non-Bayesian practice). But we philosophers know the danger of suggesting that authors under our scrutiny do not mean what they say—we may be missing their meaning and interpreting their words in a manner that is implausible. Though we may think, through our flawed interpretation, that they cannot possibly mean what they say, what we have done is substitute a straw view for the actual view (the straw man fallacy). (Note: You won’t get that I am mirroring Gelman unless you look at the article that began this deconstruction here.)
Rule #2 of this blog[iii] is to interpret any given position in the most generous way possible; to do otherwise is to weaken our critical evaluation of it. This requires that we try to imagine a plausible reading, taking into account valid background information (e.g., other writings) that might bolster plausibility. This, at any rate, is what we teach in philosophy. So to begin with, what does Gelman actually say in the passage (in Section 4)?
“Bayesian inference proceeds by taking the likelihoods from different data sources and then combining them with a prior distribution (or, more generally, a hierarchical model). The likelihood is key. . . . No funny stuff, no posterior distributions, just the likelihood. . . . I don’t want everybody coming to me with their posterior distribution—I’d just have to divide away their prior distributions before getting to my own analysis. Sort of like a trial, where the judge wants to hear what everybody saw—not their individual inferences, but their raw data.” (p.5)
So if this is what he means by being a non-Bayesian, then his assertion that “a Bayesian wants everybody else to be a non-Bayesian” seems to mean that Bayesians want others to basically report their likelihoods. But again, if Gelman is wearing his Bayesian hat when he advocates others not wear theirs, i.e., be non-Bayesian, then his advising that everybody else not be Bayesian (in the sense of not combining priors and likelihoods), is itself a Bayesian practice (not a non-Bayesian practice). So either Gelman is not wearing his Bayesian hat when he recommends this, or his claim is self-contradictory—and I certainly do not want to attribute an inconsistent position to him. Moreover, I am quite certain that he would not advance any such inconsistent position.
Now, I do have some background knowledge. To ignore it is to fail to supply the most generous interpretation. Our background information—that is, Gelman’s (2011) RMM paper [iv]—tells me that he rejects the classic inductive philosophy that he has (correctly) associated with the definition of Bayesianism found on Wikipedia:
“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements” (p. 71).
So now Gelman’s assertion that “a Bayesian wants everybody else to be a non-Bayesian” makes sense and is not self-contradictory. Bayesian, in the term non-Bayesian, would mean something like a standard inductive Bayesian (where priors can be subjective or non-subjective). Gelman’s non-standard Bayesian wants everybody else not to be standard inductive Bayesians, but rather, something more akin to a likelihoodist. (I don’t know whether he wants only the likelihoods rather than the full panoply of background information, but I will return to this.) If Gelman’s Bayesian is not going to assign posterior probabilities to models, or select or average over them using posterior probabilities, then it’s pretty clear he will not find it useful to hear a report of your posterior probabilities. To allude to his trial analogy, the judge surely doesn’t want to hear your posterior probability of Ralph’s guilt, if he doesn’t even think it’s the proper way of couching inferences. Perhaps the judge finds it essential to know whether mistaken judgments of the pieces of evidence surrounding Ralph’s guilt have been well or poorly ruled out. That would be to require an error probabilistic assessment.
But a question might be raised: By “a Bayesian,” doesn’t Gelman clearly mean Bayesians in general, and not just one? And if he means all Bayesians, it would be wrong to think, as I have, that he was alluding to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves). But there is no reason to suppose he means all Bayesians rather than all Bayesians who reject standard, Wiki-style Bayesianism, but instead favor something closer to the view in Gelman 2011, among other places.
Having gotten this far, however, I worry about using the view in Gelman 2011 to deconstruct the passages in the current article, in which, speaking of a Bayesian combining prior distributions and likelihoods, Gelman sounds more like a standard Bayesian. It would not help that he may be alluding to Bayesians in general for purposes of the article, because it is in this article that we find the claim: “A Bayesian wants everybody else to be a non-Bayesian.” So despite my attempts to sensibly deconstruct him, it appears that we are back to the initial problem, in which his claim that a Bayesian wants everybody else to be a non-Bayesian looks self-contradictory or at best disingenuous—and this in a column on ethics in statistics!
But we are not necessarily led to that conclusion! I’ve pasted part 2 below, and here’s a link to part 3.
[i]Gelman, A. “Ethics and the statistical use of prior information”
[ii] The main posts, following the first one, were:
More on using background info (9/15/12)
Statistics and ESP research (Diaconis) (9/22/12)
Insevere tests and pseudoscience (9/25/12)
Levels of inquiry (9/26/12)
[iii] This is the philosopher’s rule of “generous interpretation”, first introduced in this post.
[iv] Gelman, A. (2011). “Induction and Deduction in Bayesian Data Analysis”, Rationality, Markets, and Morals (RMM) 2, 67-78.
A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether “a Bayesian” refers to all Bayesians or only to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out with his own inquiry, he doesn’t want your favorite priors (be they beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article), is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would only have to subtract out all of this “funny business” to get at your likelihoods. He would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.
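To make the “dividing away” concrete, here is a minimal sketch in Python. It is my own illustration, not anything from Gelman’s article, and it rests on strong assumptions: a binomial data model with conjugate Beta priors, so that the likelihood factors cleanly and posterior = prior × likelihood amounts to adding counts to the Beta parameters. The two “labs”, their priors, and all the numbers are hypothetical, as are the function names.

```python
def divide_away_prior(posterior, prior):
    """Recover the data's contribution from a reported Beta posterior.

    Assumes a binomial likelihood with a conjugate Beta prior, so that
    posterior = Beta(a + successes, b + failures) for a prior Beta(a, b).
    Dividing the posterior density by the prior density then leaves a
    likelihood proportional to theta**successes * (1 - theta)**failures.
    Returns the pair (successes, failures).
    """
    successes = posterior[0] - prior[0]
    failures = posterior[1] - prior[1]
    if successes < 0 or failures < 0:
        raise ValueError("reported posterior is inconsistent with reported prior")
    return successes, failures


def combine(likelihoods, my_prior):
    """Multiply independent binomial likelihoods into one's own Beta prior."""
    a, b = my_prior
    for successes, failures in likelihoods:
        a += successes
        b += failures
    return a, b


if __name__ == "__main__":
    # Hypothetical reports: each lab gives its prior and its posterior.
    lab_a = divide_away_prior(posterior=(9, 5), prior=(2, 2))     # (7, 3)
    lab_b = divide_away_prior(posterior=(14, 7), prior=(10, 1))   # (4, 6)

    # The analyst starts from a flat Beta(1, 1) prior of his own.
    print(combine([lab_a, lab_b], my_prior=(1, 1)))               # (12, 10)
```

Note what the sketch presupposes: each lab must report its prior along with its posterior, and both must share the analyst’s data model. Nothing in it captures background knowledge that never took the form of a prior distribution in the first place.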
Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”? I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).
Doesn’t Gelman’s Bayesian want all this as well? What form would all this background information take?
I see no reason to (and plenty of reasons not to) suppose that all the relevant background information for scientific inquiry enters by means of formal prior probability distributions, whether the goal is to interpret what these data, say x, indicate, or to make a more general inference given all the relevant background knowledge in science at the time, say, E. How much less so if one is not even planning to report posterior probabilities. Background information of all types enters in qualitatively to arrive at a considered judgment of what is known, and not known, about the question of interest, and what subsequent fruitful questions might be.
In my own take on these matters, even in cases of statistical inference, it is useful to distinguish a minimum of three models (or sets of questions or the like), which I sometimes identify as the (primary) theoretical, statistical, and data models. Recall the “three levels” in my earlier post. If one is reporting what data x from a given study indicate about the question of interest, one may very likely report something different than when reporting, say, what all available evidence E indicates or warrants. I concur with Gelman on this. Background information enters in specifying the problem, in collecting and modeling data, in drawing statistical inferences from modeled data, and in linking statistical inferences to substantive scientific questions. There is no one order either; it’s more of a cyclical arrangement.
Does Gelman agree? I am guessing he would, with the possible exception of the role of a Bayesian prior, if any, in his analysis, for purposes of injecting background information. But I’m too unclear on this to speculate.
To this same, large extent, Gelman’s view on the proper entry of background knowledge is in sync with Sir David Cox’s position; although for a minute I thought Gelman was disagreeing with Cox (about background information), this analysis suggests not. Beyond what might be extracted from the snippet from the informal (Cox-Mayo) exchange, to which Gelman refers (p. 3), Cox has done at least as much as anyone else I can think of to show us how we might generate, systematize, and organize background information, and how to establish the criteria appropriate for evaluating such information.[i]
But maybe the concordance is not all as rosy as I am suggesting here. After all, in the same article, Gelman gives a convincing example of using background information which leads him to ask:
“Where did Fisher’s principle go wrong here?” (p. 3)
To be continued . . . in part 3.
Check the comments from these earlier posts for some good discussion.
[i] I give just one old and one new reference:
Cox, D. R. (1958), Planning of Experiments. New York: John Wiley and Sons. (Republished 1992 in the Wiley Classics Library Edition.)
Cox, D. R., and C. A. Donnelly (2011), Principles of Applied Statistics, Cambridge: Cambridge University Press.
Gelman’s position seems to be that he doesn’t trust others to put in their priors and would rather put them in himself. OK. This suggests the data report would be likelihoods, and each person then puts in her prior, or maybe each person goes to a Gelman for his prior.
In retrospect I did not express myself clearly. Best would be for each researcher to present his or her full data collection procedure as well as the data themselves, and also his or her model, giving justifications for each step (not just for the “prior” but also for the data model).
My real point (which, again, I did not express so clearly) is that if people just present their inferences without clarity on what information has gone into these inferences, there can be trouble if I want to combine information from different sources. There are settings where the data model is clear and the data information cleanly separates into factors of a likelihood, and that’s what I was thinking of when I spoke of “the likelihood.” But I agree with you that the likelihood function is not in general enough: for one thing, other aspects of the data collection (for example, the stopping rule) can also be relevant; and, more importantly, there is in general no “the likelihood,” since this “likelihood” typically includes lots of modeling assumptions which should be understood and justified.
I assume you’re joking when you suggest that a person “goes to a Gelman for his prior.” As I’ve said so many times, there’s nothing so special about the part of the model we call a “prior,” especially in hierarchical models. I do think that “a Gelman” (or should we say “a gelman”) can be helpful in formulating one’s models, just because gelmans do have some experience with such things. My books include much, but not all, of my understanding of statistical modeling.
Finally, I hope your readers will realize that the article you link to has four parts, and your discussion is all about part 4, which happens to be the shortest section of the paper!