Deconstructing Gelman part 2: Using prior Information

(Please see part 1 for links and references):

A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether “a Bayesian” refers to all Bayesians or only to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out with his own inquiry, he doesn’t want your favorite priors (be they beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article), is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. Were you to report it, he would only have to subtract out all of this “funny business” to get at your likelihoods; he would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.
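
For readers who want to see what “dividing away” a prior could amount to in the simplest case, here is a minimal sketch; it is an illustration under stated assumptions, not Gelman’s own procedure. It assumes a single parameter theta on a discrete grid, a binomial probability model, and a reported prior that is nonzero everywhere on the grid (otherwise the division is undefined). The counts, the Beta-shaped prior, and all variable names are hypothetical. Since the posterior is proportional to likelihood times prior, dividing the reported posterior by the reported prior recovers the likelihood only up to a constant factor.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def binomial_likelihood(theta, successes, trials):
    """Probability of the observed count under a binomial model, as a function of theta."""
    return math.comb(trials, successes) * theta**successes * (1 - theta)**(trials - successes)

# Hypothetical setup: interior grid points only, so no prior weight is ever zero.
grid = [i / 100 for i in range(1, 100)]
successes, trials = 7, 20  # made-up data, for illustration only

# The reporter's prior (assumed shape: proportional to a Beta(8, 2) density).
reporter_prior = normalize([t**7 * (1 - t) for t in grid])

# What the reporter publishes: a posterior, i.e. likelihood times prior, normalized.
likelihood = [binomial_likelihood(t, successes, trials) for t in grid]
reporter_posterior = normalize([l * p for l, p in zip(likelihood, reporter_prior)])

# "Dividing away" the prior: posterior / prior is proportional to the likelihood.
recovered = normalize([post / p for post, p in zip(reporter_posterior, reporter_prior)])

# Check against the likelihood rescaled the same way (equality holds only up to a constant).
max_gap = max(abs(a - b) for a, b in zip(recovered, normalize(likelihood)))

# The reader can now combine the recovered likelihood with a prior of their own
# (here a flat one, again just an assumption for the sketch).
reader_prior = normalize([1.0 for _ in grid])
reader_posterior = normalize([r * p for r, p in zip(recovered, reader_prior)])
reader_mode = grid[reader_posterior.index(max(reader_posterior))]

print(max_gap, reader_mode)
```

Note what the sketch does not supply: the division presupposes that the probability model behind the likelihood is accepted, and it says nothing about the rest of the background knowledge.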

Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”?  I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).

Doesn’t Gelman’s Bayesian want all this as well? What form would all this background information take? 

I see no reason to (and plenty of reasons not to) suppose that all the relevant background information for scientific inquiry enters by means of formal prior probability distributions, whether the goal is to interpret what these data, say x, indicate, or to make a more general inference given all the relevant background knowledge in science at the time, say, E. How much less so if one is not even planning to report posterior probabilities. Background information of all types enters qualitatively to arrive at a considered judgment of what is known and not known about the question of interest, and what subsequent fruitful questions might be.

In my own take on these matters, even in cases of statistical inference, it is useful to distinguish a minimum of three models (or sets of questions or the like), which I sometimes identify as the (primary) theoretical, statistical, and data models. Recall the “three levels” in my earlier post. If one is reporting what data x from a given study indicate about the question of interest, one may very likely report something different than when reporting, say, what all available evidence E indicates or warrants. I concur with Gelman on this. Background information enters in specifying the problem; in collecting and modeling data; in drawing statistical inferences from modeled data; and in linking statistical inferences to substantive scientific questions. There is no one order either; it’s more of a cyclical arrangement.

Does Gelman agree? I am guessing he would, with the possible exception of the role of a Bayesian prior, if any, in his analysis, for purposes of injecting background information.  But I’m too unclear on this to speculate.

To this same, large extent, Gelman’s view on the proper entry of background knowledge is in sync with Sir David Cox’s position; although for a minute I thought Gelman was disagreeing with Cox (about background information), this analysis suggests not.  Beyond what might be extracted from the snippet from the informal (Cox-Mayo) exchange, to which Gelman refers (p. 3), Cox has done at least as much as anyone else I can think of to show us how we might generate, systematize, and organize background information, and how to establish the criteria appropriate for evaluating such information.[i]

But maybe the concordance is not all as rosy as I am suggesting here. After all, in the same article, Gelman gives a convincing example of using background information which leads him to ask:

“Where did Fisher’s principle go wrong here?” (p. 3)

To be continued . . . in part 3.

[i] I give just one old and one new reference:

Cox, D. R. (1958), Planning of Experiments. New York: John Wiley and Sons. (Republished 1992, Wiley Classics Library Edition.)

Cox, D. R., and C. A. Donnelly (2011), Principles of Applied Statistics. Cambridge: Cambridge University Press.



12 thoughts on “Deconstructing Gelman part 2: Using prior Information”

  1. Reader

    If a Bayesian holds that uncertainty is expressed with probabilities, then the likelihoods would also have to be conveyed with probabilities attached to them.

    • guest

      …as indeed they are, in Bayesian model averaging.

      And if we chop up the various alternative models finely enough, we get back to putting continuous priors on parameters.

      These are both just variations on a common theme; that a distribution (of possibly high dimension) is used to express what You know.

      • Reader

        To guest, from Reader: If what You know about the adequacy of your statistical model assumptions is represented in a prior distribution, as you wrote, then it would need to be added to the likelihoods in giving Gelman the information he asks for, yes? You could give a prior distribution for all the alternative models to the one you used to compute the likelihood (chopping up alternatives finely)?

        • Reader: I know what you’re getting at, but I don’t think this can be what he intends. According to Gelman, reporting the likelihood is to report the “information from the data”. Of course the likelihood is based on a probability model: p(x; theta) for instance. It’s not just read off the data. But the additional probability you’re alluding to would be tantamount to p(p(x; theta)). I don’t think that’s what guest thinks Bayesians report; I’ve never seen people report the probability of the distribution or the probability of independence holding/not holding or of other probabilistic assumptions underlying the likelihood. That’s a different prior than the usual one, P(theta). Besides, how could p(x; theta) report “the information from the data” if the underlying probability model wasn’t accepted (as adequate for this purpose)? On the other hand, if the probabilistic assumptions are going to be checked by Gelman, he’d need the full data, not merely the likelihood.

        • guest

          Reader: it’s not clear to me whether you are using “likelihood” in its technical sense – as a function – or whether you use it as a synonym for “probability”, i.e. a single value. But yes, in Bayesian model averaging one has to give prior probabilities to the different submodels considered. (And this isn’t easy, particularly when the models contain nuisance parameters)

          Whether or not one uses model averaging, for almost all interestingly complex models there is no finite-dimensional sufficient statistic. Hence, to fully tell another user the likelihood (a function) you might as well just tell them the data, in which case they are free to use whatever model they like.

          If reporting the full data is too cumbersome in practice – and it may be – the (non-Bayesian, non-frequentist) data summary one instead reports should permit others to perform their analysis of choice, or a good approximation. In such situations I don’t see avoiding a bit of subjectivity.

          • Guest: But if you tell another user the data, don’t you need to report the probability model as well? Or would it suffice to report the manner of data generation? But that, if done adequately, would seem to amount to the same thing. I doubt Gelman would say to report the data and let users analyze it using any model they like.

            • guest

              Yes, reporting data values + method of data generation would be sufficient – *if* you can guarantee that the description in the second part is sufficiently detailed to inform analysis for any (plausible) target of inference. Note that in situations where the data was collected in line with a fairly precise protocol, this level of detail is not unrealistic; lots of studies publish “design” papers containing just this sort of information.

              Reporting data values + Your probability model forces users to “unpick” the data generation scheme from Your model – likely chosen for Your analysis – and then figure out the consequences for Their analysis, when this is possible. It’s not wrong but it can be harder to make it work in practice.

              Also note the distinction between data values + method of data generation (or model) can get blurry in e.g. adaptive designs, or when using hierarchical models – though this may not affect anything in practice.

          • Reader

            Likelihood was used by me in the technical sense, as in the post under discussion. The data are fixed, the parameters vary.

    • Jean

      D. Mayo’s e-mail is down and she wants me to post the following comment (after we posted Reader’s comments, which wouldn’t go through):

      Reader: thanks for your comment (sorry your comment was held up, I posted it for you, but now can’t get back in):
      Yes, it would seem to open up the problem of continual regress, probabilities of probabilities of probabilities…. But I do not see Bayesians even assigning prior probabilities to the assumptions of a statistical model intended to be used for some other statistical inference, as in appraising H in the example of this post. It is not clear that Gelman’s Bayesian, who wants everybody else to be non-Bayesian (at least for communicating to him), would be any more interested in your prior assignments to things like “this sample is Normal iid”, than he is in the “funny business” of ordinary priors to H and its alternatives. But I may be wrong.

      Regardless of the position on prior probabilities of statistical assumptions underlying the likelihoods, I would think he’d want more than mere likelihoods. From likelihoods, one doesn’t get enough information to assess error probabilities relevant for assessing well-testedness. And entirely outside of such considerations, I should think he’d want a slew of background details about the field, previous studies, flaws and foibles, background theories, variables, etc., as I’ve already said in my post.

      • Mayo

        Thanks Jean. This is why I am uncomfortable when people e-mail me, from time to time, asking me to post a comment for them, even when I’d really like to have their comment, and I understand sometimes people are unfamiliar with the procedure, or it’s misbehaving. I usually ask them to put it in directly, and often they do not. Else, if an administrator puts in the comment with the commentator’s name, he or she might forget to switch back and my comment gets posted using that name. Anyway, our internet is back now. By the way, the system asks for an e-mail but apparently it’s left on a setting that does not require an authentic e-mail (those free-wheeling Elba folk). Yet this could change. I’ve also noticed recently, for the first time, that there are lots and lots of comments that the system decides are spam, and I guess they are. More blog mysteries.

  2. Corey

    “Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.”


    • Corey: well, yes; what do you think?
      (By the way, your comments are still being flagged for approval; they should not be.)

