Fortunately, we have Jim Berger interpreting himself this evening *(see December 11)*.

*Jim Berger writes:*

A few comments:

1. Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine.

2. My more provocative comment was based on the fact that objective Bayesians worry a lot about the prior, and work hard to get a prior that is good in situations where one does not have much prior information or is obligated to use impartial priors (e.g., by regulation). True subjective Bayesians also worry a lot about the prior, attempting to model their prior information carefully, doing sensitivity studies, etc. But, in part because Bayesian analysis has become so popular and is being used by many without training in either objective Bayesian or true subjective Bayesian methods, there are many quite ad hoc choices of priors being made that have no inherent justification and would, I claim, be much less ‘Bayesian’ than the objective Bayesian priors. All that is less exciting than my provocative comment, but it is what I had in mind when writing the comment.

————————————

UPDATE: Dec. 29, 2011: Andrew Gelman today fired back with a lengthy post on his own blog. I hope others will join in the discussion in the commentary here.

Jim’s remark (#1) actually illuminates a different twist on the remark I considered most cryptic. But why is it again that if the constant prior on the whole parameter space is bad, then so is the constant prior over the bounded set (despite being proper)?

The improper constant prior is bad when it leads to an improper posterior. If one tries to fix that by limiting the support of the prior in an arbitrary way such that the prior is proper, then the posterior moments and quantiles depend critically on the choice of the support of the prior.
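A minimal numerical sketch of that dependence, using a toy example of my own choosing (not one from the discussion): a single observation y ~ N(0, σ²) with a flat prior on σ. On (0, ∞) the posterior is improper, since the density behaves like 1/σ for large σ; truncating the flat prior to (0, M) makes it proper, but the posterior mean of σ then depends heavily on the arbitrary bound M.

```python
import numpy as np

def truncated_posterior_mean(y, M, n_grid=1_000_000):
    """Posterior mean of sigma for one observation y ~ N(0, sigma^2),
    under a flat prior on sigma truncated to (0, M)."""
    sigma = np.linspace(1e-6, M, n_grid)
    # Unnormalized posterior: likelihood of y under N(0, sigma^2) times
    # the (constant) prior; the 1/sigma factor is the normal density's.
    dens = np.exp(-y**2 / (2 * sigma**2)) / sigma
    # Uniform grid, so the spacing cancels in the ratio of Riemann sums.
    return np.sum(sigma * dens) / np.sum(dens)

y = 1.0
for M in (10, 100, 1000):
    print(M, truncated_posterior_mean(y, M))
```

The posterior mean grows without bound as M does (roughly like M / log M), so the "weakly informative" truncated-flat prior just hides the impropriety inside the choice of M rather than fixing it.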

Thanks to Prof Berger for the response. I am gratified to hear a leader in Bayesian stats lament the flippant treatment of priors by many newcomers to these approaches. I have the opinion that most of these folks have little knowledge of the philosophical perspective that underpins these stats. I make a point of discussing this with colleagues when possible. I worry that the rise of Bayesian stats is more akin to a fashion trend than a philosophical shift.

But do you concur that employing the system of priors Berger advocates enhances knowledge of the philosophical perspective? Or that they at least do a better job, given a lack of such understanding? Or…?

Well, I must confess that I do not subscribe to any of the various Bayesian approaches. Several years ago I became very intrigued and excited about what biological anthropologists were trying to do with Bayesian statistics. However, I have not read a single paper in my field where the use of priors did not make me cringe. Further, as Bayesian models have become more popular in forensics, I have begun to appreciate the danger inherent in the flippant application of these models, which tend to yield posteriors that speak to central questions. You might recall the article I referenced a while ago that reported on the rejection of a Bayesian analysis by a UK court.

So, I am uncomfortable with how priors are derived in practice, and even more disturbed by how we tend to ignore that issue after analyses are done and conclusions drawn. I worry about scientists relaxing their focus on the appropriateness of data as they employ more sophisticated models. I look askance at the fashion trend.

All that confessed, I appreciate Prof Berger’s concerns about loose thinking about priors. I also find it interesting to see how Bayesians might seek objectivity in their model building. What I most want to know, however, is how the Bayesian proponents will answer the critique of Birnbaum’s thesis. If the strong likelihood principle does not hold, what happens to the Bayesian house?

The Birnbaum proof shows that accepting two criteria that frequentists endorse, conditionality and sufficiency, implies that they should also follow the likelihood principle.

Frequentists who accept the proof find they need to reject one or both of sufficiency and conditionality in order to reject the likelihood principle; e.g., see

http://folk.uio.no/tores/Publi… who in fact reject both.

It’s interesting to read the discussion of the original Birnbaum article. The discussants, both Bayesians and non-Bayesians, are split on the importance of the proof. The significance of the argument is that those who accept sufficiency and conditionality would be restricted to using likelihoodist and Bayesian methods only. The proof is therefore important only if you think sufficiency and conditionality are compelling.

The likelihood principle is, however, not something that is usually regarded as part of the foundation of Bayesian statistics. Bayesian statistics is usually founded upon decision theory, as argued by Ramsey, de Finetti and Savage. Edwin Jaynes also popularised the arguments of Richard T. Cox, which are seen by some as giving a foundational argument. I. J. Good and Lindley, among many others, also contributed a lot of informal arguments.

So a rejection of the Birnbaum argument would allow frequentists to reject the likelihood principle and keep conditionality and sufficiency. It would mean there was one less argument for Bayesians to criticise frequentists with, but it would not affect the foundations of Bayesian statistics at all.
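To make the stakes of the likelihood principle concrete, here is the classic binomial versus negative-binomial illustration (in the style of Lindley and Phillips, not taken from this thread): 3 successes in 12 trials yield proportional likelihoods under the two designs, so the likelihood principle says the evidence about p is identical, yet the frequentist one-sided p-values for H0: p = 0.5 differ because they depend on the stopping rule.

```python
from math import comb

# Binomial experiment: n = 12 trials fixed in advance, x = 3 successes
# observed. One-sided p-value for H0: p = 0.5 vs p < 0.5 is P(X <= 3).
p_binom = sum(comb(12, k) for k in range(4)) / 2**12

# Negative binomial experiment: sample until r = 3 successes, which
# occurred on trial 12. The p-value is P(needing >= 12 trials), i.e.
# P(at most 2 successes in the first 11 trials).
p_negbin = sum(comb(11, k) for k in range(3)) / 2**11

# Both likelihoods are proportional to p^3 (1-p)^9, yet:
print(round(p_binom, 4), round(p_negbin, 4))  # 0.073 0.0327
```

One p-value is above the conventional 0.05 cutoff and the other below it, which is exactly the kind of stopping-rule dependence that sufficiency plus conditionality, via Birnbaum's argument, would rule out.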

I don’t know where you’ve been, but haven’t you seen my disproof of the alleged demonstration? See the Dec. 6 and 7 posts and the many other references to the LP on this blog. I was hoping thereby to educate readers to challenge what has been accepted for 60 years!

Yes, I was referring to your anti-Birnbaum argument.

Because there are fairly radical elements to the argument, I think it will take a long time for a consensus to emerge about its validity. Submitting the argument to a peer reviewed journal would be a useful first step.

Until then I will read with interest what you and others like Christian Robert and David Cox say about it and consider it a tentative proposition.

Cambridge University Press did peer review each article extensively; I’m afraid there is so much misunderstanding of the LP out there that you cannot reliably bow to what has been said for 50+ years. You’ll have to think it through yourself.

I am travelling and do not have access to my books, but my impression is that many recent Bayesian stats texts present the likelihood principle as critical to justifying faith in a posterior probability. I find it hard to see why I should rely upon the likelihood portion of the model if the likelihood principle does not hold, given its role in updating my prior belief. It seems that the only way to salvage any utility would be to appeal to arguments such as “works pretty well in the long run.” But then you get into all of the concerns that frequentists deal with (e.g. assumptions, sampling error, etc.). If the likelihood principle does not hold, then frequency approaches start to look quite attractive again, as they did to most scientists in the 20th century.

A good Bayesian stats text book shouldn’t use the likelihood principle to justify faith in the posterior probability. That is not a valid argument.

The likelihood principle does not in any way claim that the likelihood portion of the model is reliable. As the likelihood function is common to both Bayesian and non-Bayesian approaches, an unreliable likelihood would be a problem in either paradigm.

The Birnbaum argument is in practice an anti-sampling theory argument rather than a pro-Bayesian argument.

I think one of the most appealing aspects of Bayesian decision theory is that it makes no reference to the long run and can in fact be applied (at least in theory) to arbitrary observation outcomes. The usual Bayesian set-up of parameters with distributions is in fact not fundamental but arises in the common case where we want to model exchangeable observations. Search for philosophical discussion of the de Finetti representation theorem for more detail.

It seems the problem is that without the likelihood principle, with no means of dealing with sampling error and no explicit concern for how the data were generated, I should have no reason to think the likelihood component of a Bayesian model is informative. And that is supposed to be the interesting part.

If you read Savage and Cox, you’ll find out why the (subjective and objective, respectively) Bayesian approach only allows data to influence the conclusions through the likelihood. The premises which lead to that theorem are not the same as the premises of Birnbaum’s LP argument. I don’t have a good link on hand for Savage, but this is a good intro to Cox’s theorem.