**I blogged this exactly 2 years ago here, seeking insight for my new book (Mayo 2017). Over 100 (rather varied) interesting comments ensued. This is the first time I’m incorporating blog comments into published work. You might be interested to follow the nooks and crannies from back then, or add a new comment to this.**

This is one of the questions high on the “To Do” list I’ve been keeping for this blog. The question grew out of discussions of “updating and downdating” in relation to papers by Stephen Senn (2011) and Andrew Gelman (2011) in *Rationality, Markets, and Morals*.[i]

“As an exercise in mathematics [computing a posterior based on the client’s prior probabilities] is not superior to showing the client the data, eliciting a posterior distribution and then calculating the prior distribution; as an exercise in inference Bayesian updating does not appear to have greater claims than ‘downdating’.” (Senn, 2011, p. 59)

“If you could really express your uncertainty as a prior distribution, then you could just as well observe data and directly write your subjective posterior distribution, and there would be no need for statistical analysis at all.” (Gelman, 2011, p. 77)

But if uncertainty is not expressible as a prior, then a major linchpin for Bayesian updating seems questionable. If, on the other hand, you can go from the posterior back to the prior, perhaps the data can also lead you to go back and change the prior.

**Is it legitimate to change one’s prior based on the data?**

I don’t mean update it, but reject the one you had and replace it with another. My question may yield different answers depending on the particular Bayesian view. I am prepared to restrict the entire question of changing priors to Bayesian “probabilisms”, meaning that inference takes the form of updating priors to yield posteriors, or of reporting a comparative Bayes factor. Interpretations can vary. In many Bayesian accounts the prior probability distribution is a way of introducing prior beliefs into the analysis (as with subjective Bayesians) or, conversely, of avoiding the introduction of prior beliefs (as with reference or conventional priors). Empirical Bayesians employ frequentist priors based on similar studies or well-established theory. There are many other variants.
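The updating step at the heart of these probabilisms can be made concrete with a minimal conjugate sketch (my illustration; the prior and the counts are arbitrary):

```python
# Minimal beta-binomial updating sketch (illustrative numbers only).
# A Beta(a, b) prior on a success probability p, combined with s observed
# successes and f failures, yields the posterior Beta(a + s, b + f).
a, b = 2.0, 2.0          # prior: mildly concentrated around p = 0.5
s, f = 7, 3              # data: 7 successes, 3 failures
a_post, b_post = a + s, b + f
prior_mean = a / (a + b)
post_mean = a_post / (a_post + b_post)
print(f"prior mean {prior_mean:.2f} -> posterior mean {post_mean:.2f}")
```

The question of this post is not about this mechanical step, but about whether one may discard Beta(a, b) itself after seeing the data.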

S. SENN: According to Senn, one test of whether an approach is Bayesian is that while “arrival of new data will, of course, require you to update your prior distribution to being a posterior distribution, no conceivable possible constellation of results can cause you to wish to change your prior distribution. If it does, you had the wrong prior distribution and this prior distribution would therefore have been wrong even for cases that did not leave you wishing to change it.” (Senn, 2011, p. 63)

“If you cannot go back to the drawing board, one seems stuck with priors one now regards as wrong; if one does change them, then what was the meaning of the prior as carrying prior information?” (Senn, 2011, p. 58)

I take it that Senn is referring to a Bayesian prior expressing belief. (He will correct me if I’m wrong.)[ii] Senn takes the upshot to be that priors cannot be changed based on data. **Is there a principled ground for blocking such moves?**

I.J. GOOD: The traditional idea was that one would have thought very hard about one’s prior before proceeding—that’s what Jack Good always said. Good advocated his device of “imaginary results” whereby one would envisage all possible results in advance (1971, p. 431) and choose a prior that you can live with whatever happens. *This could take a long time!* Given how difficult this would be, in practice, Good allowed

“that it is possible after all to change a prior in the light of actual experimental results” [but] “rationality of type II has to be used.” (Good 1971, p. 431)

Maybe this is an example of what Senn calls requiring the informal to come to the rescue of the formal? Good was commenting on D. J. Bartholomew [iii] in the same wonderful volume (edited by Godambe and Sprott).

D. LINDLEY: According to subjective Bayesian Dennis Lindley:

“[I]f a prior leads to an unacceptable posterior then I modify it to cohere with properties that seem desirable in the inference.” (Lindley 1971, p. 436)

This would seem to open the door to all kinds of verification biases, wouldn’t it? This is the same Lindley who famously declared:

“I am often asked if the method gives the right answer: or, more particularly, how do you know if you have got the right prior. My reply is that I don’t know what is meant by ‘right’ in this context. The Bayesian theory is about coherence, not about right or wrong.” (1976, p. 359)

H. KYBURG: Philosopher Henry Kyburg (who wrote a book on subjective probability, but was or became a frequentist) gives what I took to be the standard line (for subjective Bayesians at least):

“There is no way I can be in error in my prior distribution for μ ––unless I make a logical error…. It is that very fact that makes this prior distribution perniciously subjective. It represents an assumption that has consequences, but cannot be corrected by criticism or further evidence.” (Kyburg 1993, p. 147)

It can be updated of course via Bayes rule.

D.R. COX: While recognizing the serious problem of “temporal incoherence” (a violation of diachronic Bayes updating), David Cox writes:

“On the other hand [temporal coherency] is not inevitable and there is nothing intrinsically inconsistent in changing prior assessments” in the light of data; however, the danger is that “even initially very surprising effects can post hoc be made to seem plausible.” (Cox 2006, p. 78)

An analogous worry would arise, Cox notes, if frequentists permitted data-dependent selections of hypotheses (significance seeking, cherry picking, etc.). However, frequentists (if they are not to be guilty of cheating) would need to take into account the effect of any such adjustments on the overall error probabilities of the test. But the Bayesian is not in the business of computing error probabilities associated with a method for reaching posteriors, at least not traditionally. Would Bayesians even be required to report such shifts of priors? (A principle is needed.)

What if the proposed adjustment of the prior is based on the data and resulting likelihoods, rather than on an impetus to ensure one’s favorite hypothesis gets a desirable posterior? After all, Jim Berger says that prior elicitation typically takes place *after* “the expert has already seen the data” (2006, p. 392). Are experts instructed to try not to take the data into account? In any case, if the prior is determined post-data, one wonders how it can be seen to reflect information distinct from the data under analysis. All the work of obtaining posteriors would have been accomplished by the likelihoods. There is also the issue of using the data twice.
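The worry here, that a post-data choice of prior lets one arrange a desired posterior, can be illustrated with a small simulation. This is entirely my own sketch: a single normal observation, a hypothetical menu of conjugate priors, and a true parameter value under which the favored hypothesis does not hold.

```python
# Sketch (my illustration, not from the post): how post-data prior choice
# can inflate posterior support for a favored hypothesis H: theta > 0.
# Model: one observation x ~ N(theta, 1), with true theta = 0.
# A conjugate N(m, 1) prior gives the posterior N((x + m)/2, 1/2).
import random
from statistics import NormalDist

random.seed(1)
Z = NormalDist()
candidate_means = [-1.0, 0.0, 1.0, 2.0]  # hypothetical menu of priors

def post_prob_H(x, m):
    """Posterior P(theta > 0 | x) under the N(m, 1) prior."""
    mean, sd = (x + m) / 2, 0.5 ** 0.5
    return 1 - Z.cdf(-mean / sd)

fixed = picked = 0
n_sim = 10_000
for _ in range(n_sim):
    x = random.gauss(0.0, 1.0)        # data generated with theta = 0
    if post_prob_H(x, 0.0) > 0.9:     # honest prior N(0, 1), fixed pre-data
        fixed += 1
    # cherry-picker: after seeing x, report whichever prior favors H most
    if max(post_prob_H(x, m) for m in candidate_means) > 0.9:
        picked += 1

print(f"rate of posterior > 0.9, fixed prior:  {fixed / n_sim:.3f}")
print(f"rate of posterior > 0.9, picked prior: {picked / n_sim:.3f}")
```

In this setup the fixed prior rarely yields strong support for H, while choosing the prior after seeing the data does so far more often; that gap is exactly the kind of error probability a frequentist would be obliged to report.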

**So what do you think is the answer? Does it differ for subjective vs conventional vs other stripes of Bayesian?**

[i]Both were contributions to the RMM (2011) volume: Special Topic: Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond? (edited by D. Mayo, A. Spanos, and K. Staley). The volume was an outgrowth of a 2010 conference that Spanos and I (and others) ran in London (LSE), and conversations that emerged soon after. See full list of participants, talks and sponsors here.

[ii] Senn and I had a published exchange on his paper that was based on my “deconstruction” of him on this blog, followed by his response! The published comments are here (Mayo) and here (Senn).

[iii] At first I thought Good was commenting on Lindley. Bartholomew came up in this blog in discussing when Bayesians and frequentists can agree on numbers.

**WEEKEND READING**

Gelman, A. 2011. “Induction and Deduction in Bayesian Data Analysis.”

Senn, S. 2011. “You May Believe You Are a Bayesian But You Are Probably Wrong.”

Berger, J. O. 2006. “The Case for Objective Bayesian Analysis.”

Discussions and responses on Senn and Gelman can be found by searching this blog:

Commentary on Berger & Goldstein: Christen, Draper, Fienberg, Kadane, Kass, Wasserman.

Rejoinders: Berger, Goldstein.

REFERENCES

Berger, J. O. 2006. “The Case for Objective Bayesian Analysis.” *Bayesian Analysis* 1 (3): 385–402.

Cox, D. R. 2006. *Principles of Statistical Inference*. Cambridge, UK: Cambridge University Press.

Gelman, A. 2011. “Induction and Deduction in Bayesian Data Analysis.” *Rationality, Markets and Morals* 2: 67–78.

Mayo, D. G. 2017. *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*. Cambridge: Cambridge University Press.

Senn, S. 2011. “You May Believe You Are a Bayesian But You Are Probably Wrong.” *Rationality, Markets and Morals* 2: 48–66.

“I take it that Senn is referring to a Bayesian prior expressing belief. (He will correct me if I’m wrong.)[ii] Senn takes the upshot to be that priors cannot be changed based on data. Is there a principled ground for blocking such moves?”

Andrew Gelman has a post just today on exactly this:

http://andrewgelman.com/2017/06/18/dont-say-improper-prior-say-non-generative-model/

I’m not sure it actually answers this particular question, but it’s interesting context for thinking about it.

Tom: Yes I made a comment there this morning.

I think that changing priors is one of the things that the logic of abductive inference is about. More generally, it’s the order of reasoning involved in changing reference classes of distributions, logical frames of reference, paradigms, etc.

Great post. I’ve always seen “the prior” as one’s best guess about the truth of a proposition, before receiving new evidence about the truth or falsity of that proposition. Once you receive and evaluate new evidence, you can’t go back and change your prior; instead, you have to update it.

Then, after updating, the posterior can serve as a new prior. My larger point is that Bayesian methods are uni-directional.

Enrique: I think that is the standard view.

It is sometimes advocated that a prior probability distribution can be based on some prior study data. In this case, the prior probability distribution based on past data could be regarded as a posterior probability distribution based on the past data’s likelihood distribution and a uniform ‘base rate’ prior probability distribution. Under these circumstances, the likelihood distribution based on the new data results in a ‘second generation’ posterior probability. This is consistent with the view that you cannot change a prior if it is based on past data, because this would mean changing the data!

I would go further by suggesting that random sampling is a special case where all the prior probabilities of the parameters and statistics are uniform anyway (see https://blog.oup.com/2017/06/suspected-fake-results-in-science/). In practice, it doesn’t matter whether one accepts this or not. The effect of a prior probability distribution on the posterior probability distribution will be the same whether it is a ‘base-rate prior’ or not. However, I find it conceptually easier to understand by linking a prior probability distribution to real data or even pseudo-data. Again it means that you should not change the prior as it would imply changing the ‘data’ on which it is ‘conditional’.
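The equivalence proposed above, that a prior built from past data is just the posterior of a uniform ‘base-rate’ prior given that data, can be checked in a small conjugate sketch (my illustration; the study counts are invented):

```python
# Beta-binomial check: a "prior" built from past data is the posterior of a
# uniform Beta(1, 1) prior given that past data, and updating it with new
# data gives the same answer as pooling old and new data in one step.
def update(a, b, s, f):
    """Conjugate beta-binomial update: Beta(a, b) + (s successes, f failures)."""
    return a + s, b + f

old = (6, 4)    # past study: 6 successes, 4 failures (hypothetical counts)
new = (3, 7)    # new study: 3 successes, 7 failures (hypothetical counts)

prior_from_past = update(1, 1, *old)        # uniform prior + past data
two_step = update(*prior_from_past, *new)   # then update with the new data
pooled = update(1, 1, old[0] + new[0], old[1] + new[1])  # all data at once
print(two_step, pooled)
```

The two routes land on exactly the same distribution, which is why revising such a ‘prior’ would amount to revising the past record itself.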

Huw: Thanks for this, it’s similar, in part, to remarks by Senn quoted in the 2015 post. I’ll be interested to check the link.

“Is it legitimate to change one’s prior based on the data?”

“So what do you think is the answer? Does it differ for subjective vs conventional vs other stripes of Bayesian?”

Sometimes. A subjective Bayesian encountering completely unexpected data changes the prior:

http://wp.me/p5eoC-dF

In the philosophy literature, that has been compared to changing the premises of a deductive argument. It has been argued that just as one may revise a premise without abandoning deductive logic as a tool, one may revise a prior without abandoning Bayesian updating as a tool.

DRB: I was trying to ascertain what Bayesians claim, for one or another Bayesian account. Even granting your analogy, the problem is akin to circular arguments, which, after all are perfectly valid. No one thinks you give up on deductive logic when you employ circular arguments, the problem is that you haven’t given reasons for detaching the conclusion.

The standard idea is that priors represent beliefs or information pre-data, and if you’re allowed data dependent changes, what’s to stop you from arranging it so that you get your desired posterior? A great many proposals are discussed in detail in the comments to the original blog.

The problem is that there are many different ‘prior probabilities’ of different outcomes, conditional on different types of evidence, that are relevant to interpreting the result of a scientific study. In some cases the prior probability concerns the outcomes of a random sampling process, where each outcome has an equal prior probability (https://blog.oup.com/2017/06/suspected-fake-results-in-science/). In other situations (e.g. the prior probabilities of various diagnostic criteria being observed in future) this will not be the case. If a random sampling model is to be applied to the mean or proportion observed in a new scientific study, then its methods will have to be described very accurately and in a way that makes low the ‘prior probability’ of factors (e.g. cherry picking) that would invalidate the use of a sampling model for the analysis.

Another kind of prior evidence would be the data of an identical study that could be combined with the result of the new study (e.g. with meta-analysis or combining the likelihood ratio distributions of both studies in a Bayesian fashion). Other evidence would also have to be taken into account when making inferences such as the use of methods to make bias improbable (e.g. double blind randomization). Another type of prior evidence would be the results of other studies that show that some hypothetical mechanisms are improbable (e.g. normal AIDS blood tests in a village population with a high prevalence of tuberculosis, which would make AIDS improbable as the underlying cause, this normal finding being compatible with other causes however).

This suggests that a single prior probability based on Bayes rule does not provide an adequate model for interpreting the results of scientific studies and some other probability model may have to be used (e.g. https://blog.oup.com/2013/09/medical-diagnosis-reasoning-probable-elimination/).

“The standard idea [for subjective Bayesianism] is that priors represent beliefs . . . pre-data, and if you’re allowed data dependent changes, what’s to stop you from arranging it so that you get your desired posterior?”

The analogous idea for deduction is that the premises represent accepted propositions before accepting additional premises. (Acceptance here can be for the sake of argument.) If a premise is later accepted that contradicts the others, a change in which propositions are accepted is required to avoid contradiction. What’s to stop you from arranging it so that you get your desired deductive conclusion?

DRB: As I think I noted in my earlier comment, a circular argument is deductively valid. The issue is its soundness. If you were supposed to provide reasons for the conclusion, then premises that already assume the conclusion fail at the job. If you’re allowed to argue circularly, anything can be proved, so there’s no error control.

Pingback: “Can You Change Your Bayesian Prior?” | Complexity and Statistics Research

Hi:

I am not really sure what you mean by changing the prior or by a frequentist prior. For more difficult estimation problems, Bayesian methods can often produce a better estimator, and this depends on the prior. One such estimation problem is tau in meta-analyses. Here, one can investigate frequencies in simulation with a parameter defined either as a point estimate or as a probability distribution. The former samples the same value each time, whereas the latter randomly draws from the hypothesized distribution and then computes some performance measure (e.g. MSE). To see what allows for flexibility and good performance, many priors can be investigated. So, sure, priors can be changed and examined for performance just like any other model assumption, such as equal variances, the assumed likelihood, etc. The important thing to consider is that frequency calculations do not necessarily mean frequentist inferences. That is, we can examine our models in this way while still understanding that these things hold true in the small world of the Monte Carlo simulation and are used to better understand our models. This stands in contrast to frequentist inference, although a Bayesian can surely be justified in making such an inference, whereas the opposite is not true.

In sum, for people who fit reasonably complex models, the prior is not generally viewed as something that is updated but as another probability distribution to accompany the likelihood. Here, assuredly, a prior can be changed, because it does not reflect belief but is used in an attempt to capitalize on the mathematical properties of the prior distribution in a way that leads to a better model (better in a way that can be verified with simulation or posterior predictive checks).

Donny
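The kind of prior audit Donny describes can be sketched in a toy setting; here a binomial proportion stands in for the meta-analytic tau, and the priors, sample size, and true value are all my own illustrative choices:

```python
# Toy version of auditing priors by their frequency properties: compare the
# posterior-mean estimators of a binomial proportion p under two beta priors
# by Monte Carlo mean squared error (a stand-in for the tau example).
import random

random.seed(0)

def mse_of_posterior_mean(a, b, p_true, n=20, n_sim=5_000):
    """Monte Carlo MSE of the Beta(a, b) posterior mean for Binomial(n, p_true) data."""
    total = 0.0
    for _ in range(n_sim):
        s = sum(random.random() < p_true for _ in range(n))  # simulated successes
        est = (a + s) / (a + b + n)                          # posterior mean
        total += (est - p_true) ** 2
    return total / n_sim

results = {}
for a, b in [(1, 1), (5, 5)]:    # two candidate priors to audit
    results[(a, b)] = mse_of_posterior_mean(a, b, 0.5)
    print(f"Beta({a},{b}) prior: MSE at p = 0.5 -> {results[(a, b)]:.5f}")
```

At p = 0.5 the more concentrated prior shrinks toward the truth and wins on MSE; at a true value far from 0.5 the ranking can reverse, which is exactly why one would audit several priors rather than commit to one.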

Wowww – this is a great thread. Two comments by associative thinking:

1. The issue of rerandomization seems related. Cox writes about it in his book on the design of experiments; Fisher had no problems with it but apparently never put it in writing.

2. Cox’s comment that the danger is that “even initially very surprising effects can post hoc be made to seem plausible” seems related to Taleb’s black swan effect: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1433490