I am posting Jon Williamson’s* (Philosophy, Kent) U-Phil from 4-15-12

**In this paper http://www.springerlink.com/content/q175036678w17478 (Synthese 178:67–85) I identify four ways in which Bayesian conditionalisation can fail. Of course not all Bayesians advocate conditionalisation as a universal rule, and I argue that objective Bayesianism as based on the maximum entropy principle should be preferred to subjective Bayesianism as based on conditionalisation, where the two disagree.**

**Conditionalisation is just one possible way of updating probabilities and I think it’s interesting to see how different formal approaches compare.**

*Williamson participated in our June 2010 “Phil-Stat Meets Phil Sci” conference at the LSE, and we jointly ran a conference at Kent in June 2009.

Thanks much, Jon. What are these other updating methods? I know that Howson concurs that these counterexamples show there are no general Bayesian updating rules, which strikes me as a large concession. But the thing is, he would be free to use your rules and end up where you do (if he endorsed the maxent priors that you do).

Hi Deborah – The main updating rules are Bayesian conditionalisation, Jeffrey conditionalisation, minimum cross-entropy updating, maximum entropy updating, imaging, and various updating rules generated by qualitative belief revision frameworks.

One important consideration is that diachronic Dutch books purport to show that any formal rule of updating that disagrees with conditionalisation yields a diachronic sure loss. This leads most Bayesians into a trilemma: (i) accept conditionalisation, (ii) avoid any formal, universal updating rule, (iii) cast doubt on the importance of diachronic sure loss.

I choose option (iii) in the above paper, but I’m in the minority here – (i) and (ii) seem much more popular to me.
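To make the first two rules named in the comment above concrete, here is a minimal Python sketch. It is not taken from the paper: the four-world prior, the function names, and the shifted value 9/10 are all made up for illustration. It shows Bayesian conditionalisation on an event e, and Jeffrey conditionalisation, which moves Pr(e) to a new value while holding the probabilities conditional on e and on not-e fixed.

```python
from fractions import Fraction as F

# Hypothetical prior over four "worlds", each a pair (a is true?, e is true?).
prior = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}

def prob(p, event):
    """Probability of an event (a predicate on worlds) under distribution p."""
    return sum(q for w, q in p.items() if event(w))

def conditionalise(p, event):
    """Bayesian conditionalisation: zero out worlds outside `event`, renormalise."""
    total = prob(p, event)
    return {w: (q / total if event(w) else F(0)) for w, q in p.items()}

def jeffrey(p, event, new_prob):
    """Jeffrey conditionalisation: set Pr(event) to new_prob while keeping
    probabilities conditional on event and on not-event unchanged."""
    old = prob(p, event)
    return {
        w: q * (new_prob / old if event(w) else (1 - new_prob) / (1 - old))
        for w, q in p.items()
    }

is_a = lambda w: w[0]
is_e = lambda w: w[1]

post_bayes = conditionalise(prior, is_e)        # learn e for certain
post_jeffrey = jeffrey(prior, is_e, F(9, 10))   # become 90% sure of e

print(prob(prior, is_a), prob(post_bayes, is_a), prob(post_jeffrey, is_a))
```

With these made-up numbers, Jeffrey conditionalisation with new probability 1 for e gives the same posterior as Bayesian conditionalisation, which is the usual sense in which the former generalises the latter.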

Testing, testing, 1-2-3, testing…

{e1,…,eN} ≥ ∑Pr(ei)Pr(a|ei) > Pr(a)

Check, check. Check 1, check 2…

Corey: Mike working?

One thing that strikes me about the maxent principle is how far it will get you from what was in Bayes’ original treatise (in which Bayesian conditionalisation played a key role), so I wonder whether “objective Bayes” is a good name for this.

I guess good old Bayes would have a quite hard time figuring out what this has to do with what he wrote about.

I’m wondering how Williamson deals with the fact that there is “no uninformative prior” (what is uninformative for one transformation is informative for another), which leads O-Bayesians in statistics to various “default” choices of priors.

I had an email exchange about one part of this paper with Prof. Williamson in 2010; I had to leave it hanging to finish my dissertation. I’ve lost the emails, so here I’ll start again from zero. But before I do that, I want to be clear that I approach Bayesian inference from a Jaynesian perspective, which views probability theory as an extension of logic, conditionalization as the only update rule that satisfies the Cox assumptions, and inference conditional on a contradiction as futile because probability theory inherits the principle of explosion from logic.

My argument concerns the reductio of diachronic Dutch books found on page 14 of the pre-submission version of the article. It goes like this: first, a theorem of probability theory. Note that this theorem is about mathematical probability, and has nothing to do with agents or updating. (Please pretend any character after an “e” is a subscript.)

Theorem: Let the Cartesian product of {a, not-a} and {e1,…,eN} be a mutually exclusive and exhaustive set of events, each of positive unconditional probability. Suppose that for all i from 1 to N, Pr(a|ei) ≥ Pr(a). Then for all i from 1 to N, Pr(a|ei) = Pr(a).

Proof: Suppose that for all i from 1 to N, Pr(a|ei) ≥ Pr(a), with strict inequality for at least one value of i. Then any convex combination ∑wiPr(a|ei) in which every weight wi is positive is strictly greater than Pr(a). But by the sum rule of probability, Pr(a) is such a convex combination with wi = Pr(ei) > 0. So Pr(a) = ∑Pr(ei)Pr(a|ei) > Pr(a), a contradiction. Hence for all i from 1 to N, Pr(a|ei) = Pr(a), QED.

The reductio shows that in certain situations it is possible to Dutch book any agent that changes its degrees of belief at all, even one that changes them using conditionalization as the update rule. The situation in question is that “it is generally known that [an agent] will be presented with evidence that does not count against a, so that [the agent’s] degree of belief in a will not decrease.” From it, Williamson infers that avoidance of Dutch books is a lousy criterion for choosing an update rule – for him, the diachronic Dutch book argument fails because it proves too much. But when creating the reductio, he doesn’t appear to have noticed that the state of knowledge with which he endows the agent implies probabilistic independence of a and the possible evidence, as shown in the theorem above. So the reductio amounts to “an agent can be Dutch booked if it changes its degree of belief in a proposition when presented with evidence it considers a priori probabilistically independent of the proposition.” Phrased this way, I can’t see that the reductio shows that the diachronic Dutch book argument proves too much. (It also becomes obvious that an agent that updates using conditionalization is immune to this diachronic Dutch book, since conditioning on independent evidence leaves its degree of belief in a unchanged.)

This comment is already very long, so I won’t address Williamson’s example of a situation in which the reductio applies unless someone wants me to.
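For readers who would like to see the theorem in the comment above checked numerically, here is a small Python sketch. It is an illustration under stated assumptions and does not replace the proof: the randomly generated joint distributions and the helper names `random_joint`, `summarise` and `theorem_holds` are hypothetical. Exact fractions are used so that the equalities are not blurred by rounding.

```python
from fractions import Fraction as F
import random

def random_joint(n, rng):
    """Hypothetical joint distribution over {a, not-a} x {e1..en}, every cell positive.
    Row 0 holds Pr(a & ei), row 1 holds Pr(not-a & ei)."""
    weights = [[rng.randint(1, 20) for _ in range(n)] for _ in range(2)]
    total = sum(map(sum, weights))
    return [[F(w, total) for w in row] for row in weights]

def summarise(joint):
    """Return Pr(a), the list of Pr(ei), and the list of Pr(a|ei)."""
    n = len(joint[0])
    pr_e = [joint[0][i] + joint[1][i] for i in range(n)]
    pr_a_given_e = [joint[0][i] / pr_e[i] for i in range(n)]
    return sum(joint[0]), pr_e, pr_a_given_e

def theorem_holds(joint):
    pr_a, pr_e, cond = summarise(joint)
    # Sum rule: Pr(a) is the Pr(ei)-weighted average of the Pr(a|ei).
    assert pr_a == sum(w * c for w, c in zip(pr_e, cond))
    # If no ei lowers Pr(a), then no ei raises it either.
    if all(c >= pr_a for c in cond):
        return all(c == pr_a for c in cond)
    return True  # hypothesis not met, nothing to check

rng = random.Random(0)
print(all(theorem_holds(random_joint(4, rng)) for _ in range(1000)))
```

The point of the sketch is that Pr(a) is a weighted average of the Pr(a|ei) with positive weights, so it cannot sit at or below every one of them unless they all coincide with it.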

What the heck — here’s the continuation in which I discuss Williamson’s example.

Williamson gives as an example of this situation a juror who will hear the prosecution’s case and knows that the prosecution is competent enough not to present evidence that will count against its case. The danger here is in thinking that the question of whether the set of facts that comprise the prosecution’s evidence counts against its case can be assessed without knowing the state of information of the agent doing the assessing. Consider two individuals, one of whom knows nothing of the crime and the other of whom is a juror and hence knows that the prosecution has chosen to take the case to trial. These people won’t have the same probability of the accused’s guilt, since for the first person the accused is not differentiated from the entire population, whereas for the juror, the mere fact that someone has been arrested and the prosecution is pressing the case is probabilistic evidence of guilt (but not legal evidence, of course). Upon being presented with the body of evidence, the first agent may revise the accused’s probability of guilt upwards while the juror may revise it downward: the juror has expectations about the kinds of evidence she’ll see, and the prosecution’s evidence may fail to meet that standard. It appears to me that when Williamson constructed his example, he was thinking of the prosecution’s evidence from the point of view of a person who knows nothing of the crime, and not from the perspective of a juror.

Corey: Thanks for posting this. I’m going to start a “page” with items I expect to come back to, and this is one (I’ll add others when we have time to review this blog in the summer). So, readers: after pondering the two Dutch book posts (this being the second) later on when you might have time, feel free to send me a “U-Phil”. Likewise for other topics.

Corey: If you send me your longer analysis of the example, I can post it as a “U-Phil”. It will be easier to read. If it’s not ready, send it any time in the future: error@vt.edu. I plan to revisit the 6 main issues on this blog, from the statistician’s and the philosopher’s perspectives, over the next 4-5 months. During the summer, we will try to weed out all peripheral posts (sending them to pages, say) so that we can better see the handful of themes.