*Oh, she takes care of herself, she can wait if she wants,*
*She’s ahead of her time.*
*Oh, and she never gives out and she never gives in,*
*She just changes her mind.*

*(Billy Joel, “She’s Always a Woman”)*

If we agree that we have degrees of belief in any and all propositions, then, it is often argued (by Bayesians), if your beliefs do not conform to the probability calculus, you are being incoherent and will lose money for sure (to a clever enough bookie). We can accept the claim that, were we required to take bets on our degrees of belief, then given that we prefer not to lose, we would not accept bets that ensured our losing. But this is a tautology, as others have pointed out, and entails nothing about degree of belief assignments. “That an agent ought not to accept a set of wagers according to which she loses come what may, if she would prefer not to lose, is a matter of deductive logic and not a property of beliefs” (Bacchus, Kyburg, and Thalos 1990: 476).[i] Nor need coerced (or imaginary) betting rates actually measure an agent’s degrees of belief in the truth of scientific hypotheses.

Nowadays, surprisingly, most Bayesian philosophers seem to dismiss as irrelevant the variety of threats of being Dutch-booked. Confronted with counterexamples in which violating Bayes’s rule seems perfectly rational on intuitive grounds, Bayesians contort themselves into a great many knots in order to retain the underlying Bayesian philosophy while sacrificing updating rules, long held to be the very essence of Bayesian reasoning. To face contemporary positions squarely calls for rather imaginative deconstructions. I invite your deconstructions (to error@vt.edu) by April 23 (see So You Want to Do a Philosophical Analysis). Says Howson:

“It is the entirely rational claim that I may be induced to act irrationally that the dynamic Dutch book argument, absurdly, would condemn as incoherent”. (Howson 1997: 287)[ii] [iii]

It used to be that frequentists and others who sounded the alarm about temporal incoherency were declared irrational. Now, it is the traditional insistence on updating by Bayes’s rule that was irrational all along.

“There is nothing inconsistent in planning to entertain a degree of belief that is inconsistent with what I now hold, I am merely changing my mind” (Howson 1997: 287).

But one thought that the point of the inductive rule was to show how one ought to change one’s mind rationally. This, apparently, it does not do. The Bayesian never gives in, he just changes his mind.

**The “Motley Jumble”**

A main reason Howson and Urbach (2006: 83) dismiss what they call a “motley jumble of ‘justifications’ [for Bayes’s rule] in the literature” is this:

While an agent may assign probability 1 to event S at time t, i.e., P(S) = 1, he also may believe that at some time in the future, say, t’, he may assign a low probability, say, .1, to S, i.e., P’(S) = .1, where P’ is the agent’s belief function at later time t’.

Let E be the assertion: P’(S) = .1.

So at time t, P(E) > 0

But P(S|E) = 1, since P(S) = 1.

Now, Bayesian updating says

If P(E) > 0, then P’(·) = P(·|E).

But at t’ we have P’(S) = .1,

which contradicts P’(S) = P(S|P’(S) = .1) = 1, by updating.

It is assumed, by the way, that learning E does not change any of the other degree of belief assignments held at t (never mind how one knows this).
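For readers who want to see the first step concretely, here is a small numeric sketch (my illustration, not part of the original argument): on a toy finite outcome space, once P(S) = 1, conditioning on any E with P(E) > 0 is forced to return P(S|E) = 1, so updating by Bayes’s rule can never deliver the anticipated P’(S) = .1.

```python
# A toy finite outcome space (illustrative only): S holds at every atom,
# so P(S) = 1.  Conditioning on any E with P(E) > 0 is then forced to
# give P(S|E) = 1 -- updating can never reach P'(S) = .1.
from fractions import Fraction

atoms = [
    {"S": True, "E": True,  "p": Fraction(1, 4)},
    {"S": True, "E": True,  "p": Fraction(1, 4)},
    {"S": True, "E": False, "p": Fraction(1, 4)},
    {"S": True, "E": False, "p": Fraction(1, 4)},
]

def prob(event):
    """P(event): total weight of the atoms where the event holds."""
    return sum(a["p"] for a in atoms if event(a))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda a: event(a) and given(a)) / prob(given)

p_S = prob(lambda a: a["S"])                            # 1
p_E = prob(lambda a: a["E"])                            # 1/2, so P(E) > 0
p_S_given_E = cond(lambda a: a["S"], lambda a: a["E"])  # forced to be 1
print(p_S, p_E, p_S_given_E)
```

The same forcing holds for any choice of atoms and weights, so long as S gets probability 1 and E gets positive probability.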

The examples that are at the heart of this variation on the counterexamples are found in William Talbott (1991), and sketched by many others. In one of them, S is:

S: Mayo eats spaghetti at 6 p.m. on April 15, 2012.

P(S) = 1,

where P is now my degree of belief in S (time t), and E is:

E: P’(S) = r, where r is the proportion of times I eat spaghetti (in some appropriate period), say, r = .1.

As certain as I am of spaghetti today, April 15, 2012, I also believe, rationally, that by this time next year I will have forgotten about it, and to obtain P’, I will (rationally) turn to the relative frequency with which I eat spaghetti.[iv] Or so the example goes. Variations on the counterexample involve current beliefs about impairment by alcohol or drugs.

One might wonder how examples like this could cause an account to founder at the fundamental level. I’m not claiming to be up on the latest twists and turns in this saga. Ironically, though, the error statistician has no trouble accommodating the two probabilities of events causing the trouble here. But we are considering the Bayesians, and in particular Bayesian philosophers. They generally want to be able to assign probabilities to any propositions (in a language), not just events within statistical models.

One way some Bayesian philosophers explain the problem is that there is both relative frequency information such as P’(S) = .1 and also, since this is known, P’(P’(S) = .1) = 1. Bayesian epistemologists, to my knowledge, grant the counterexamples, but do not give up on the project, only on Bayesian updating. Bayes’s rule holds, they seem to allow, just when it holds. (Of course it is not just philosophers that have thrown over “betting coherence”; default Bayesian statistician Jim Berger says that it is certainly too strong (2006)). What is the current state of play here?

Howson (1997) endorses “the possibly surprising thesis that the Bayesian theory has no such rules…. This does not immediately cast doubt, or more than there was before, on the validity of such classical results as the convergence of opinion theorems, since these are framed in terms of your prior probability that your prior conditional probability will exhibit suitable convergence” (Howson 1997: 288-9).

Is he saying that those results were always a matter of your believing in your beliefs converging (using Bayes), and you’re still free to believe this? What am I missing?

**“Bayes’s Rule Is ‘Completely Arbitrary,’” Say (some) Bayesians**

To say there is a large amount of work by Bayesian philosophers on so-called Dutch-book arguments is a vast understatement. Howson and Urbach (2006: 83) express frustration that the field never tires of generating slight variations on the same counterexample:

“Invalid rules breed counterexamples. What is surprising in the case of conditionalization is that nobody seems to have realized why it . . . is anything other than a completely arbitrary rule when expressed unconditionally.”

Is it true that “invalid rules breed counterexamples”? Usually, a rule is found to be invalid on the basis of one counterexample. That’s the definition of an invalid rule. Then we move on and do not have to keep proving that it’s invalid. Howson is surprised by the volume and variety of counterexamples that continue to crowd the philosophical literature, along with new statements of conditions under which Bayes’s rule, and spin-off rules, hold. But I do not think he should be surprised. By the time students work through these conundrums as graduate students, the game can have a (fascinating[v]) life of its own. Overthrowing the research paradigm is not on. “*And she never gives in*, / *She just changes her mind.”*

**References**

- Bacchus, F., Kyburg, H. E., and Thalos, M. (1990). “Against Conditionalization,” *Synthese* **85**: 475–506.
- Berger, J. (2006). “The Case for Objective Bayesian Analysis (with discussion),” *Bayesian Analysis* **1**: 385–402.
- Howson, C. (1997). “A Logic of Induction,” *Philosophy of Science* **64**(2): 268–290.
- Howson, C. and Urbach, P. (2006). *Scientific Reasoning: The Bayesian Approach*, 3rd ed. La Salle, IL: Open Court.
- Laudan, L. (1997). “How About Bust? Factoring Explanatory Power Back into Theory Evaluation,” *Philosophy of Science* **64**(2): 306–316.
- Mayo, D. G. (1996). *Error and the Growth of Experimental Knowledge*. Chicago: University of Chicago Press.
- Mayo, D. G. (1997). “Duhem’s Problem, the Bayesian Way, and Error Statistics, or ‘What’s Belief Got to Do with It?’” and “Response to Howson and Laudan,” *Philosophy of Science* **64**(2): 222–244 and 323–333.
- Talbott, W. (1991). “Two Principles of Bayesian Epistemology,” *Philosophical Studies* **62**: 135–150.

[i] On these grounds *EGEK* (Mayo 1996) dismissed the Dutch Book literature as irrelevant for scientific inference. But maybe the newest disavowals are of interest.

[ii] This occurred in a *Philosophy of Science* **64**(2) exchange shortly after *EGEK* appeared.

[iii] I am not certain about the work that’s performed by “being induced”, but I’ll leave this to one side.

[iv] These are Talbott’s numbers; I virtually never eat spaghetti.

[v] In somewhat the same sense that puzzle solving can be addictive, but here you can also get publications!

I’ve got a simpler example:

Assume P(S)=1.

Then P(S|not S)=1.

But we know P(S|not S)=0.

I can’t believe how silly Bayesians are. This example should be enough for them to admit how defunct their philosophy is. Of course they’ll never admit it.

But I take it they rule this out because P(not S) = 0.

Oh, my mistake. Change it to:

Assume P(S)=.99

Then P(S|not S) = .99

But we know P(S| not S)=0

They can’t say P(not S)=0 any more. Either way it makes Bayesians look bad. It’s amazing they based a whole school of inference and statistics off this.

I am sorry, but I don’t understand where “Then P(S|not S) = .99” comes from. Could you please explain in a little more detail?

Those were not my numbers, I’m sure you know, but the previous commentator’s.

MSchmidt,

I was just copying the following line from the post using “.99” instead of “1”:

“But P(S|E)=1 since P(S)=1”.

I showed this post to a Bayesian friend of mine. She did exactly what Mayo said she would; namely, she weaseled out of the contradiction with a bunch of new rules and caveats.

She said that “all probabilities are conditional on something” and that P(S) should be P(S|I) for some information I. Then she claimed it’s ok if

P(S|I)

P(S|I and E)

P(S|E)

all have different numerical values since they are conditional on different information.

Finally, she claimed it’s impossible for P(S|I)=1 for every conceivable I. I don’t know where she got that from, but these Bayesians should get their story straight before lecturing the rest of us on probability.

Fisher in reply to Schmidt: I cannot say what your Bayesian friend is up to, but do say where your numbers come from, or retain my example. The case of S being imagined to be given a probability of 1 is special for the subjective Bayesian, and it’s knowing S at time t, and knowing the statistical information at time t’ that leads to the kind of counterexample to conditioning that is mentioned in my post. Obviously there are other issues one could raise.

Now I’m confused. Where does “P(S)=1 implies P(S|E)=1” come from?

In general from Bayes theorem, P(S)=1 implies

P(S|E)=P(E|S)/P(E)

but P(E|S) can be arbitrary even if P(S)=1

What you actually seem to be saying is imagine at time T the Bayesian has a state of Knowledge “K” which implies S. Everything is initially conditional on this K. Then we get

P(S|K)=1 implies P(S|K, E)=1, if P(E|K)>0

It’s natural to suppress K in the notation, but since its absence can cause confusion we should put it in. But once you specify what every probability distribution is conditional on, the contradiction disappears.

P'(S)=.1 is actually P'(S|J)=.1 for some J

and

P'(S)=1 is actually P'(S|K,E)=1

It’s not possible to force J to be logically equivalent to K,E without thereby making P'(S|J)=1. And if J isn’t logically equivalent to K,E then there is no paradox when the two probabilities are different.
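The point can be checked on a toy uniform space (my own construction; S, K, E, and J here are stand-ins, not anything from the post):

```python
# Ten equally weighted atoms.  K entails S, so P(S|K) = 1 and hence
# P(S|K,E) = 1; the weaker information J assigns S probability .1.
# Different conditions, different values -- no single quantity is
# being given two values.
from fractions import Fraction

S = {0}                   # S holds only at atom 0
K = {0}                   # K entails S
E = {0, 5}                # evidence compatible with K, so P(E|K) > 0
J = set(range(10))        # weaker information: the whole space

def prob(event, given):
    """P(event | given) on a uniform ten-atom space."""
    return Fraction(len(event & given), len(given))

print(prob(S, K))       # P(S|K)   = 1
print(prob(S, K & E))   # P(S|K,E) = 1
print(prob(S, J))       # P(S|J)   = 1/10
```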

I just thought of another one:

suppose P(S|Z)=1

Then later we find out that (not Z) is true. Then by Bayesian updating we would compute the probability of S based on this new information. On the other hand we know from above that:

P(S|Z and not Z)=1 since P(S|Z) is always equal to 1.

But how can you condition on both “Z” and “not Z” being true? This clearly shows that Bayesian updating is inherently contradictory.

Truly, it’s stunning that Bayesian philosophy was ever taken seriously. Thank you Mayo!

Thanks Fisher, appreciation seems altogether rare in this blog business (so far).

Let me take this opportunity to express my appreciation for this blog. I know your time is precious, so I’m grateful for the illustrations of your point of view and for the time you spend writing posts and replying to comments!

Aw, Corey…Do ya really mean it? Thanks.

I have several basic problems with all this Bayesian talk (as opposed to Bayes’ theorem). First, the concept of “belief”. I haven’t seen a single, usable definition, so what are we talking about, apart from some intuitive notion that will vary between people? In any event, whatever its definition, “belief” has to refer to something about a person’s mental state at some time. Mental states involve mostly unconscious information, including that about past emotional states. This is quite a quagmire to try to get into! It’s not something simple that some algebraic formulae can compute reliably.

Second, if probability is not to be based on frequentist principles – counting of actual events – then how can we ever verify some individual’s ‘beliefs’ by actual test? Anybody may believe anything, but what is actually testable?

I would claim that since Bayes’ theorem is mathematically correct – in a frequentist-based probability system – any properly based “Bayesian” calculation will be equivalent to a proper frequentist one. It’s only when one brings in ‘beliefs’, subjective ‘priors’, and the like that the worms come in.

Tom: I think we should distinguish mere use of conditional probability – which entails “inverse probability” or Bayes’s theorem – from Bayesian inference or “being Bayesian”. Everyone who uses the formal theory of probability uses conditional probability. That doesn’t make them a Bayesian, or if it does, some other term is needed if any of the interesting issues are to be discussed.

I am guessing that even subjective Bayesians are (at least) dualists about probability, and hold their beliefs to be “calibrated” by relative frequencies. Anyway, I take the whole betting business to grow out of a desire to operationalize belief measurements (something like revealed beliefs, but others turn to revealed preferences).

Tom,

There are many varieties of Bayesian statistics and even underlying philosophy. There appears to me to be a continuum from the subjective personalistic to the objective Bayesians, differing notably on how they derive priors. The subjective Bayesians claim to be updating their own views about a topic and claim to achieve coherent beliefs. They cannot realistically say this proves anything to you beyond they have a coherent belief. The objective Bayesians seek to prove things to you and must struggle to convince you they have valid priors and a sufficient model, etc. Tough sell for them…

John Byrd: Of course we know about default vs. subjective Bayesians, and also about the differences each group of practitioners has with its correlate among philosophers. My post was mostly directed at the work found in philosophy (or perhaps formal epistemology) across the different sects, I think. They seem to be giving up a lot, don’t they? I would be glad to hear what Bayesian philosophers think of Howson’s position, and the current state of play here.

Tom,

I’m plagued by the same doubts. A theorist starts out with some assumptions and then derives a theory from them.

How can anybody tell whether the theory is making correct predictions unless they test the assumptions? That’s the only way to do it. There is no alternative.

But according to Bayesians, you just have to accept their word that they really do “believe” the assumptions. Maybe we’ll have to wait until they invent mind reading machines before we’ll ever know which theories are making correct predictions. (Ok that last comment was a little childish)

Just a quick thought:

In the dinner example you have information on April 15, 2012, that you do not have one year from then. Specifically, your statement P(S) = 1 should say P(S|D,E) = 1, where D is the data that “Mayo ate spaghetti at 6 p.m. on April 15, 2012”. P(S|D,E) is a bit of a tautology (though to get something where P(X)=1, oftentimes you need it to be a tautology) and E is redundant in this inference, but it is known.

One year from then, you would still have P(S|E); however, because you would have forgotten D, you no longer have that information for your inference. At that point your inference about S is limited to P(S|E), and by your definition of E, P(S|E)=r. This all seems perfectly coherent for this example.

This remains the case with your more abstract discussion. Your statement:

“But P(S|E) = 1, since P(S) = 1.”

seems like it should include the inclusion of the data and should thus state

“But P(S|D,E) = 1, since P(S|D) = 1.”

P(S|E) without the data would still just be r.

Mike: Remember this concerns dynamic Dutch books. The probability assessment, P, at time t, presumably takes your D into account. When it comes to P’ at the later time, the problem is that you can’t get there from Bayes updating, having previously assigned 1 to P(S). I want to be clear, this is not my example or problem (it is owned by Bayesian epistemologists, or most of them*). As I said, a frequentist would have no trouble with the probability assignments here, but Bayesian philosophers (subjective and objective) take these kinds of examples as counterexamples to Bayes’s rule (and if not this one, then another type). Some suggest, understandably I guess, that it’s too much to expect people to record pieces of information about what I ate when, if this is to relate to actual agents. I find it strange, by the way, that they assign probability 1 to the statistical information (e.g., P’(S) = .9). But then, I find the whole idea of assigning beliefs to propositions in terms of probabilities disconnected from actual agents (except perhaps if its truth is equivalent to the occurrence of an appropriate event).

*I await any “U-Phils”.

Sorry for misattributing the example to you, I just had not heard it before your blog post as far as I can remember.

I’m going to follow a couple of the previous commenters and also thank you for your efforts with the blog. I may not comment often, but I enjoy reading it and appreciate the effort you put into it.

Back on topic though:

Given that S==”Mayo eats spaghetti at 6 p.m. on April 15, 2012″

I don’t think it makes sense to say P(S)=1 unconditional on any other information. S is just a statement, not any information. You can only really set P(S)=1 if S is a tautology (S: ”This statement is true”) because then the statement essentially is its own data.

In the case of S from the example, when “P(S)”=1 it is conditional on the data that the agent has – thus either P(S|D,I) or just P(S|I) if the agent considers D to be part of their prior information. At the later time t’, “P'(S)” still depends on the agent’s prior information at time t’. If P'(S)==P(S|D,I) then P'(S)=1.0, but if the agent has forgotten D, then P(S|D,I) should be updated by marginalizing over D, since D is no longer known. Thus:

P'(S) == P(S|D,I)*P(D|I)+P(S|{not D},I)*P({not D}|I)

P(S|{not D},I)=0 given the definitions of S and D

thus

P'(S) = P(S|D,I)*P(D|I)

and because P(S|D,I) = 1

P'(S) = P(D|I) = P(S|I)

given the definitions of D and S

This allows for the agent to update their Bayesian belief back from a position of certainty, “P(S)” = 1, to a position of uncertainty “P(S)”=r, through marginalization.
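As a sanity check, the total-probability step above can be verified numerically (using Talbott’s illustrative r = .1 as a stand-in for P(D|I)):

```python
# Numeric check of the marginalization above, with Talbott's illustrative
# r = .1 standing in for P(D|I).  Exact arithmetic via Fraction.
from fractions import Fraction

r = Fraction(1, 10)            # P(D|I): the relevant spaghetti frequency

p_S_given_D    = Fraction(1)   # P(S|D,I) = 1: D just is the event S describes
p_S_given_notD = Fraction(0)   # P(S|not-D,I) = 0

# P'(S) = P(S|D,I) P(D|I) + P(S|not-D,I) P(not-D|I)
p_prime_S = p_S_given_D * r + p_S_given_notD * (1 - r)
print(p_prime_S)               # 1/10, i.e. P'(S) = r as claimed
```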

Thanks also for putting up with my curiosity and talking these thoughts out.

Mike: Check some of my references when you have time, if you’re interested. I will likely hold off on expecting U-Phils for this.