Bayesian philosophers (among others) have analogous versions of the criticism in my April 28 blogpost: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat (note the sedate philosopher’s comedy club backdrop):

*Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?*

*The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!*

So clearly the error statistical testing account fails to serve in an account of knowledge or inference (i.e., an epistemic account). However severely I might wish to say that a hypothesis *H* has passed a test, the Bayesian critic assigns a sufficiently low prior probability to *H* so as to yield a low posterior probability in *H*.[i] But this is no argument about why this counts in favor of, rather than against, their Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis *H*.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H.

*Isaac and college readiness*

An example that Peter Achinstein[ii] and I have debated concerns a student, Isaac, who has taken a battery of tests and achieved very high scores, *s*, something given to be highly improbable for those who are not college ready.[iii] We can write the hypothesis:

*H*(I): Isaac is college ready.

And let the denial be *H’*:

*H*’(I): Isaac is not college ready (i.e., he is deficient).

The probability for such good results, given a student is college ready, is extremely high:

P(s | *H*(I)) is practically 1,

while very low assuming he is not college ready. In one computation, the probability that Isaac would get such high test results, given that he is not college ready, is .05:

P(s | *H’*(I)) = .05.

But imagine, continues our critic, that Isaac was randomly selected from the population of students in, let us say, Fewready Town—where college readiness is extremely rare, say one out of one thousand. The critic infers that the prior probability of Isaac’s college-readiness is therefore .001:

(*) P(*H*(I)) = .001.

If so, then the posterior probability that Isaac is college ready, given his high test results, would be very low:

P(*H*(I) | *s*) is very low,

even though the posterior probability has increased from the prior in (*).
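The critic's arithmetic can be checked with a minimal sketch of Bayes' theorem applied to the numbers above. The function name `posterior_ready` is hypothetical, and P(s | *H*(I)) is set to exactly 1 as a stand-in for "practically 1".

```python
def posterior_ready(prior, p_s_given_ready=1.0, p_s_given_unready=0.05):
    """Posterior P(H(I) | s) by Bayes' theorem for the two-hypothesis setup."""
    numerator = p_s_given_ready * prior
    denominator = numerator + p_s_given_unready * (1.0 - prior)
    return numerator / denominator

prior = 0.001  # the critic's prior (*) for a student drawn from Fewready Town
post = posterior_ready(prior)
print(f"prior = {prior}, posterior = {post:.4f}")
```

Under these assumptions the posterior comes out at roughly .02: about twenty times the prior, yet still very low.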

This is supposedly problematic for testers because we’d say this was evidence for H(I) (readiness). Actually I would want degrees of readiness to make my inference, but these are artificially excluded here.

But, even granting his numbers, the main fallacy here is fallacious probabilistic instantiation. Although the probability that a randomly selected student from the high schoolers in Fewready Town is college ready is .001, it does not follow that Isaac, the one we happened to select, has a probability of .001 of being college ready (Mayo 1997, 2005, 117).

Achinstein (2010, 187) says he will grant the fallacy…but only for frequentists:

“My response to the probabilistic fallacy charge is to say that it would be true if the probabilities in question were construed as relative frequencies. However, … I am concerned with epistemic probability.”

He is prepared to grant the following instantiations:

- p% of the hypotheses in a given pool of hypotheses are true (or a character holds for p%).
- The particular hypothesis *H*_{i} was randomly selected from this pool.

*Therefore*, the objective epistemic probability P(*H*_{i} is true) = p.

Of course, epistemic probabilists are free to endorse this road to posteriors—this just being a matter of analytic definition. But the consequences speak loudly against the desirability of doing so.

*No Severity.* The example considers only two outcomes: reaching the high scores *s*, or reaching lower scores, ~*s*. Clearly a lower grade gives even less evidence of readiness; that is, P(*H*’(I) | ~*s*) > P(*H*’(I) | *s*). Therefore, whether Isaac scored as high as *s* or lower, ~*s*, the epistemic probabilist is justified in having high belief that Isaac is not ready. Even if he claims he is merely blocking evidence for Isaac’s readiness (and not saying he believes highly in his unreadiness), the analysis is open to problems: the probability of finding evidence of Isaac’s readiness even if in fact he is ready (*H*(I) is true) is low if not zero. Other Bayesians might interpret things differently, noting that since the posterior for readiness has increased, the test scores provide at least some evidence for *H*(I), but then the invocation of the example to demonstrate a conflict between a frequentist and Bayesian assessment would seem to diminish or evaporate.

*Reverse Discrimination?* To push the problem further, suppose that the epistemic probabilist receives a report that Isaac was in fact selected randomly, not from Fewready Town, but from a population where college readiness is common, Fewdeficient Town. The same scores s now warrant the assignment of a strong objective epistemic belief in Isaac’s readiness (i.e., *H(I)*). A high-school student from Fewready Town would need to have scored quite a bit higher on these same tests than a student selected from Fewdeficient Town for his scores to be considered evidence of his readiness. (Reverse discrimination?) When we move from hypotheses like “Isaac is college ready” to scientific generalizations, the difficulties become even more serious.
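To see how strongly the epistemic probabilist's verdict depends on the town, here is a rough sketch that runs the same scores through both priors. The Fewdeficient prior of .999 is my assumption for illustration (the example only says readiness is "common" there), and `posterior_ready` is a hypothetical helper.

```python
def posterior_ready(prior, p_s_given_ready=1.0, p_s_given_unready=0.05):
    """Posterior P(H(I) | s) by Bayes' theorem; likelihoods as in the example."""
    numerator = p_s_given_ready * prior
    return numerator / (numerator + p_s_given_unready * (1.0 - prior))

# Prior probability of readiness by town; 0.999 for Fewdeficient is assumed.
towns = {"Fewready": 0.001, "Fewdeficient": 0.999}
posteriors = {town: posterior_ready(p) for town, p in towns.items()}

for town, post in posteriors.items():
    print(f"{town}: P(H(I) | s) = {post:.4f}")
```

With identical scores *s*, the posterior is about .02 for Fewready and above .99 for the assumed Fewdeficient prior. Only the pool has changed.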

We need not preclude that *H*(I) has a legitimate frequentist prior; the frequentist probability that Isaac is college ready might refer to genetic and environmental factors that determine the chance of his deficiency, although I do not have a clue how one might compute it. The main thing is that this probability is not given by the probabilistic instantiation above.

These examples, repeatedly used in criticisms, invariably shift the meaning from one kind of experimental outcome—a randomly selected student has the property “college ready”—to another—a genetic and environmental “experiment” concerning Isaac in which the outcomes are ready or not ready.

This also points out the flaw in trying to glean reasons for epistemic belief with just any conception of “low frequency of error.” If we declared each student from Fewready to be “unready,” we would rarely be wrong, but in each case the “test” has failed to discriminate the particular student’s readiness from his unreadiness. Moreover, were we really interested in the probability of the event that a student randomly selected from a town is college ready, and had the requisite probability model (e.g., Bernoulli), then there would be nothing to stop the frequentist error statistician from inferring the conditional probability. However, there seems to be nothing “Bayesian” in this relative frequency calculation. Bayesians scarcely have a monopoly on the use of conditional probability! But even here it strikes me as a very odd way to talk about evidence.
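The contrast between "rarely wrong" and "able to discriminate" can be made concrete with a small sketch comparing two rules applied to students drawn from Fewready. The helper `rule_summary` and the constant names are hypothetical, and P(s | ready) = 1 again stands in for "practically 1".

```python
# Numbers from the example; P(s | ready) = 1 stands in for "practically 1".
P_READY = 0.001            # proportion of college-ready students in Fewready
P_S_GIVEN_READY = 1.0
P_S_GIVEN_UNREADY = 0.05

def rule_summary(p_declare_ready_if_ready, p_declare_ready_if_unready):
    """Return (overall error rate, P(declared ready | ready)) for a rule."""
    misses = (1.0 - p_declare_ready_if_ready) * P_READY
    false_alarms = p_declare_ready_if_unready * (1.0 - P_READY)
    return misses + false_alarms, p_declare_ready_if_ready

# Rule 1: declare every student "unready", whatever the scores.
always_unready = rule_summary(0.0, 0.0)
# Rule 2: declare "ready" exactly when the high scores s are observed.
score_based = rule_summary(P_S_GIVEN_READY, P_S_GIVEN_UNREADY)

print("always 'unready':        ", always_unready)
print("ready iff high scores s: ", score_based)
```

On these numbers the blanket rule errs about once in a thousand, far less often than the score-based rule, yet its probability of ever indicating readiness in a ready student is zero.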

**References:**

Achinstein, P. (2001), *The Book of Evidence*, Oxford: Oxford University Press.

Achinstein, P. (2010), “Mill’s Sins or Mayo’s Errors?”, pp. 170-188 in D. G. Mayo and A. Spanos (eds.), *Error and Inference. Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science*, Chicago: Chicago University Press.

Achinstein, P. (2011), “Achinstein Replies,” pp. 258-98 in G. Morgan (ed.) *Philosophy of Science Matters: The Philosophy of Peter Achinstein*. Oxford: Oxford University Press.

Howson, C. (1997a), “A Logic of Induction”, *Philosophy of Science* 64, 268–90.

Howson, C. (1997b), “Error Probabilities in Error,” *Philosophy of Science* 64(4), 194.

Mayo, D. G. (1997a), “Response to Howson and Laudan,” *Philosophy of Science* 64: 323-333.

Mayo, D. G. (1997b), “Error Statistics and Learning from Error: Making a Virtue of Necessity,” in L. Darden (ed.) *Supplemental Issue PSA 1996: Symposia Papers, Philosophy of Science* 64, S195-S212.

Mayo, D. G. (2005), “Evidence as Passing Severe Tests: Highly Probed vs. Highly Proved,” pp. 95-127 in P. Achinstein (ed.) *Scientific Evidence*. Johns Hopkins University Press.

[i] e.g., Howson 1997a, b; Achinstein 2001, 2010, 2011.

[ii] Peter Achinstein, Professor of Philosophy at Johns Hopkins University. Among his many publications, he is the author of *The Concept of Evidence* (1983); *Particles and Waves: Historical Essays in the Philosophy of Science* (1991), for which he received the prestigious Lakatos Prize in 1993; and *The Book of Evidence* (2001).

[iii] I think Peter and I have finally put this particular example to rest at a workshop I held here in April 2011, with grad students from my philosophy of science seminar. When a student inquired as to where we now stood on the example, toward the end of the workshop, my response was to declare, with relief, that Isaac had graduated from college (NYU)! Peter’s response dealt with the movie “Stand and Deliver!” (where I guess reverse discrimination was warranted for a time.)

Mayo, as Lehmann has put it: “If one firmly believes the hypothesis to be true, extremely convincing evidence will be required before one is willing to give up this belief, and the significance level will accordingly be set very low”.

So what if “P(s | H’(I)) =.05.” is decided as not (severe) enough, given that the student is from “Fewready”?

If someone had decided that (and it seems to me that this is permitted in the error statistical approach, that is, to judge what the stringency level has to be), then the problem would still be there. It wouldn’t be a problem of the Bayesian approach per se.

It seems to me that the question here is not epistemic, but rather moral. It would not be fair to set up a higher bar for a student, just because of his background.

Thanks,

Carlos

Sorry, not sure I see what the problem would be? I might note that I’m opposed to setting a single cut-off for evidence, but would rather report how well probed a given hypothesis is, and how well or poorly discrepancies from it have been probed. So, this kind of example is rather ill-formed, but since that’s what they’re saying at the comedy club, I’m prepared to consider it. And of course, the whole idea of college readiness is so imprecise, as opposed to, say, the degree to which high school material has been mastered, perhaps as measured by a New York State Regents exam. I am assuming one tries to flesh these out for purposes of entertaining the criticism. (The same type of example is sometimes made with hypotheses about the presence or absence of a disease.)

On the question of moral values here, that would allude to something distinct from merely evaluating what the evidence indicates about his readiness. However, one could say that adopting one or another inference account here does have moral implications, if this construal of scores is to be the basis of a policy. And, indeed, it is the basis of policy! (When I was first confronted with this criticism, as it happened, I consulted Erich Lehmann whose wife was working at the Educational Testing Service in N.J., and he was telling me of the problem….More on this perhaps another time.) I happen to think it is wrong, logically, epistemologically, and morally, to assign Isaac a probability of readiness based on the urn from which he’s been randomly drawn. Frankly, I thought this example would make it SO obvious how the meaning of the hypothesis under consideration was shifting through the example, that a point I’d been trying to make for a while would be seen as obvious. To my surprise, the Bayesian epistemologists, at least to my knowledge, just bit the bullet and embraced it as perfectly rational, given that is all the knowledge we have. Sorry to go on..

Carlos: I meant to note that however small the probability is made one can still find a smaller prior for H(I) for their result to occur.
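A rough sketch of this point: for any false-positive probability P(s | H’(I)), however small, one can solve Bayes' theorem for the prior below which the posterior for readiness stays under a chosen threshold. The function names and the threshold of 0.5 are assumptions for illustration only.

```python
def posterior_ready(prior, p_s_given_unready, p_s_given_ready=1.0):
    """Posterior P(H(I) | s) by Bayes' theorem."""
    numerator = p_s_given_ready * prior
    return numerator / (numerator + p_s_given_unready * (1.0 - prior))

def prior_at_threshold(threshold, p_s_given_unready, p_s_given_ready=1.0):
    """Prior at which the posterior equals `threshold`; any smaller prior
    yields a smaller posterior."""
    t, a, lik = threshold, p_s_given_unready, p_s_given_ready
    return t * a / (lik * (1.0 - t) + t * a)

for alpha in (0.05, 0.01, 0.001):
    bound = prior_at_threshold(0.5, alpha)
    print(f"P(s | H'(I)) = {alpha}: posterior < 0.5 whenever prior < {bound:.5f}")
```

Tightening the test shrinks the required prior but never eliminates it, so a critic free to choose the pool can always produce the low posterior.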

It’s true that .05 is rather inadequate, but the critic’s numbers wouldn’t work out as well otherwise, so I hiked it up.

How can the probability of Isaac’s real readiness be a function of the % “ready” in the urn of names he was drawn out from? Depending on the pool, his “readiness” would change every day!

It isn’t a function of the urn he was drawn from. This could only make sense in the absence of evidence with a desire to make a bet. This sort of reasoning should not be used for serious purposes. If the goal is to prove readiness, this use of a prior is unsatisfactory and seemingly unfair.

True, that was one of my points. However, I still wouldn’t bet on his unreadiness given his test scores s (and the example really should have them being better than, say, 1%; this is one variant that doesn’t require the prior to be even smaller). We are to imagine they are rather impressive scores for a high school student.

Eileen: Well, we are supposed to imagine that “all we know” is that Isaac was randomly drawn from the urn of students in Fewready, even though that is already deceptive. I am granting all the givens: i.e., one can set up an ordinary Bernoulli trial, and the probability that the slip from this pool contains the name of a ready student is .001. That of course refers to the generic event of “some student or other” having the property. In short, it is intended to be analogous to the urn of null hypotheses in my April 28 post.

Having said all that, you are absolutely right! His chance of readiness would change with the pool we imagined he came from.

Mayo: Is it fair to characterize (but not summarize!) your point of view as claiming that the so-called “base rate fallacy” is not a fallacy?

Corey: This has to be qualified. If the number given is not a proper frequentist prior for the hypothesis under test, then there’s no basis for multiplying them. If it is a correct frequentist prior, and the goal is the posterior, then we frequentists also use conditional probability. I’ve never seen one of the alleged counterexamples, usually used against significance tests, be legitimate, however. On a separate issue, there is the question of whether the subjects in various experiments are committing a fallacy of reasoning when they don’t use certain given base rates. Generally there is serious ambiguity because it is likely the subject is thinking of likelihood rather than probability. For example, the likelihood of a conjunction given x may easily exceed that of any conjunct, unlike probability. Often, that seems the proper construal of the example, and sometimes the questions even use the word ‘likelihood’ (which, incidentally, I can’t make a palindrome with, see palindrome page).

I think the wrongheadedness of the Bayesian epistemologists’ interpretation is obvious when the same score from Fewdeficient town is taken as evidence of readiness. No one has commented on this.

Eileen: Yes it’s as if being in a certain pool “rubs off” on the individual. It gets to this ambiguity between events and hypotheses (that assign probabilities to events). If we stretch things so as to imagine H(I), Isaac is ready, is a kind of event, it is a very different kind of event than getting a “success” in an experiment involving random selections from a city, p% of which have a character or not (readiness). If we try to imagine a frequentist probability assignment to the first event, never mind how we could get it, it is very different from the second. The irony is that the kind of probability that would make sense for the latter circumstance, the one the Bayesian epistemologist is using here for the former, makes sense only in considering repeated sampling from the population (whether from fewready or fewdeficient). I don’t deny there are circumstances where that relative frequency could be what’s wanted, but it differs from our circumstance with the hypothesis H(I).

I have for a long time had deep concerns about the selection of priors in real world applications. One requirement for the probability calculation here to be valid is that Isaac has to be a truly random selection, random with regard to what matters, which is his preparation leading up to the exam. Maybe he was home schooled, maybe his parents are the town’s only physicians and they augment his learning… I think people are sold on the Bayesian approach by these cute scenarios, and then take the approach that starts with the subjective model and shoehorns the facts into it. It gets worse from there because there are so few requirements for model checking and critical evaluation of the sampling method. This is what we are seeing in many disciplines, including forensics (see earlier posts). What group of people are equally likely to have visited a crime scene during a small window of time? Usually we do not know. That is, we have no valid way to know. But it is awfully easy to make something up so we can complete the calculation.

It would be interesting to see one of those examples. Here, of course, it is given that all we know is that he’s been randomly selected from Fewready town.

But, as you say, we already have backgrounds as to how variable these things are. Remember this example is supposed to show what’s wrong with evaluating a hypothesis like H(I) using an error statistical assessment, and I say it does the opposite.

A question this raises for me is: Do subjective Bayesians acknowledge that it is not enough to use a random mechanism to select Isaac, but that he must have been produced by a random process (with respect to his preparation), or else it is not valid to use the Fewready frequencies as the basis for the prior? In other words, there is at least one assumption that would warrant the use of .999 in the prior. What if this assumption is not met? (What if the students are really not analogous to balls drawn from an urn?)

Actually, at least some of the Bayesian epistemologists who give this kind of example regard themselves as “objective” and not subjective. The objective prior here is supposed to come from the random selection from the urn with p% possessing some property. I don’t think any Bayesian epistemologists are amongst our commentators unfortunately, although I have invited them directly.

Maybe Fisher can help us. If the prior in the example cannot be regarded as independent of the probability Isaac will pass the test– and surely it is not– then I will go further to say that one cannot use the multiplication rule as required by the Bayesian model.

John, the property that’s usually important in Bayesian arguments with multiple similar parameters is that of exchangeability, or partial exchangeability if information about schooling, or Mom and Dad being doctors, is available. Either way, it is not a statement that each student is equally able.

Regarding “critical evaluation of the sampling method” and “requirements for model checking”… you don’t have to go far to see examples of Bayesians doing exactly these, or to see other statisticians using non-Bayesian methods yet completely ignoring them. And let’s not pretend that “making something up so we can complete the calculation” only ever happens in Bayesian analyses.

Determining whether methods are “good” or “bad” is not simply a matter of seeing where they lie in the Bayesian/frequentist spectrum.

Guest, thank you, and I agree and understand that all methods are subject to misuse. P-value misuse has been made infamous in recent years. However, the Bayesian model is more ambitious in its pretension to tell us the probability that Isaac is college ready, giving us a grand number that “tells all.” Further, this example has been posited to reveal the superior characteristics of a Bayesian approach in previous publications. For the example, I cannot see that exchangeability saves the day. The students are not analogous to balls in an urn. The balls are assumed to be the same size and shape and to be mixed prior to each drawing. The case of Isaac can easily be more like one where the balls are various sizes and the larger balls are more likely to be drawn. This is followed by the use of ball size to measure the likelihood of future success. Thus, there is no random process for the prior and a lack of independence. (The Isaac example bears resemblance to the issues germane to racial profiling. Also to the use of priors to determine random match probability in DNA identification. I contend the allure of the single grand number provided by the Bayesian approach encourages loose thinking, especially with priors.)

If the foundational attacks on Bayes reduce to “it’s easy to misuse” then these apply equally to frequentist analysis, and we should just give up on statistics.

To *actually* criticize Bayesian methods (or, more constructively, to probe their limitations) the examples one seeks are those where no Bayesian analysis can provide the same substantive conclusions as a good analysis justified under some other approach. Here, in addition to addressing the base rate fallacy about what a good analysis actually is, one would also have to consider Bayesian analyses beyond just posterior probabilities of the null.

Guest: The issue of this post, like the other “comedy club” posts, was not to criticize Bayesians, but to put in the spotlight some “knock down” criticisms of frequentist, “sampling” or error statistics that are published repeatedly as obviously correct, if not proving the unsoundness of the approach. We’re on the defensive here.

But if I were looking to compare, I wouldn’t consider it a severe test to see if there are examples “where no Bayesian analysis can provide the same substantive conclusions as a good analysis justified under some other approach” for the simple reason that it is not so difficult to reproduce the answer once you know where you want to go. Right?

The repeatedly-published criticisms – by Kadane, for example – are (clearly, I think) criticisms of *a* frequentist approach, an approach much like the practice of *some* researchers. If it’s not your preferred approach (again, fairly clearly it’s not) the only “defensive” needed is to point this out, clearly. Following this, all the rest of your arguments could be made constructively – and would be easier to read if they were.

I appreciate it may well not be intended as such, but often the discussion here reads like kneejerk anti-Bayes sentiment from the 60s and 70s. Thankfully, simple prejudice like this has died off in most of the statistical literature; we can (now) learn about the relative strengths and weaknesses of different statistical approaches without a slugfest. Even if it “spoils the fun”, as Adrian Smith put it.

Also, yes, it’s trivially easy to reproduce “the answer” (or a good approximation thereof) for a given dataset. But it’s often a major challenge to do this for general datasets, which is what I meant. A nice example is the Casella-Berger result on one-sided p-values, discussed here recently, that few people saw coming, and that sinks a big hole in the (silly, kneejerk, pro-Bayesian) argument that all p-values are without any merit.

Guest: I don’t think you get it. These howlers have not been removed from the texts or papers or qualified; it’s as if some texts keep the start of their book and update the later parts, which an alert student will find more equivocal than those strong and hurtful early dogmatic claims. Even if the rest of the book portrays a much more nuanced approach and even, at times, conflicts with the opening “philosophical” wounds, the humiliating, knock down criticisms at the start are what the new student faces. And what does this tell the student about how reporting is done in the field? That is almost to abuse one’s position as textbook writer/teacher, and we see the results in that students repeat the same thing and are not encouraged to question those initial, extremely strong denouncements. I am familiar because I have just reviewed around 10 standard Bayesian texts*. One noteworthy exception, I’m sure it’s not the only one, is a text by Gelman, Carlin, Stern and Rubin. Honorable mention to them! If I was able to put a stop to even one textbook repeating the declarations of Bayesian philosophical superiority followed by the identical howlers, I would consider my efforts successful.

*These are prominent, contemporary “middle-of-the-road” Bayesian texts, by the way.

@Mayo, mis-statements are not going to get removed from published texts, or papers.

The “howlers” you mention compare good applications of the authors’ preferred method to bad applications of its competitors. If, as a textbook writer, one believes one’s audience is currently doing and always will do a shabby job with competitor methods, and that one’s textbook will teach them how to always do a good job with one’s preferred methods, this comparison is exactly the right one to make.

If instead, more modestly, the textbook author aims to educate/inform the audience about when different methods may be more or less useful – or when there’s not much in it – then such a loaded comparison is not needed. (Vic Barnett’s book aims to educate/inform like this, and I think succeeds.)

This all holds whether one prefers methods that are Bayesian, frequentist, or machine-learning or fuzzy sets or whatever. In fact loaded comparisons are endemic, in all branches of statistical research. For example, in seminars it’s common to joke before showing simulations that “here are some simulations that show how superior [the new method] is”; being skeptical about claims is a habit students learn, early on.

Guest, I looked back at various recent stats texts I own. The texts on classical stats present the methods rather dispassionately. The Bayesian texts do more proselytizing, using these howlers, as Mayo calls them. Granted, I own Kadane and Howson and Ziliak/McCloskey and these books have made an impression on me. I think the proselytizing is taking the place of sound reasoning and this is influencing non-stats professionals unduly.

This seems an appropriate comment thread in which to report the results of my introspection as suggested by Mayo:

“But here’s the key issue Corey, so maybe you and others will just try this out as a little exercise over the weekend: whenever you assume you must wish to say “hypothesis H attains a much higher posterior probability than its denial” say instead, H is much better tested, (or if you prefer, more highly corroborated) in the sense that the ways H can be wrong here have been well checked and found to be absent.”

Upon reflection: if my brain performed ampliative inference according to the normative principles to which I subscribe, then any hypothesis not distinguished by particularly high or low prior probability could have higher posterior probability than all mutually exclusive alternative hypotheses only if it had been “better tested” in the sense above.

I differ from Mayo about well-tested hypotheses in two ways: (i) I would quantify evidence in the data by likelihood ratios on simple statistical hypotheses while Mayo’s severity function is defined in terms of p-values, and (ii) to deal with conjunctions and negations of simple statistical hypotheses I use the sum and product rule of probability theory operating on posterior probability distributions; as far as I can tell, Mayo’s severity approach works by hitting each conjunct of a conjunction with a test that’s severe in the p-value sense.

Corey: I want to study your introspection experiment much more carefully later on. thanks.

Corey:

(1) First, I want to encourage you in your introspection, and as a next step: Now consider cases of little or no evidence for some claim H, maybe where you have evidence that nothing whatever has been done to rule out the falsity of H. H is poorly tested: do you give H a low probability?

(2) There’s a lot in what you wrote that is not clear. For example:

“Upon reflection: if my brain performed ampliative inference according to the normative principles to which I subscribe, then any hypothesis not distinguished by particularly high or low prior probability could have higher posterior probability than all mutually exclusive alternative hypotheses only if it had been “better tested” in the sense above.”

Very unclear, and even sounds as if you’d call something better tested only if it didn’t have a very low or high prior?

(3) But let me just correct one thing:

“I differ from Mayo about well-tested hypotheses in two ways: (i) I would quantify evidence in the data by likelihood ratios on simple statistical hypotheses while Mayo’s severity function is defined in terms of p-values,”

No, (SEV is not defined in terms of p-values). First of all severity is intended to be relevant for both statistical and non-statistical claims, but let’s just focus on the statistical. SEV uses p-values, but always considers it as a random variable so that one can evaluate it under the null and alternatives to the null. One sees this even in my joint papers with Cox, e.g., 2010, p. 263, and p. 290 (you can find them off the blog). As I assume you know, N-P tests can be made out in terms of p-values in this manner. These tests grow out of likelihood ratio tests, but here too, we must consider the LR as a statistic to obtain error probabilities.

I’ll come back to your (ii) another time.

Mayo,

I’m trying to understand this example to use it against Bayesians.

If there are 999 people who aren’t ready for college but they each have a 5% chance of high scores, then there will be approximately 50 in the population who appear ready for college (have high scores) but aren’t, while there is only 1 who appears ready and actually is.

So if we see someone randomly chosen from this population who appears ready (has high scores), then there is only about a 1/50 chance they actually are ready. This agrees with the Bayesian calculation.
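The head count in this comment can be written out as a short sketch in expected (natural) frequencies. The population size of 1000 and the variable names are illustrative assumptions, and every ready student is assumed to score high.

```python
population = 1000
ready = 1                                # one in a thousand is college ready
unready = population - ready             # 999 students who are not ready

expected_false_high = unready * 0.05     # unready students expected to score high
expected_true_high = ready * 1.0         # ready students expected to score high

share_ready = expected_true_high / (expected_true_high + expected_false_high)
print(f"expected high scorers who are not ready: {expected_false_high:.2f}")
print(f"share of high scorers who are ready:     {share_ready:.4f}")
```

The count of unready high scorers is 49.95, so the share is roughly 1 in 51, matching the figure obtained from Bayes' theorem with a prior of .001.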

Are you saying an Error Statistician would say this is wrong and that we actually have strong evidence that the individual observed is ready for college?

Fisher! So glad you’re back! No that is not how the numbers are given. We just have this 1 student Isaac. He happened to be selected from FewReady town or the like. Of course the statistics on the college test (false neg/false pos) come from anywhere, we are not told. As I said, I try to help the critic with the numbers in order to grant as much as possible. There’s no presumption that each student actually has the same chance of readiness, the numbers are just what they are.

As for my own take, as I note, I’d want a test with degrees of readiness—there’s an indication of some degree of readiness–and also I’d want to state a level of “achievement” (whatever that is) that was beyond what this data indicated. But I am going along with a highly artificial dichotomy. I would deny the scores are good evidence of unreadiness.

Corey: Regarding your (ii) above I’d like to draw your attention to an (at least) entertaining observation made by Laurie Davies in his “Data Features” paper (Statistica Neerlandica 49 (1995), 185–245), namely that if we think about statistical hypotheses as approximations and not the exact truth, standard probability axioms don’t make sense for hypotheses. This is because if the N(7,10)-distribution is a proper approximation, the N(7.0001,9.9999)-distribution is a proper approximation as well. This contradicts the sum rule for probabilities, which is based on the principle that if one event happens, its negation cannot happen (if “events” are “distributions”, both can still be approximations).

Hennig: This gets to an equivocation I always find in the meaning or supposed meaning of Bayesian degree of belief assignment to hypotheses. But wouldn’t the Bayesian give different probability assignments to these ranges?

Sorry, I don’t understand this question. What “ranges”?

Christian: explain why a sum rule contradiction occurs.

The sum rule says that P(A+B)=P(A)+P(B), the notation “A+B” assuming that A and B are disjoint. However, if A and B are disjoint sets of probability distributions and A is interpreted as happening if a member of A approximates the truth (assuming that it exists) in a certain sense, they may both contain different distributions that can be considered as proper approximations of the very same truth, such as N(7,10) and N(7.001,9.999), in which case the probability for “a distribution in A+B approximates the truth” may be lower than the sum of the two single set probabilities. It may even be that all distributions in A are approximately (but not exactly) equal to all distributions in B, in which case potentially P(A+B)=P(B) (if P and the sets are interpreted in the approximation-sense as above) even if P(A) is substantially larger than zero.

Note that this of course is not a mathematical contradiction but an implication of a certain interpretation of these probabilities, which many people make in practice, although it does not, for example, occur in de Finetti’s approach, which is not about approximating true distributions.