Bayesian philosophers (among others) have analogous versions of the criticism in my April 28 blogpost: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat (note the sedate philosopher’s comedy club backdrop):
Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?
The problem was, the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So, she will infer hypotheses that are simply unbelievable!
So, the charge goes, the error statistical testing account clearly fails to serve as an account of knowledge or inference (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, the Bayesian critic can assign a sufficiently low prior probability to H so as to yield a low posterior probability in H[i]. But no argument is given as to why this counts in favor of, rather than against, the Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.
To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H.
Isaac and college readiness
An example that Peter Achinstein[ii] and I have debated concerns a student, Isaac, who has taken a battery of tests and achieved very high scores, s, something given to be highly improbable for those who are not college ready.[iii] We can write the hypothesis:
H(I): Isaac is college ready.
And let the denial be H’:
H’(I): Isaac is not college ready (i.e., he is deficient).
The probability for such good results, given a student is college ready, is extremely high:
P(s | H(I)) is practically 1,
while very low assuming he is not college ready. In one computation, the probability that Isaac would get such high test results, given that he is not college ready, is .05:
P(s | H’(I)) = .05.
But imagine, continues our critic, that Isaac was randomly selected from the population of students in, let us say, Fewready Town—where college readiness is extremely rare, say one out of one thousand. The critic infers that the prior probability of Isaac’s college-readiness is therefore .001:
(*) P(H(I)) = .001.
If so, then the posterior probability that Isaac is college ready, given his high test results, would be very low:
P(H(I)|s) is very low,
even though the posterior probability has increased from the prior in (*).
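The arithmetic can be spelled out with Bayes's theorem. A minimal sketch, taking P(s | H(I)) = 1 exactly (the text says only "practically 1") and using the critic's other numbers:

```python
# Bayes's theorem with the critic's numbers for the Isaac example.
# Assumption for illustration: P(s | H(I)) = 1 exactly.
prior_ready = 0.001          # P(H(I)): readiness rate in Fewready Town
p_s_given_ready = 1.0        # P(s | H(I))
p_s_given_not_ready = 0.05   # P(s | H'(I))

# Total probability of the high scores s
p_s = p_s_given_ready * prior_ready + p_s_given_not_ready * (1 - prior_ready)

# Posterior probability of readiness given the high scores
posterior_ready = p_s_given_ready * prior_ready / p_s
print(round(posterior_ready, 3))  # ≈ 0.02: raised from .001, but still very low
```

So the high scores raise the posterior about twentyfold, from .001 to roughly .02, yet it remains very low.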
This is supposedly problematic for testers because we’d say this was evidence for H(I) (readiness). Actually I would want degrees of readiness to make my inference, but these are artificially excluded here.
But, even granting his numbers, the main fallacy here is fallacious probabilistic instantiation. Although the probability that a student randomly selected from the high schoolers in Fewready Town is college ready is .001, it does not follow that Isaac, the one we happened to select, has a probability of .001 of being college ready (Mayo 1997, 2005, 117).
Achinstein (2010, 187) says he will grant the fallacy…but only for frequentists:
“My response to the probabilistic fallacy charge is to say that it would be true if the probabilities in question were construed as relative frequencies. However, … I am concerned with epistemic probability.”
He is prepared to grant the following instantiations:
- A proportion p of the hypotheses in a given pool of hypotheses are true (or a given character holds for a proportion p of them).
- The particular hypothesis Hi was randomly selected from this pool.
- Therefore, the objective epistemic probability P(Hi is true) = p.
Of course, epistemic probabilists are free to endorse this road to posteriors—this just being a matter of analytic definition. But the consequences speak loudly against the desirability of doing so.
No Severity. The example considers only two outcomes: reaching the high scores s, or reaching lower scores, ~s. Clearly a lower grade gives even less evidence of readiness; that is, P(H’(I)| ~s) > P(H’(I)|s). Therefore, whether Isaac scored as high as s or lower, ~s, the epistemic probabilist is justified in having high belief that Isaac is not ready. Even if he claims he is merely blocking evidence for Isaac’s readiness (and not saying he believes highly in his unreadiness), the analysis is open to problems: the probability of finding evidence of Isaac’s readiness even if in fact he is ready (H(I) is true) is low if not zero. Other Bayesians might interpret things differently, noting that since the posterior for readiness has increased, the test scores provide at least some evidence for H(I)—but then the invocation of the example to demonstrate a conflict between a frequentist and Bayesian assessment would seem to diminish or evaporate.
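The "no severity" point can be checked directly. A sketch under the critic's own numbers (again taking P(s | H(I)) = 1 for simplicity, so P(~s | H(I)) = 0): the posterior in unreadiness is high whichever way Isaac scores.

```python
# Posterior in H'(I) (not ready) after either outcome, using the critic's numbers.
# Assumption for illustration: P(s | H(I)) = 1, hence P(~s | H(I)) = 0.
prior_not_ready = 0.999
p_s_not_ready, p_s_ready = 0.05, 1.0

def posterior_not_ready(lik_not_ready, lik_ready):
    """P(H' | outcome) by Bayes's theorem for the two-hypothesis partition."""
    num = lik_not_ready * prior_not_ready
    return num / (num + lik_ready * (1 - prior_not_ready))

after_s  = posterior_not_ready(p_s_not_ready, p_s_ready)          # scored high (s)
after_ns = posterior_not_ready(1 - p_s_not_ready, 1 - p_s_ready)  # scored low (~s)
print(round(after_s, 3), round(after_ns, 3))  # 0.98 1.0
```

Whichever outcome occurs, the posterior in unreadiness exceeds .98: the "test" is incapable of producing evidence of Isaac's readiness even if he is ready.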
Reverse Discrimination? To push the problem further, suppose that the epistemic probabilist receives a report that Isaac was in fact selected randomly, not from Fewready Town, but from a population where college readiness is common, Fewdeficient Town. The same scores s now warrant the assignment of a strong objective epistemic belief in Isaac’s readiness (i.e., H(I)). A high-school student from Fewready Town would need to have scored quite a bit higher on these same tests than a student selected from Fewdeficient Town for his scores to be considered evidence of his readiness. (Reverse discrimination?) When we move from hypotheses like “Isaac is college ready” to scientific generalizations, the difficulties become even more serious.
We need not preclude that H(I) has a legitimate frequentist prior; the frequentist probability that Isaac is college ready might refer to genetic and environmental factors that determine the chance of his deficiency—although I do not have a clue how one might compute it. The main thing is that this probability is not given by the probabilistic instantiation above.
These examples, repeatedly used in criticisms, invariably shift the meaning from one kind of experimental outcome—a randomly selected student has the property “college ready”—to another—a genetic and environmental “experiment” concerning Isaac in which the outcomes are ready or not ready.
This also points out the flaw in trying to glean reasons for epistemic belief with just any conception of “low frequency of error.” If we declared each student from Fewready to be “unready,” we would rarely be wrong, but in each case the “test” has failed to discriminate the particular student’s readiness from his unreadiness. Moreover, were we really interested in the probability of the event that a student randomly selected from a town is college ready, and had the requisite probability model (e.g., Bernoulli), then there would be nothing to stop the frequentist error statistician from inferring the conditional probability. However, there seems to be nothing “Bayesian” in this relative frequency calculation. Bayesians scarcely have a monopoly on the use of conditional probability! But even here it strikes me as a very odd way to talk about evidence.
Achinstein, P. (2001), The Book of Evidence, Oxford: Oxford University Press.
— (2010), “Mill’s Sins or Mayo’s Errors?”, pp. 170-188 in D. G. Mayo and A. Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University Press.
— (2011), “Achinstein Replies,” pp. 258-98 in G. Morgan (ed.), Philosophy of Science Matters: The Philosophy of Peter Achinstein, Oxford: Oxford University Press.
Howson, C. (1997a), “A Logic of Induction”, Philosophy of Science 64, 268–90.
— (1997b), “Error Probabilities in Error,” Philosophy of Science 64(4),194.
Mayo, D. G. (1997a), “Response to Howson and Laudan,” Philosophy of Science 64: 323-333.
Mayo, D. G. (1997b), “Error Statistics and Learning from Error: Making a Virtue of Necessity,” in L. Darden (ed.) Supplemental Issue PSA 1996: Symposia Papers, Philosophy of Science 64, S195-S212.
— (2005), “Evidence as Passing Severe Tests: Highly Probed vs. Highly Proved,” pp. 95-127 in P. Achinstein (ed.), Scientific Evidence, Baltimore: Johns Hopkins University Press.
[i] e.g., Howson 1997a, b; Achinstein 2001, 2010, 2011.
[ii] Peter Achinstein, Professor of Philosophy at Johns Hopkins University. Among his many publications, he is the author of: The Concept of Evidence (1983); Particles and Waves: Historical Essays in the Philosophy of Science (1991), for which he received the prestigious Lakatos Prize in 1993; and The Book of Evidence (2001).
[iii] I think Peter and I have finally put this particular example to rest at a workshop I held here in April 2011, with grad students from my philosophy of science seminar. When a student inquired as to where we now stood on the example, toward the end of the workshop, my response was to declare, with relief, that Isaac had graduated from college (NYU)! Peter’s response dealt with the movie “Stand and Deliver!” (where I guess reverse discrimination was warranted for a time.)
Mayo, as Lehmann has put it: “If one firmly believes the hypothesis to be true, extremely convincing evidence will be required before one is willing to give up this belief, and the significance level will accordingly be set very low”.
So what if P(s | H’(I)) = .05 is decided to be not (severe) enough, given that the student is from “Fewready”?
If someone had decided that (and it seems to me that this is permitted in the error statistical approach, that is, to judge what the stringency level has to be), then the problem would still be there. It wouldn’t be a problem of the Bayesian approach per se.
It seems to me that the question here is not epistemic, but rather moral. It would not be fair to set up a higher bar for a student, just because of his background.
Sorry, not sure I see what the problem would be? I might note that I’m opposed to setting a single cut-off for evidence, but would rather report how well probed a given hypothesis is, and how well/poorly probed discrepancies from it are. So, this kind of example is rather ill-formed, but since that’s what they’re saying at the comedy club, I’m prepared to consider it. And of course, the whole idea of college readiness is so imprecise, as opposed to, say, the degree to which high school material has been mastered, perhaps with a NY state Regents exam. I am assuming one tries to flesh these out for purposes of entertaining the criticism. (The same type of example is sometimes made with hypotheses about the presence or absence of a disease.)
On the question of moral values here, that would allude to something distinct from merely evaluating what the evidence indicates about his readiness. However, one could say that adopting one or another inference account here does have moral implications, if this construal of scores is to be the basis of a policy. And, indeed, it is the basis of policy! (When I was first confronted with this criticism, as it happened, I consulted Erich Lehmann, whose wife was working at the Educational Testing Service in N.J., and he was telling me of the problem….More on this perhaps another time.) I happen to think it is wrong, logically, epistemologically, and morally, to assign Isaac a probability of readiness based on the urn from which he’s been randomly drawn. Frankly, I thought this example would make it SO obvious how the meaning of the hypothesis under consideration was shifting through the example that a point I’d been trying to make for a while would be seen as obvious. To my surprise, the Bayesian epistemologists, at least to my knowledge, just bit the bullet and embraced it as perfectly rational, given that this is all the knowledge we have. Sorry to go on..
Carlos: I meant to note that however small the probability is made one can still find a smaller prior for H(I) for their result to occur.
It’s true that .05 is rather inadequate, but the critic’s numbers wouldn’t work out as well otherwise, so I hiked it up—
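The arithmetic behind this point can be sketched directly: however small P(s | H’(I)) is made, picking a prior proportionally smaller still pins the posterior down (illustrative numbers only, again taking P(s | H(I)) = 1):

```python
# The critic's maneuver: for any likelihood P(s | H'), a sufficiently small
# prior P(H) keeps the posterior P(H | s) low. Illustrative numbers only.
def posterior(prior, p_s_given_h, p_s_given_not_h):
    """P(H | s) by Bayes's theorem for a two-hypothesis partition."""
    num = p_s_given_h * prior
    return num / (num + p_s_given_not_h * (1 - prior))

for p_s_given_not_h in (0.05, 0.001, 1e-6):
    prior = p_s_given_not_h / 50  # choose a prior 50 times smaller still
    # posterior stays ≈ 0.02 no matter how stringent the test is made
    print(p_s_given_not_h, round(posterior(prior, 1.0, p_s_given_not_h), 3))
```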
How can the probability of Isaac’s real readiness be a function of the % “ready” in the urn of names he was drawn out from? Depending on the pool, his “readiness” would change every day!