Since we’ll be discussing Bayesian confirmation measures in next week’s seminar (the relevant blogpost is here), let’s listen in on one of the comedy hours at the Bayesian retreat, as reblogged from May 5, 2012.
Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?
The problem was that the epistemic probability in H was so low that H couldn’t be believed! Instead we believe its denial H’! So she will infer hypotheses that are simply unbelievable!
So it appears the error statistical testing account fails to serve as an account of knowledge or evidence (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, this Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i]. But the critic gives no argument as to why this counts in favor of, rather than against, their particular Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.
To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.” This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true. This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H.
Isaac and college readiness
An example that Peter Achinstein[ii] and I have debated concerns a student, Isaac, who has taken a battery of tests and achieved very high scores, s, something given to be highly improbable for those who are not college ready.[iii] We can write the hypothesis:
H(I): Isaac is college ready.
And let the denial be H’:
H’(I): Isaac is not college ready (i.e., he is deficient).
The probability for such good results, given a student is college ready, is extremely high:
P(s | H(I)) is practically 1,
while very low assuming he is not college ready. In one computation, the probability that Isaac would get such high test results, given that he is not college ready, is .05:
P(s | H’(I)) = .05.
But imagine, continues our critic, that Isaac was randomly selected from the population of students in, let us say, Fewready Town—where college readiness is extremely rare, say one out of one thousand. The critic infers that the prior probability of Isaac’s college-readiness is therefore .001:
(*) P(H(I)) = .001.
If so, then the posterior probability that Isaac is college ready, given his high test results, would be very low:
P(H(I) | s) is very low,
even though the posterior probability has increased from the prior in (*).
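The critic’s computation can be made concrete by plugging the stated numbers into Bayes’ theorem. A minimal Python sketch (taking P(s | H(I)) to be exactly 1, standing in for the text’s “practically 1”):

```python
# Bayes' theorem for the Isaac example, using the critic's numbers:
# P(s | H) taken as 1.0 ("practically 1"), P(s | H') = .05,
# and the disputed prior P(H) = .001.

def posterior(prior_h, lik_h, lik_not_h):
    """P(H | s) for the two-hypothesis partition {H, H'}."""
    num = lik_h * prior_h
    return num / (num + lik_not_h * (1.0 - prior_h))

p = posterior(0.001, 1.0, 0.05)
print(round(p, 4))  # 0.0196: low, though boosted above the .001 prior
```

So the posterior is roughly .02: an increase over the prior, but still “very low,” just as the critic claims.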
This is supposedly problematic for testers because we’d say this was evidence for H(I) (readiness). Actually I would want degrees of readiness to make my inference, but these are artificially excluded here.
But, even granting his numbers, the main fallacy here is fallacious probabilistic instantiation. Although the probability that a student randomly selected from the high schoolers in Fewready Town is college ready is .001, it does not follow that Isaac, the one we happened to select, has a probability of .001 of being college ready (Mayo 1997, 2005, 117).
Achinstein (2010, 187) says he will grant the fallacy…but only for frequentists:
“My response to the probabilistic fallacy charge is to say that it would be true if the probabilities in question were construed as relative frequencies. However, … I am concerned with epistemic probability.”
He is prepared to grant the following instantiations:
- p% of the hypotheses in a given pool of hypotheses are true (or a character holds for p%).
- The particular hypothesis Hi was randomly selected from this pool.
- Therefore, the objective epistemic probability P(Hi is true) = p.
Of course, epistemic probabilists are free to endorse this road to posteriors—this just being a matter of analytic definition. But the consequences speak loudly against the desirability of doing so.
No Severity. The example considers only two outcomes: reaching the high scores s, or reaching lower scores, ~s. Clearly a lower grade gives even less evidence of readiness; that is, P(H’(I) | ~s) > P(H’(I) | s). Therefore, whether Isaac scored as high as s or lower, ~s, the epistemic probabilist is justified in having high belief that Isaac is not ready. Even if he claims he is merely blocking evidence for Isaac’s readiness (and not saying he believes highly in his unreadiness), the analysis is open to problems: the probability of finding evidence of Isaac’s readiness even if in fact he is ready (H(I) is true) is low if not zero. Other Bayesians might interpret things differently, noting that since the posterior for readiness has increased, the test scores provide at least some evidence for H(I)—but then the invocation of the example to demonstrate a conflict between a frequentist and Bayesian assessment would seem to diminish or evaporate.
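The “no severity” point can be checked numerically. Using the same numbers, with P(s | H(I)) set to .99 as my own stand-in for “practically 1,” the epistemic probabilist’s belief in Isaac’s unreadiness stays high whichever outcome occurs:

```python
# Posterior belief in UNreadiness, H'(I), under both possible outcomes.
# Assumed numbers: P(s | H) = .99 (stand-in for "practically 1"),
# P(s | H') = .05, prior P(H) = .001.
prior_h = 0.001
p_s_h, p_s_noth = 0.99, 0.05

def post_noth(lik_noth, lik_h):
    """P(H' | outcome), given that outcome's likelihood under H' and under H."""
    num = lik_noth * (1.0 - prior_h)
    return num / (num + lik_h * prior_h)

after_high = post_noth(p_s_noth, p_s_h)          # Isaac scored s
after_low = post_noth(1 - p_s_noth, 1 - p_s_h)   # Isaac scored ~s
print(round(after_high, 3), round(after_low, 3))  # 0.981 1.0: high either way
```

Whether Isaac scores s or ~s, the posterior in his unreadiness exceeds .98; the “test” cannot produce evidence of readiness even when he is ready.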
Reverse Discrimination? To push the problem further, suppose that the epistemic probabilist receives a report that Isaac was in fact selected randomly, not from Fewready Town, but from a population where college readiness is common, Fewdeficient Town. The same scores s now warrant the assignment of a strong objective epistemic belief in Isaac’s readiness (i.e., H(I)). A high-school student from Fewready Town would need to have scored quite a bit higher on these same tests than a student selected from Fewdeficient Town for his scores to be considered evidence of his readiness. (Reverse discrimination?) When we move from hypotheses like “Isaac is college ready” to scientific generalizations, the difficulties become even more serious.
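The town-switch is vivid in the arithmetic: the likelihoods stay fixed, and only the prior (the town’s base rate) moves. Here the .999 prior for Fewdeficient is my own illustrative figure, mirroring the one-in-a-thousand rate for Fewready:

```python
# Same scores s, same likelihoods; only the prior (the town's base rate) changes.
def posterior(prior_h, lik_h=1.0, lik_not_h=0.05):
    num = lik_h * prior_h
    return num / (num + lik_not_h * (1.0 - prior_h))

for town, prior in [("Fewready", 0.001), ("Fewdeficient", 0.999)]:
    print(town, round(posterior(prior), 4))
# Fewready ~0.02 vs Fewdeficient ~0.9999: identical scores, opposite verdicts
```

Identical evidence about Isaac thus yields nearly opposite “objective epistemic” assessments, depending solely on which pool he is reported to have been drawn from.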
We need not preclude that H(I) has a legitimate frequentist prior; the frequentist probability that Isaac is college ready might refer to generic and environmental factors that determine the chance of his deficiency—although I do not have a clue how one might compute it. The main thing is that this probability is not given by the probabilistic instantiation above.
These examples, repeatedly used in criticisms, invariably shift the meaning from one kind of experimental outcome—a randomly selected student has the property “college ready”—to another—a genetic and environmental “experiment” concerning Isaac in which the outcomes are ready or not ready.
This also points out the flaw in trying to glean reasons for epistemic belief with just any conception of “low frequency of error.” If we declared each student from Fewready to be “unready,” we would rarely be wrong, but in each case the “test” has failed to discriminate the particular student’s readiness from his unreadiness. Moreover, were we really interested in the probability of the event that a student randomly selected from a town is college ready, and had the requisite probability model (e.g., Bernoulli), then there would be nothing to stop the frequentist error statistician from inferring the conditional probability. However, there seems to be nothing “Bayesian” in this relative frequency calculation. Bayesians scarcely have a monopoly on the use of conditional probability! But even here it strikes me as a very odd way to talk about evidence.
Bayesian statisticians have analogous versions of this criticism, discussed in my April 28 blogpost: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses.
*z “B-boosts” H iff P(H | z) > P(H). Recommended C-measures vary. I don’t know what counts as a “high” B-boost, and that is a central problem with these measures.
For a formal statistical analogue, see this post.
Achinstein, P. (2001), The Book of Evidence, Oxford: Oxford University Press.
— (2010), “Mill’s Sins or Mayo’s Errors?”, pp. 170-188 in D. G. Mayo and A. Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University Press.
— (2011), “Achinstein Replies” pp. 258-98 in G. Morgan (ed.) Philosophy of Science Matters: The Philosophy of Peter Achinstein. Oxford: Oxford University Press.
Howson, C. (1997a), “A Logic of Induction”, Philosophy of Science 64, 268–90.
— (1997b), “Error Probabilities in Error,” Philosophy of Science 64(4),194.
Mayo, D. G. (1997a), “Response to Howson and Laudan,” Philosophy of Science 64: 323-333.
Mayo, D. G. (1997b), “Error Statistics and Learning from Error: Making a Virtue of Necessity,” in L. Darden (ed.) Supplemental Issue PSA 1996: Symposia Papers, Philosophy of Science 64, S195-S212.
— (2005), “Evidence as Passing Severe Tests: Highly Probed vs. Highly Proved,” pp. 95-127 in P. Achinstein (ed.), Scientific Evidence, Baltimore: Johns Hopkins University Press.
[i] e.g., Howson 1997a, b; Achinstein 2001, 2010, 2011.
[ii] Peter Achinstein is Professor of Philosophy at Johns Hopkins University. Among his many publications, he is the author of: The Concept of Evidence (1983); Particles and Waves: Historical Essays in the Philosophy of Science (1991), for which he received the prestigious Lakatos Prize in 1993; and The Book of Evidence (2001).
[iii] I think Peter and I finally put this particular example to rest at a workshop I held here in April 2011, with grad students from my philosophy of science seminar. When a student inquired, toward the end of the workshop, as to where we now stood on the example, my response was to declare, with relief, that Isaac had graduated from college (NYU)! Peter’s response dealt with the movie “Stand and Deliver” (where I guess reverse discrimination was warranted for a time).
Added Oct 26, 2013: Moreover, Peter and I concur that evidence is a “threshold” concept.
Right off the bat, something important has been omitted. How did Isaac come to take the test? More than likely, he took it because he believed he was college-ready and also was interested in getting accepted by some college. So Isaac – very likely – wasn’t selected randomly from the set of the town’s high school students. The stereotyped Bayesian response here ought to have included this likelihood in its prior.
If this turns out not to apply, then we would have the lottery paradox. The probability of winning a Megabucks-style lottery is so small that no one can expect to win it. Therefore no one should be a winner. Yet most of the time, *someone* does.
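The arithmetic behind the paradox: a per-ticket win probability far too small for any individual to expect a win still makes *some* winner nearly certain once enough tickets are in play. The odds and ticket count below are made-up illustrative figures:

```python
# Hypothetical numbers: 1-in-10-million odds per ticket, 30 million tickets sold.
p_win = 1e-7
tickets = 30_000_000

p_nobody_wins = (1.0 - p_win) ** tickets
p_someone_wins = 1.0 - p_nobody_wins
print(round(p_someone_wins, 2))  # 0.95: no one should expect to win,
                                 # yet someone almost always does
```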
Perhaps we should call this the stereotype paradox instead.
And in an unexpected coincidence, it happens that this month’s issue of Scientific American has an article on apparent paradoxes that arise from applying small probabilities when large numbers of cases are involved.