Posts Tagged With: Comedy club

Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs B-boosts)

Since we’ll be discussing Bayesian confirmation measures in next week’s seminar—the relevant blogpost being here--let’s listen in to one of the comedy hours at the Bayesian retreat as reblogged from May 5, 2012.

Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?

The problem was, the epistemic probability in H was so low that H couldn’t be believed!  Instead we believe its denial H’!  So, she will infer hypotheses that are simply unbelievable!

So it appears the error statistical testing account fails to serve as an account of knowledge or evidence (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, this Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i].  But this is no argument about why this counts in favor of, rather than against, their particular Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.”  This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true.  This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H. Continue reading

Categories: Comedy, confirmation theory | Tags: , , , , | 28 Comments

Comedy hour at the Bayesian (epistemology) retreat: highly probable vs highly probed (vs what ?)

Our favorite high school student, Isaac, gets a better shot at showing his college readiness using one of the comparative measures of support or confirmation discussed last week. Their assessment thus seems more in sync with the severe tester, but they are not purporting that z is evidence for inferring (or even believing) an H to which z affords a high B-boost*. Their measures identify a third category that reflects the degree to which H would predict z (where the comparison might be predicting without z, or under ~H or the like).  At least if we give it an empirical, rather than a purely logical, reading. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat as reblogged from May 5, 2012.

Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?

The problem was, the epistemic probability in H was so low that H couldn’t be believed!  Instead we believe its denial H’!  So, she will infer hypotheses that are simply unbelievable!

So it appears the error statistical testing account fails to serve as an account of knowledge or evidence (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, this Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i].  But this is no argument about why this counts in favor of, rather than against, their particular Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.”  This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true.  This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H. Continue reading

Categories: Comedy, confirmation theory, Statistics | Tags: , , , , | 20 Comments

Comedy Hour at the Bayesian (Epistemology) Retreat: Highly Probable vs Highly Probed

Bayesian philosophers (among others) have analogous versions of the criticism in my April 28 blogpost: error probabilities (associated with inferences to hypotheses) may conflict with chosen posterior probabilities in hypotheses. Since it’s Saturday night let’s listen in to one of the comedy hours at the Bayesian retreat (note the sedate philosopher’s comedy club backdrop):

Did you hear the one about the frequentist error statistical tester who inferred a hypothesis H passed a stringent test (with data x)?

The problem was, the epistemic probability in H was so low that H couldn’t be believed!  Instead we believe its denial H’!  So, she will infer hypotheses that are simply unbelievable!

So clearly the error statistical testing account fails to serve in an account of knowledge or inference (i.e., an epistemic account). However severely I might wish to say that a hypothesis H has passed a test, the Bayesian critic assigns a sufficiently low prior probability to H so as to yield a low posterior probability in H[i].  But this is no argument about why this counts in favor of, rather than against, their Bayesian computation as an appropriate assessment of the warrant to be accorded to hypothesis H.

To begin with, in order to use techniques for assigning frequentist probabilities to events, their examples invariably involve “hypotheses” that consist of asserting that a sample possesses a characteristic, such as “having a disease” or “being college ready” or, for that matter, “being true.”  This would not necessarily be problematic if it were not for the fact that their criticism requires shifting the probability to the particular sample selected—for example, a student Isaac is college-ready, or this null hypothesis (selected from a pool of nulls) is true.  This was, recall, the fallacious probability assignment that we saw in Berger’s attempt, later (perhaps) disavowed. Also there are just two outcomes, say s and ~s, and no degrees of discrepancy from H. Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , | 34 Comments

Comedy Hour at the Bayesian Retreat: P-values versus Posteriors

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

JB: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah,…. I feel I’m back in high school: “So funny, I forgot to laugh!)

The frequentist tester should retort:

Frequentist significance tester: But you assumed 50% of the null hypotheses are true, and  computed P(H0|x) (imagining P(H0)= .5)—and then assumed my p-value should agree with the number you get!

But, our significance tester is not heard from as they move on to the next joke….

Of course it is well-known that for a fixed p-value, with a sufficiently large n, even a statistically significant result can correspond to large posteriors in H0 [i] .  Somewhat more recent work generalizes the result, e.g., J. Berger and Sellke, 1987. Although from their Bayesian perspective, it appears that p-values come up short as measures of evidence, the significance testers balk at the fact that use of the recommended priors allows highly significant results to be interpreted as no evidence against the null — or even evidence for it!   An interesting twist in recent work is to try to “reconcile” the p-value and the posterior e.g., Berger 2003[ii].

The conflict between p-values and Bayesian posteriors considers the two sided  test of the Normal mean, H0: μ = μ0 versus H1: μ ≠ μ0 .

“If n = 50 one can classically ‘reject H0 at significance level p = .05,’ although Pr (H0|x) = .52 (which would actually indicate that the evidence favors H0).” (Berger and Sellke, 1987, p. 113).

If n = 1000, a result statistically significant at the .05 level leads to a posterior to the null of .82!

CHART

Table 1 (modified) from J.O. Berger and T. Selke (1987) “Testing a Point Null Hypothesis,” JASA 82(397) : 113.

Many find the example compelling evidence that the p-value “overstates evidence against a null” because it claims to use an “impartial” or “uninformative”(?) Bayesian prior probability assignment of .5 toH0, the remaining .5 being spread out over the alternative parameter space. Others charge that the problem is not p-values but the high prior (Casella and R.Berger, 1987).  Moreover, the “spiked concentration of belief in the null” is at odds with the prevailing view “we know all nulls are false”.  Note too the conflict with confidence interval reasoning since the value zero (0) lies outside the corresponding confidence interval (Mayo 2005).

But often, as in the opening joke, the prior assignment is claimed to be keeping to the frequentist camp and frequentist error probabilities: it is imagined that we sample randomly from a population of hypotheses, some proportion of which are assumed to be true, 50% is a common number used. We randomly draw a hypothesis and get this particular one, maybe it concerns the mean deflection of light, or perhaps it is an assertion of bioequivalence of two drugs or whatever. The percentage “initially true” (in this urn of nulls) serves as the prior probability for H0. I see this gambit in statistics, psychology, philosophy and elsewhere, and yet it commits a fallacious instantiation of probabilities:

50% of the null hypotheses in a given pool of nulls are true.

This particular null H0 was randomly selected from this urn (and, it may be added, nothing else is known, or the like).

Therefore P(H0 is true) = .5.

It isn’t that one cannot play a carnival game of reaching into an urn of nulls (and one can imagine lots of choices for what to put in the urn), and use a Bernouilli model for the chance of drawing a true hypothesis (assuming we could even tell), but this “generic hypothesis”  is no longer the particular hypothesis one aims to use in computing the probability of data x0 (be it on eclipse data, risk rates, or whatever) under hypothesis H0. [iii]  In any event .5 is not the frequentist probability that the chosen null H0 is true. (Note the selected null would get the benefit of being selected from an urn of nulls where few have been shown false yet: “innocence by association”.)

Yet J. Berger claims his applets are perfectly frequentist, and by adopting his recommended O-priors, we frequentists can become more frequentist (than using our flawed p-values)[iv]. We get what he calls conditional p-values (of a special sort). This is a reason for a coining a different name, e.g.,  frequentist error statistician.

Upshot: Berger and Sellke tell us they will cure  the significance tester’s tendency to exaggerate the evidence against the null  (in two-sided testing) by using some variant on a spiked prior. But the result of their “cure” is that outcomes may too readily be taken as no evidence against, or even evidence for, the null hypothesis, even if it is false.  We actually don’t think we need a cure.  Faced with conflicts between error probabilities and Bayesian posterior probabilities, the error statistician may well conclude that the flaw lies with the latter measure. This is precisely what Fisher argued:

Discussing a test of the hypothesis that the stars are distributed at random, Fisher takes the low p-value (about 1 in 33,000) to “exclude at a high level of significance any theory involving a random distribution” (Fisher, 1956, page 42). Even if one were to imagine that H0 had an extremely high prior probability, Fisher continues—never minding “what such a statement of probability a priori could possibly mean”—the resulting high posteriori probability to H0, he thinks, would only show that “reluctance to accept a hypothesis strongly contradicted by a test of significance” (44) . . . “is not capable of finding expression in any calculation of probability a posteriori” (43). Sampling theorists do not deny there is ever a legitimate frequentist prior probability distribution for a statistical hypothesis: one may consider hypotheses about such distributions and subject them to probative tests. Indeed, Fisher says,  if one were to consider the claim about the a priori probability to be itself a hypothesis, it would be rejected by the data!


[i] A result my late colleague I.J. wanted me to call the Jeffreys-Good-Lindley Paradox).

[ii] An applet is available at http://www.stat.duke.edu/∼berger

[iii] Bayesian philosophers, e.g., Achinstein, allow this does not yield a frequentist prior, but he claims it yields an acceptable prior for the epistemic  probabilist (e.g., See Error and Inference 2010).

[iv]Does this remind you of how the Bayesian is said to become more subjective by using the Berger O-Bayesian prior? See Berger deconstruction.

References & Related articles

Berger, J. O.  (2003). “Could Fisher, Jeffreys and Neyman have Agreed on Testing?” Statistical Science 18: 1-12.

Berger, J. O. and Sellke, T.  (1987). “Testing a point null hypothesis: The irreconcilability of p values and evidence,” (with discussion). J. Amer. Statist. Assoc. 82: 112–139.

Cassella G. and Berger, R..  (1987). “Reconciling Bayesian and Frequentist Evidence in the One-sided Testing Problem,” (with discussion). J. Amer. Statist. Assoc. 82 106–111, 123–139.

Fisher, R. A., (1956) Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.

Jeffreys, (1939) Theory of Probability, Oxford: Oxford University Press.

Mayo, D. (2003), Comment on J. O. Berger’s “Could Fisher,Jeffreys and Neyman Have Agreed on Testing?”Statistical Science 18, 19-24.

Mayo, D. (2004). “An Error-Statistical Philosophy of Evidence,” in M. Taper and S. Lele (eds.) The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. Chicago: University of Chicago Press: 79-118.

Mayo, D.G. and Cox, D. R. (2006) “Frequentists Statistics as a Theory of Inductive Inference,” Optimality: The Second Erich L. Lehmann Symposium (ed. J. Rojo), Lecture Notes-Monograph series, Institute of Mathematical Statistics (IMS), Vol. 49: 77-97.

Mayo, D. and Kruse, M. (2001). “Principles of Inference and Their Consequences,” in D. Cornfield and J. Williamson (eds.) Foundations of Bayesianism. Dordrecht: Kluwer Academic Publishes: 381-403.

Mayo, D. and Spanos, A. (2011) “Error Statistics” in Philosophy of Statistics , Handbook of Philosophy of Science Volume 7 Philosophy of Statistics, (General editors: Dov M. Gabbay, Paul Thagard and John Woods; Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster.) Elsevier: 1-46.
Categories: Statistics | Tags: , , , , , | 53 Comments

Blog at WordPress.com.