Oh No! It’s those mutant bears again. To my dismay, I’ve been sent, for the third time, that silly, snarky, adolescent clip of those naughty “what the p-value” bears (first posted on Aug 5, 2012), who cannot seem to get a proper understanding of significance tests into their little bear brains. Apparently some people haven’t seen my rejoinder, which, as I said then, practically wrote itself. So, since it’s Saturday night here at the Elbar Room, let’s listen in to a mashup of both the clip and my original rejoinder (in which the p-value bears are replaced with hypothetical Bayesian bears).
These stilted bear figures and their voices are sufficiently obnoxious in their own right, even without the tedious lampooning of p-values and the feigned horror at learning they should not be reported as posterior probabilities.
Bear #1: Do you have the results of the study?
Bear #2: Yes. The good news is there is a .996 probability of a positive difference in the main comparison.
Bear #1: Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.
Bear #2: Not really, that would be an incorrect interpretation.
Bear #1: Oh. I see. Then you must mean 99.6% of the time a smaller difference would have been observed if in fact the null hypothesis of “no effect” was true.
Bear #2: No, that would also be an incorrect interpretation.
Bear #1: Well then you must be saying it is rational to believe to degree .996 that there is a real difference?
Bear #2: It depends. That might be so if the prior were a proper probability distribution representing rational beliefs in the different possible parameter values, independent of the data.
Bear #1: But I was assured that this would be a nonsubjective Bayesian analysis.
Bear #2: Yes, the prior would at most have had the more important parameters elicited from experts in the field, the remainder being a product of one of the default or conjugate priors.
Bear #1: Well which one was used in this study?
Bear #2: I would need to find out; I came into the project at the point of trying to find an adequate statistical model, and this alone required six different adjustments of the model.
Bear #1: So can you explain to me what a posterior of 0.996 really means?
Bear #2: There is no unanimity as to the definition of objective Bayesian analysis, nor even unanimity as to its goal. It is a quantitative construct arrived at by means of a Bayesian computation based on a prior distribution.
Bear #1: But I am assured the priors are coherent, and do not violate probability axioms, correct?
Bear #2: Not in general. Conventional priors may not even be probabilities in that a constant or flat prior for a parameter may not sum to 1 (improper prior).
Bear #1: If priors are not probabilities, how do I know the posterior is a probability?
Bear #2: The posterior distribution can generally be justified as a limiting approximation to the posterior from a proper prior.
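[An aside from me, not the bears: the know-it-all bear’s “limiting approximation” line can be made concrete with a minimal numerical sketch. I assume a single observation x ~ N(mu, sigma²) with a N(0, tau²) prior on mu; the function name and the particular numbers are my own illustrative choices, not anything from the clip. As tau² grows, the proper conjugate posterior tends to the N(x, sigma²) posterior that the improper flat prior would deliver.]

```python
def conjugate_posterior(x, tau2, sigma2=1.0):
    """Posterior mean and variance for mu, given one observation
    x ~ N(mu, sigma2) and a proper N(0, tau2) prior on mu."""
    post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)  # precisions add
    post_mean = post_var * (x / sigma2)           # prior mean 0 contributes nothing as tau2 grows
    return post_mean, post_var

x = 2.5
for tau2 in (1.0, 100.0, 10_000.0):
    # As tau2 -> infinity, this approaches the flat-prior answer N(x, sigma2).
    print(tau2, conjugate_posterior(x, tau2))
```

With tau² = 1 the posterior is pulled halfway toward the prior mean of 0; by tau² = 10,000 the posterior is essentially N(2.5, 1), the flat-prior limit the bear alludes to.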
Bear #1: Yeah right. Well the important thing is that this is stronger evidence of a genuine effect than was reported in the recent Hands-Price study: they had only a .965 posterior probability.
Bear #2: Not necessarily. I would have to know their sample size, the type of prior used, and whether they were computing a Bayesian highest posterior density (HPD) interval or treating it as a test, possibly with a “spiked” prior.
Bear #1: You are not serious.
Bear #2: Unfortunately I’m very serious. Bayesian analyses are like that.
Bear #1: Aren’t all the objective, default priors agreed upon conventions?
Bear #2: Not at all. For instance, one school defines the prior via the (asymptotic) model-averaged information difference between the prior and the posterior; by contrast, the matching prior approach seeks priors that yield optimal frequentist confidence sets for the given model, and there are also model-dependent invariance approaches. Even within a given approach the prior for a particular parameter may depend on whether it is a parameter “of interest” or if it is a nuisance parameter, and even on the “order of importance” in which nuisance parameters are arranged.
Bear #1: Wait a tick: we have a higher posterior probability than the Hands-Price study and you’re saying we might not have stronger evidence?
Bear #2: Yes. Even if you’re both doing the same kind of default Bayesian analysis, the two studies may have started with different choices of priors.
Bear #1: But even if the two studies had started with different priors, that difference would have been swamped out by the data, right?
Bear #2: Not necessarily. It will depend on how extreme the priors are relative to the amount of data collected, among many other things.
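[Another aside, mine rather than the bears’: the “swamping” claim is easy to check numerically. The sketch below uses a Binomial likelihood with Beta priors evaluated on a grid; the .5 cutoff, the grid size, and the particular “flat” Beta(1,1) versus “skeptical” Beta(1,9) priors are all my own hypothetical choices for illustration. With 7 successes in 10 trials the two priors give sharply different posterior probabilities of a positive effect; with 700 in 1000 they nearly agree.]

```python
import math

def posterior_prob_positive(successes, trials, a, b, grid=2000, cutoff=0.5):
    """P(theta > cutoff | data) for a Binomial likelihood and Beta(a, b)
    prior, computed on a grid; the normalizing constant cancels."""
    thetas = [(i + 0.5) / grid for i in range(grid)]
    log_w = [(a - 1 + successes) * math.log(t)
             + (b - 1 + trials - successes) * math.log(1 - t)
             for t in thetas]
    m = max(log_w)                       # subtract the max to avoid underflow
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return sum(wi for t, wi in zip(thetas, w) if t > cutoff) / total

# Small study (7 of 10): the prior matters a great deal.
print(posterior_prob_positive(7, 10, 1, 1))    # flat Beta(1,1) prior
print(posterior_prob_positive(7, 10, 1, 9))    # skeptical Beta(1,9) prior

# Large study (700 of 1000): the likelihood swamps both priors.
print(posterior_prob_positive(700, 1000, 1, 1))
print(posterior_prob_positive(700, 1000, 1, 9))
```

At n = 10 the flat and skeptical priors disagree wildly about whether the effect is “probably positive”; at n = 1000 the two posteriors are practically indistinguishable, which is the grumpy bear’s “it depends on how extreme the priors are relative to the amount of data.”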
Bear #1: What good is that? Please assure me at least that if I report a high posterior probability in the results being genuine there is no way it is the result of such shenanigans as hunting and searching until obtaining such an impressive effect.
Bear #2: I’m afraid I can’t; the effect of optional stopping is not generally regarded as influencing the Bayesian computation. This is called the Stopping Rule Principle.
Bear #1: You’re not serious.
Bear #2: I am very serious. Granted, stopping rules can be taken account of in a prior, but then the result is incoherent from a Bayesian standpoint (it violates the likelihood principle); there is no unanimity on this among Bayesian statisticians at the moment. It is a matter of ongoing theoretical research.
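[A final aside from me: the shenanigans the testy bear worries about are easy to exhibit in simulation. The sketch below, with my own hypothetical sizes and thresholds, flips a fair coin (so the true theta is exactly .5 and there is no effect) and stops as soon as the flat-prior posterior probability of theta > .5 exceeds .95, checking from n = 20 on and using a normal approximation to the Beta posterior. Because the posterior calculation ignores the hunting, a sizable fraction of these “no effect” experiments end by reporting an impressive-looking posterior.]

```python
import math
import random

def posterior_above_half(heads, n):
    """Approximate P(theta > 0.5 | data) under a flat Beta(1,1) prior,
    via a normal approximation to the Beta(heads+1, n-heads+1) posterior."""
    m = (heads + 1) / (n + 2)                 # posterior mean
    sd = math.sqrt(m * (1 - m) / (n + 3))     # posterior standard deviation
    return 0.5 * (1 + math.erf((m - 0.5) / (sd * math.sqrt(2))))

def hunt_for_posterior(max_n=1000, threshold=0.95, seed=None):
    """Keep flipping a fair coin until the posterior looks 'convincing'."""
    rng = random.Random(seed)
    heads = 0
    for n in range(1, max_n + 1):
        heads += rng.random() < 0.5           # true theta = 0.5: no effect
        if n >= 20 and posterior_above_half(heads, n) > threshold:
            return True                       # "discovered" an effect
    return False

hits = sum(hunt_for_posterior(seed=s) for s in range(500))
# Well above 5% of these null experiments end with a .95+ posterior.
print(hits / 500)
```

The posterior the hunter finally reports is computed exactly as if the sample size had been fixed in advance; nothing in the number itself records that the analyst peeked after every flip.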
Bear #1: Just to try this one last time: can you tell me how to interpret the reported posterior of .996?
Bear #2: The default posteriors are numerical constructs arrived at by means of conventional computations based on a prior which may in some sense be regarded as either primitive or as selected by a combination of pragmatic considerations and background knowledge, together with mathematical likelihoods given by a stipulated statistical model. The interpretation of the posterior probability will depend on the interpretation of the prior that went into the computation, and the priors are to be construed as conventions for obtaining the default posteriors.
Bear #1: I have had enough. Please go away now.
The presumption of their clip is that somehow there would be no questions or confusion of interpretation were the output in the form of a posterior probability. The problem of indicating the extent of discrepancies that are/are not warranted by a given p-value is genuine but easy enough to solve*. What I never understand is why it is presupposed that the most natural and unequivocal way to interpret and communicate evidence (in this case, leading to low p-values) is by means of a (posterior) probability assignment, when it seems clear that the more relevant question the testy-voiced (“wait a tick”) bear would put to the know-it-all bear is: how often would this method have erroneously declared a genuine discrepancy?
I once checked out the “xtranormal” http://www.xtranormal.com/ website. It turns out there are other figures besides the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. My idea for the rejoinder was never polished up for actually making a clip. Doing so would appeal to my artist side, which means I would get too involved. Anyone who wants to take the plunge, let me know. In fact, the original post drew 16 comments in which several reader improvements were suggested. Please send corrections and additions.
Reference: Blume, J. and J. F. Peipert (2003). “What your statistician never told you about P-values.” J Am Assoc Gynecol Laparosc 10(4): 439-444.
*See, for example, Mayo & Spanos (2011), Error Statistics.