To my dismay, I’ve been sent, once again, that silly, snarky, adolescent clip of those naughty “what the p-value” bears (see Aug 5 post), who cannot seem to get a proper understanding of significance tests into their little bear brains. So apparently some people haven’t seen my rejoinder which, as I said then, practically wrote itself. So since it’s Saturday night here at the Elbar Room, let’s listen in to a reblog of my rejoinder (replacing p-value bears with hypothetical Bayesian bears), but you can’t get it without first watching the Aug 5 post, since I’m mimicking them. [My idea for the rejoinder was never polished up for actually making a clip. In fact the original post had 16 comments where several reader improvements were suggested. Maybe someone will want to follow through*.] I just noticed a funny cartoon on Bayesian intervals on Normal Deviate’s post from Nov. 9.
This continues yesterday’s post: I checked out the “xtranormal” website, http://www.xtranormal.com/. Turns out there are other figures aside from the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. Anyway, before taking the plunge, here is my first attempt, just off the top of my head. Please send corrections and additions.
Bear #1: Do you have the results of the study?
Bear #2: Yes. The good news is there is a .996 probability of a positive difference in the main comparison.
Bear #1: Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.
Bear #2: Not really, that would be an incorrect interpretation.
Bear #1: Oh. I see. Then you must mean 99.6% of the time a smaller difference would have been observed if in fact the null hypothesis of “no effect” was true.
Bear #2: No, that would also be an incorrect interpretation.
Bear #1: Well then you must be saying it is rational to believe to degree .996 that there is a real difference?
Bear #2: It depends. That might be so if the prior probability distribution was a proper probabilistic distribution representing rational beliefs in the different possible parameter values independent of the data.
Bear #1: But I was assured that this would be a nonsubjective Bayesian analysis.
Bear #2: Yes, the prior would at most have had the more important parameters elicited from experts in the field, the remainder being a product of one of the default or conjugate priors.
Bear #1: Well which one was used in this study?
Bear #2: I would need to find out. I came into the project at the point of trying to find an adequate statistical model; this alone required six different adjustments of the model.
Bear #1: So can you explain to me what a posterior of 0.996 really means?
Bear #2: There is no unanimity as to the definition of objective Bayesian analysis, nor even unanimity as to its goal. It is a quantitative construct arrived at by means of a Bayesian computation based on a prior distribution.
Bear #1: But I am assured the priors are coherent, and do not violate probability axioms, correct?
Bear #2: Not in general. Conventional priors may not even be probabilities in that a constant or flat prior for a parameter may not sum to 1 (improper prior).
Bear #1: If priors are not probabilities, how do I know the posterior is a probability?
Bear #2: The posterior distributions can generally be justified as limiting approximations to posteriors obtained from proper priors.
Bear #1: Yeah right. Well the important thing is that this is stronger evidence of a genuine effect than was reported in the recent Hands-Price study: they had only a .965 posterior probability.
Bear #2: Not necessarily. I would have to know their sample size, type of prior used, whether they were doing a Bayesian highest probability density interval or treating it as a test, possibly with a “spiked” prior.
Bear #1: You are not serious.
Bear #2: Unfortunately I’m very serious. Bayesian analyses are like that.
Bear #1: Aren’t all the objective, default priors agreed upon conventions?
Bear #2: Not at all. For instance, one school defines the prior via the (asymptotic) model-averaged information difference between the prior and the posterior; by contrast, the matching prior approach seeks priors that yield optimal frequentist confidence sets for the given model, and there are also model-dependent invariance approaches. Even within a given approach the prior for a particular parameter may depend on whether it is a parameter “of interest” or if it is a nuisance parameter, and even on the “order of importance” in which nuisance parameters are arranged.
Bear #1: Wait a tick: we have a higher posterior probability than the Hands-Price study and you’re saying we might not have stronger evidence?
Bear #2: Yes. Even if you’re both doing the same kind of default Bayesian analysis, the two studies may have started with different choices of priors.
Bear #1: But even if the two studies had started with different priors, that difference would have been swamped out by the data, right?
Bear #2: Not necessarily. It will depend on how extreme the priors are relative to the amount of data collected, among many other things.
Bear #1: What good is that? Please assure me at least that if I report a high posterior probability in the results being genuine there is no way it is the result of such shenanigans as hunting and searching until obtaining such an impressive effect.
Bear #2: I’m afraid I can’t. The effect of optional stopping is not generally regarded as influencing the Bayesian computation; this is called the Stopping Rule Principle.
Bear #1: You’re not serious.
Bear #2: I am very serious. Granted, stopping rules can be taken account of in a prior, but then the result is Bayesian incoherent (in violating the likelihood principle), and there is no unanimity on this among Bayesian statisticians at the moment. It is a matter of theoretical research.
Bear #1: Just to try this one last time: can you tell me how to interpret the reported posterior of .996?
Bear #2: The default posteriors are numerical constructs arrived at by means of conventional computations based on a prior which may in some sense be regarded as either primitive or as selected by a combination of pragmatic considerations and background knowledge, together with mathematical likelihoods given by a stipulated statistical model. The interpretation of the posterior probability will depend on the interpretation of the prior that went into the computation, and the priors are to be construed as conventions for obtaining the default posteriors.
Bear #1: I have had enough. Please go away now.
*E.R.R.O.R. fund will support the hiring out of the bears or, preferably, the much better animated entities on xtranormal.
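For readers who want to see Bear #2's point about swamping in numbers, here is a rough sketch. Everything in it is an assumption made up for illustration and not taken from either study: a binomial model, two hypothetical Beta priors (a flat Beta(1, 1) and a sceptical Beta(2, 50)), and invented data with 60% successes. It compares posterior means under the two priors for a small and a large sample.

```python
from fractions import Fraction

def posterior_mean(a, b, successes, trials):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior.

    Conjugate update: Beta(a, b) becomes Beta(a + successes, b + failures),
    whose mean is (a + successes) / (a + b + trials).
    """
    return Fraction(a + successes, a + b + trials)

# Hypothetical priors, chosen only to make the contrast visible.
PRIORS = {"flat": (1, 1), "sceptical": (2, 50)}

def prior_gap(trials):
    """Absolute gap between the two posterior means for invented 60% data."""
    successes = trials * 6 // 10
    means = [posterior_mean(a, b, successes, trials) for a, b in PRIORS.values()]
    return abs(means[0] - means[1])

for n in (20, 20000):
    print(n, float(prior_gap(n)))
```

With 20 observations the two priors give very different posterior means; with 20,000 the gap nearly vanishes. How much data counts as "enough" depends on how extreme the prior is, which is all that Bear #2 is claiming.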
“Just to try this one last time: can you tell me how to interpret the reported posterior of .996?”
It means that for every possible “state of the world” that isn’t ruled out by my knowledge in which there is “no difference”, there are 249 possible states compatible with my knowledge in which there is a difference.
This has an objective, well defined, and useful meaning. It is still meaningful if the “difference” in question is a singular event and can never be used in a “repeated trial”.
It does not however imply that if you could repeat this 250 times that you’d get 1 “no” for every 249 “yes’s”. What would actually happen in a repeated trial is an entirely different question which may or may not be related and relevant.
Thinking of every “.996 probability” as a frequency in a repeated trial is a kind of “scaffolding” surrounding the above meaning. It’s inaccurate in general, unnecessary even in problems that do address repeated trials, and highly limiting since it only really ever applies to a tiny subset of the problems practitioners actually face.
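The “249 possible states” in the comment above is simply the odds implied by a probability of .996, namely p / (1 - p). A minimal sketch of that arithmetic follows; the function name is hypothetical, and the .965 figure is the Hands-Price posterior mentioned in the dialogue. It shows only the conversion and takes no side on how the odds should be interpreted.

```python
from fractions import Fraction

def posterior_odds(p):
    """Odds in favour implied by a probability p, i.e. p / (1 - p)."""
    return p / (1 - p)

# .996 from the bears' study, .965 from the Hands-Price study in the dialogue.
print(posterior_odds(Fraction(996, 1000)))  # 249
print(posterior_odds(Fraction(965, 1000)))  # 193/7, roughly 27.6 to 1
```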