Oh No! It’s those mutant bears again. To my dismay, I’ve been sent, for the *third* time, that silly, snarky, adolescent clip of those naughty “what the p-value” bears (first posted on Aug 5, 2012), who cannot seem to get a proper understanding of significance tests into their little bear brains. So apparently some people haven’t seen my rejoinder which, as I said then, practically wrote itself. So since it’s Saturday night here at the Elbar Room, let’s listen in to a mashup of both the clip and my original rejoinder (in which p-value bears are replaced with hypothetical Bayesian bears).

These stilted bear figures and their voices are sufficiently obnoxious in their own right, even without the tedious lampooning of p-values and the feigned horror at learning they should not be reported as posterior probabilities.

*Mayo’s Rejoinder:*

*Bear #1:* Do you have the results of the study?

*Bear #2:* Yes. The good news is there is a .996 probability of a positive difference in the main comparison.

*Bear #1:* Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.

*Bear #2:* Not really, that would be an incorrect interpretation.

*Bear #1:* Oh. I see. Then you must mean 99.6% of the time a smaller difference would have been observed if in fact the null hypothesis of “no effect” were true.

*Bear #2:* No, that would also be an incorrect interpretation.

*Bear #1:* Well then you must be saying it is rational to believe to degree .996 that there is a real difference?

*Bear #2:* It depends. That might be so if the prior probability distribution was a proper probabilistic distribution representing rational beliefs in the different possible parameter values independent of the data.

*Bear #1:* But I was assured that this would be a nonsubjective Bayesian analysis.

*Bear #2:* Yes, the prior would at most have had the more important parameters elicited from experts in the field, the remainder being a product of one of the default or conjugate priors.

*Bear #1:* Well which one was used in this study?

*Bear #2:* I would need to find out. I came into the project at the point of trying to find an adequate statistical model; this alone required six different adjustments of the model.

*Bear #1:* So can you explain to me what a posterior of 0.996 really means?

*Bear #2:* There is no unanimity as to the definition of objective Bayesian analysis, nor even unanimity as to its goal. It is a quantitative construct arrived at by means of a Bayesian computation based on a prior distribution.

*Bear #1:* But I am assured the priors are coherent, and do not violate probability axioms, correct?

*Bear #2:* Not in general. Conventional priors may not even be probabilities in that a constant or flat prior for a parameter may not sum to 1 (improper prior).

*Bear #1:* If priors are not probabilities, how do I know the posterior is a probability?

*Bear #2:* The posterior distributions can generally be justified as limiting approximations to posteriors obtained from proper priors.

*Bear #1:* Yeah right. Well the important thing is that this is stronger evidence of a genuine effect than was reported in the recent Hands-Price study: they had only a .965 posterior probability.

*Bear #2:* Not necessarily. I would have to know their sample size, type of prior used, whether they were doing a Bayesian highest probability density interval or treating it as a test, possibly with a “spiked” prior.

*Bear #1:* You are not serious.

*Bear #2:* Unfortunately I’m very serious. Bayesian analyses are like that.

*Bear #1:* Aren’t all the objective, default priors agreed upon conventions?

*Bear #2:* Not at all. For instance, one school defines the prior via the (asymptotic) model-averaged information difference between the prior and the posterior; by contrast, the matching prior approach seeks priors that yield optimal frequentist confidence sets for the given model, and there are also model-dependent invariance approaches. Even within a given approach the prior for a particular parameter may depend on whether it is a parameter “of interest” or if it is a nuisance parameter, and even on the “order of importance” in which nuisance parameters are arranged.

*Bear #1:* Wait a tick: we have a higher posterior probability than the Hands-Price study and you’re saying we might not have stronger evidence?

*Bear #2:* Yes. Even if you’re both doing the same kind of default Bayesian analysis, the two studies may have started with different choices of priors.

*Bear #1:* But even if the two studies had started with different priors, that difference would have been swamped out by the data, right?

*Bear #2:* Not necessarily. It will depend on how extreme the priors are relative to the amount of data collected, among many other things.

*Bear #1:* What good is that? Please assure me at least that if I report a high posterior probability in the results being genuine there is no way it is the result of such shenanigans as hunting and searching until obtaining such an impressive effect.

*Bear #2:* I’m afraid I can’t. The effect of optional stopping is not generally regarded as influencing the Bayesian computation; this is called the Stopping Rule Principle.

*Bear #1:* You’re not serious.

*Bear #2:* I am very serious. Granted, stopping rules can be taken into account in a prior, but then the result is incoherent from a Bayesian standpoint (in violating the likelihood principle). There is no unanimity on this among Bayesian statisticians at the moment. It is a matter of theoretical research.

*Bear #1:* Just to try this one last time: can you tell me how to interpret the reported posterior of .996?

*Bear #2:* The default posteriors are numerical constructs arrived at by means of conventional computations based on a prior which may in some sense be regarded as either primitive or as selected by a combination of pragmatic considerations and background knowledge, together with mathematical likelihoods given by a stipulated statistical model. The interpretation of the posterior probability will depend on the interpretation of the prior that went into the computation, and the priors are to be construed as conventions for obtaining the default posteriors.

*Bear #1:* I have had enough. Please go away now.

*****************************************

The presumption of their clip is that somehow there would be no questions or confusion of interpretation were the output in the form of a posterior probability. The problem of indicating the extent of discrepancies that are/are not warranted by a given p-value is genuine but easy enough to solve*. **What I never understand is why it is presupposed that the most natural and unequivocal way to interpret and communicate evidence (in this case, leading to low p-values) is by means of a (posterior) probability assignment, when it seems clear that the more relevant question that the testy-voiced (“just wait a tick”) bear would put to the know-it-all bear would be: how often would this method have erroneously declared a genuine discrepancy?**
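The testy bear's question, "how often would this method have erroneously declare a genuine discrepancy?", can be put to a simulation. What follows is only a minimal sketch under assumptions that are not in the dialogue: observations are taken to be Normal with known standard deviation 1, the prior on the mean is flat, and the "try and try again" rule stops as soon as the posterior probability of a positive effect reaches .996, looking after every observation up to a hypothetical cap `n_max`. All names and settings (`false_alarm_rates`, `n_max`, `n_sims`) are illustrative.

```python
import numpy as np
from statistics import NormalDist


def false_alarm_rates(threshold=0.996, n_max=1000, n_sims=2000, seed=0):
    """Estimate how often P(theta > 0 | data) >= threshold when theta is really 0.

    Assumed setup (hypothetical): x_i ~ N(theta, 1) with a flat prior on theta,
    so theta | data ~ N(xbar, 1/n) and P(theta > 0 | data) = Phi(sqrt(n) * xbar).
    """
    rng = np.random.default_rng(seed)
    # The posterior reaches the threshold exactly when sqrt(n) * xbar >= z_cut.
    z_cut = NormalDist().inv_cdf(threshold)
    x = rng.standard_normal((n_sims, n_max))   # data generated with no effect
    n = np.arange(1, n_max + 1)
    z = np.cumsum(x, axis=1) / np.sqrt(n)      # sqrt(n) * xbar after each observation
    fixed = float(np.mean(z[:, -1] >= z_cut))              # look once, at n_max
    optional = float(np.mean((z >= z_cut).any(axis=1)))    # stop at the first success
    return fixed, optional


fixed_rate, optional_rate = false_alarm_rates()
print(f"fixed sample size: {fixed_rate:.3f}")
print(f"optional stopping: {optional_rate:.3f}")
```

With a single look at a fixed sample size, the rate of erroneously reaching a .996 posterior should sit near .004 in this setup. With optional stopping the reported posterior is computed in exactly the same way, yet the rate of erroneous declarations can be expected to come out several times larger.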

I once checked out the “xtranormal” website (http://www.xtranormal.com/). Turns out there are other figures aside from the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. My idea for the rejoinder was never polished up for actually making a clip. Doing so would appeal to my artist side, which means I would get too involved. Anyone who wants to take the plunge, let me know. In fact the original post had 16 comments where several reader improvements were suggested. Please send corrections and additions.

Reference: Blume, J. and J. F. Peipert (2003). “What your statistician never told you about P-values.” J Am Assoc Gynecol Laparosc 10(4): 439-444.

*See for example, Mayo & Spanos (2011) ERROR STATISTICS

I’m thinking I should change the first reaction of Bear #2 as follows:

*Bear #1:* Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.

*Bear #2:* Not really, that would be an incorrect interpretation. Your interpretation is what would be given by a P-value, but you won’t get that from the posterior (except in certain special cases where they match). Some people refer to this as “transposing” the conditional, although it’s not quite that either.

What do you think?
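For readers wondering about the “special cases where they match” in the proposed reply, here is a small sketch. It assumes a setup that is not specified anywhere in the dialogue: a Normal mean with known standard deviation, a one-sided test of “no positive effect”, and a flat (improper) prior. The summary numbers (`xbar = 0.53`, `n = 25`) are hypothetical, chosen only so that the one-sided P-value comes out near .004. The “spiked” prior (half the prior mass on the point null, the rest spread as N(0, tau²) with an assumed `tau = 1`) is included to show that the match does not hold in general.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sided_p_value(xbar, n, sigma=1.0):
    """P(sample mean >= xbar) computed under theta = 0, with x_i ~ N(theta, sigma^2)."""
    return 1.0 - STD_NORMAL.cdf(sqrt(n) * xbar / sigma)


def flat_prior_posterior_no_positive_effect(xbar, n, sigma=1.0):
    """P(theta <= 0 | data) under a flat prior, where theta | data ~ N(xbar, sigma^2 / n)."""
    return NormalDist(mu=xbar, sigma=sigma / sqrt(n)).cdf(0.0)


def spiked_prior_posterior_of_null(xbar, n, sigma=1.0, tau=1.0, prior_null=0.5):
    """P(theta = 0 | data) with mass prior_null on 0 and theta ~ N(0, tau^2) otherwise."""
    se = sigma / sqrt(n)
    like_null = NormalDist(0.0, se).pdf(xbar)                      # density of xbar if theta = 0
    like_alt = NormalDist(0.0, sqrt(tau**2 + se**2)).pdf(xbar)     # marginal density under the slab
    weighted_null = prior_null * like_null
    return weighted_null / (weighted_null + (1.0 - prior_null) * like_alt)


xbar, n = 0.53, 25  # hypothetical summary data
p = one_sided_p_value(xbar, n)
flat = flat_prior_posterior_no_positive_effect(xbar, n)
spiked = spiked_prior_posterior_of_null(xbar, n)
print(f"one-sided P-value:                    {p:.4f}")
print(f"flat-prior P(theta <= 0 | data):      {flat:.4f}")
print(f"spiked-prior P(theta = 0 | data):     {spiked:.4f}")
```

In this sketch the first two numbers agree, while the spiked-prior posterior for the null is much larger. The numbers coincide in the flat-prior case, but the P-value remains a probability about data under the null and the posterior a probability about the parameter given the data, which is why the choice of prior matters so much for what a reported posterior means.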

Very interesting! How often do I hear a P-value is inferior to a posterior probability, as that’s the only thing that matters? All the time. The little secret that hoodwinks those who forget to look under the hood to check: what does your posterior probability mean? Nobody can say. They don’t even agree on default priors. P-values don’t change their meaning. They can be misused, we know, particularly if interpreted as posterior probabilities or with mining, dredging and scalping, but we are not lost in a flux of fuzzy froth numbers likely to undergo chrysalis transformations on a moment’s whim. I’m sure most people using Bayesian software don’t know what’s coming out of the machine; they’re buying a pig in a poke.

“The little secret that hoodwinks those who forget to look under the hood to check: what does your posterior probability mean? Nobody can say.”

Subjective Bayesians would say, “It means that if one held such-and-such probability distribution* prior to considering the evidence, one must now have so-and-so probability distribution thereafter.” Other Bayesian schools of thought have supplied answers too. What kind of answer are you looking for/have you looked for? In what way do the answers you’ve seen, such as the above subjective Bayesian one, fail to supply the requested meaning?

(The school of thought to which I subscribe would agree with the subjective Bayesian notion operationally but would not find it to be the fundamental motivation.)

* Subjective Bayesians cash out “holding a probability distribution” as “if someone held a gun to your head and forced you to make book on some set of events without letting you choose which side of the bet said gun-wielder would take, and you were smart enough to choose odds that avoided diachronic Dutch books, then your odds would encode a probability distribution you can be said to hold.”

Corey: But Bayesians have given up on Dutch book justifications. http://errorstatistics.com/2012/04/15/3376/

And even if we were to allow that this researcher would bet in accord with the .996 posterior computed, we still don’t know why that constitutes a strong warrant for the inference (or better warrant than the Hands-Price study). I can see adding some rounds of dialogue where the know-it-all bear tells the other that this is how he must/would bet if there’s a gun to his head. Never mind the bet would not be settled. In the next round of dialogue, maybe he’d be offered the chance to change his prior instead…

I was imagining the scientist asked for a non-subjective Bayesian assessment. But regardless, what would be wanted is an assessment of whether less impressive scores, be they posteriors or likelihoods, would very rarely occur if the null is true or approximately true. If, for example, the questioning bear found that even higher posteriors would very readily result due to chance variability alone, he wouldn’t bother much about the meaning of the .996 in H* (whatever real effect is being inferred). He’d discount the .996 in H* as a very poor indicator for H*.

Mayo: Perhaps subjective Bayesian philosophers have given up on Dutch book arguments — I wouldn’t know about that. I was talking about subjective Bayesian statisticians, like Jay Kadane f’r instance.

We can go (and have gone) ’round and ’round on what posteriors mean and what is really wanted; I don’t think either of us has anything new to say at this point. My comment was aimed at the false notion that nobody can say what the posterior means. Maybe what we Bayesians say is fallacious, but it’s simply not the case that we have no answer.

Anon: It’s amazing how people derogate the error statistician’s way of qualifying approximate claims–which is actually wholly in sync with non-statistical science and ordinary day-to-day reasoning– without a clue as to what they’re actually getting from one of the many Bayesian computations.

Just as an aside: I just came across a review of EGEK 1996 which, while critical, gets one thing right:

“Mayo shows that much of the disagreement stems from differing goals. ES [error statistics] is designed for scientists whose main job is generating better evidence via new experiments. ES uses statistics for generating severe tests of hypotheses. Bayesians take evidence as a given and don’t think experiments deserve special status within probability theory.”

It’s from a blog called Bayesian Investor:

http://www.bayesianinvestor.com/blog/index.php/2013/02/08/error-statistics/

May I just say that I continue to be aggrieved that your response is aimed at Bayesians when the criticism to which you are replying comes from likelihoodlums. :-p

Corey: I didn’t know you were aggrieved about this, much less that you continue to be. Do you mean the guys who made the clip are likelihoodlums? I know nothing about its origins. Since the dialogue included misinterpretations of p-values as posteriors, it seemed that a reversal of roles would be to imagine a posterior was reported, and have the questioning bear discover there wasn’t a clue as to why such a number, as actually computed by the recommended methods, was desirable after all.

Unless perhaps it is shown to satisfy the low error probability afforded by the original low p-value, when correctly interpreted. In that case, I see the questioning bear giving a huge sigh of relief, and never again questioning the value of his error statistical tools. They all live happily ever after.

Cha-ching!

Mayo: You do know something about its origins: you know that it’s loosely based on the intro to the Blume and Peipert paper (or so I infer because that’s what it says in the video description, and the paper isn’t referenced in the video itself). As I noted when you were first putting your response together, Blume is a likelihoodist.

(As for the guy who made the clip, his publication record is filled with applied papers, so I can’t easily suss out his philosophical leanings.)

Nicely done. I would have been quicker—

Bear 2: Am I to understand that the posterior probability that the null hypothesis is false has something to do with it being false?

Bear 1: That would be an incorrect interpretation.

Kevin Kelley: the posterior probability inherits asymptotic consistency from the Bayes factor. What have you got against asymptotics?

touché monsieur!