This continues yesterday’s post: I checked out the “xtranormal” website (http://www.xtranormal.com/). Turns out there are other figures aside from the bears that one may hire out, but they pronounce “Bayesian” as an unrecognizable, foreign-sounding word with around five syllables. Anyway, before taking the plunge, here is my first attempt, just off the top of my head. Please send corrections and additions.

*Bear #1:* Do you have the results of the study?

*Bear #2:* Yes. The good news is there is a .996 probability of a positive difference in the main comparison.

Bear #1: Great. So I can be well assured that there is just a .004 probability that such positive results would occur if they were merely due to chance.

*Bear #2:* Not really, that would be an incorrect interpretation.

*Bear #1:* Oh. I see. Then you must mean 99.6% of the time a smaller difference would have been observed if in fact the null hypothesis of “no effect” was true.

*Bear #2:* No, that would also be an incorrect interpretation.

*Bear #1:* Well then you must be saying it is rational to believe to degree .996 that there is a real difference?

*Bear #2:* It depends. That might be so if the prior probability distribution was a proper probabilistic distribution representing rational beliefs in the different possible parameter values independent of the data.

*Bear #1:* But I was assured that this would be a nonsubjective Bayesian analysis.

*Bear #2:* Yes, the prior would at most have had the more important parameters elicited from experts in the field, the remainder being a product of one of the default or conjugate priors.

*Bear #1:* Well which one was used in this study?

*Bear #2:* I would need to find out; I came into the project at the point of trying to find an adequate statistical model. This alone required six different adjustments of the model.

*Bear #1:* So can you explain to me what a posterior of 0.996 really means?

*Bear #2:* There is no unanimity as to the definition of objective Bayesian analysis, nor even as to its goal. The posterior is a quantitative construct arrived at by means of a Bayesian computation based on a prior distribution.

*Bear #1:* But I am assured the priors are coherent, and do not violate probability axioms, correct?

*Bear #2:* Not in general. Conventional priors may not even be probabilities, in that a constant or flat prior for a continuous parameter may not integrate to 1 (an improper prior).
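The point about improper priors can be made concrete. Below is a minimal sketch (not from the post; the normal model, known standard deviation, and all numbers are illustrative assumptions): a flat prior on a normal mean has infinite "area," yet the posterior it produces is a perfectly proper distribution.

```python
import math
import random

# Sketch: a flat ("improper") prior on a normal mean does not integrate to 1,
# yet the posterior it yields is a proper probability distribution.
random.seed(1)
sigma = 2.0                        # known data standard deviation (assumed)
data = [random.gauss(5.0, sigma) for _ in range(50)]
n, xbar = len(data), sum(data) / len(data)

# With prior(mu) = constant, the posterior is proportional to the likelihood:
# a Normal(xbar, sigma^2 / n) density, which integrates to 1.
post_mean = xbar
post_sd = sigma / math.sqrt(n)

def post_pdf(mu):
    z = (mu - post_mean) / post_sd
    return math.exp(-0.5 * z * z) / (post_sd * math.sqrt(2 * math.pi))

# Numerically integrate the posterior over a wide grid to check that it is
# proper (area ~ 1), even though the prior's "area" is infinite.
step = 20 * post_sd / 10000
grid = [post_mean - 10 * post_sd + i * step for i in range(10001)]
area = sum(post_pdf(m) for m in grid) * step
print(round(area, 4))  # close to 1.0
```

This is the usual justification Bear #2 alludes to: the improper-prior posterior coincides with a limit of posteriors from proper priors.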

*Bear #1:* If priors are not probabilities, how do I know the posterior is a probability?

*Bear #2:* The posterior distribution can generally be justified as a limiting approximation to posteriors obtained from proper priors.

*Bear #1:* Yeah right. Well the important thing is that this is stronger evidence of a genuine effect than was reported in the recent Hands-Price study: they had only a .965 posterior probability.

*Bear #2:* Not necessarily. I would have to know their sample size, the type of prior used, and whether they computed a Bayesian highest posterior density interval or treated it as a test, possibly with a “spiked” prior.

*Bear #1:* You are not serious.

*Bear #2:* Unfortunately I’m very serious. Bayesian analyses are like that.

*Bear #1:* Aren’t all the objective, default priors agreed upon conventions?

*Bear #2:* Not at all. For instance, one school defines the prior via the (asymptotic) model-averaged information difference between the prior and the posterior; by contrast, the matching-prior approach seeks priors that yield optimal frequentist confidence sets for the given model, and there are also model-dependent invariance approaches. Even within a given approach, the prior for a particular parameter may depend on whether it is a parameter “of interest” or a nuisance parameter, and even on the “order of importance” in which the nuisance parameters are arranged.

*Bear #1:* Wait a tick: we have a higher posterior probability than the Hands-Price study and you’re saying we might not have stronger evidence?

*Bear #2:* Yes. Even if you’re both doing the same kind of default Bayesian analysis, the two studies may have started with different choices of priors.

*Bear #1:* But even if the two studies had started with different priors, that difference would have been swamped out by the data, right?

*Bear #2:* Not necessarily. It will depend on how extreme the priors are relative to the amount of data collected, among many other things.
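Bear #2's point about swamping can be checked directly. Here is a hedged sketch (the conjugate normal-normal model, the two prior centers, and the sample sizes are all illustrative choices, not anything from the studies in the dialogue): two analysts with sharply different priors still disagree after a handful of observations, but agree closely after thousands.

```python
import random

# Sketch: whether the data "swamp" the prior depends on sample size (and on
# how far apart the priors are). Conjugate normal-normal update, sigma = 1.
def posterior_mean(prior_mean, prior_var, data, sigma=1.0):
    n = len(data)
    xbar = sum(data) / n
    w = n / sigma**2                  # precision contributed by the data
    w0 = 1.0 / prior_var              # precision contributed by the prior
    return (w0 * prior_mean + w * xbar) / (w0 + w)

random.seed(0)
true_mu = 2.0
small = [random.gauss(true_mu, 1.0) for _ in range(5)]
large = small + [random.gauss(true_mu, 1.0) for _ in range(4995)]

for data in (small, large):
    m_skeptic = posterior_mean(0.0, 1.0, data)    # prior centered at 0
    m_booster = posterior_mean(10.0, 1.0, data)   # prior centered at 10
    print(len(data), round(abs(m_skeptic - m_booster), 3))
# With n = 5 the two analysts still disagree noticeably; with n = 5000
# the gap is tiny -- the data have swamped the priors.
```

In this conjugate setup the gap between the two posterior means works out to 10/(1 + n), which makes Bear #2's "it depends on the amount of data" quantitative.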

*Bear #1:* What good is that? Please assure me at least that if I report a high posterior probability in the results being genuine there is no way it is the result of such shenanigans as hunting and searching until obtaining such an impressive effect.

*Bear #2:* I’m afraid I can’t; the effect of optional stopping is not generally regarded as influencing the Bayesian computation. This is called the Stopping Rule Principle.

*Bear #1:* You’re not serious.

*Bear #2:* I am very serious. Granted, stopping rules can be taken account of in a prior, but then the result is Bayesianly incoherent (it violates the likelihood principle); there is no unanimity on this among Bayesian statisticians at the moment. It is a matter of theoretical research.

*Bear #1:* Just to try this one last time: can you tell me how to interpret the reported posterior of .996?

*Bear #2:* The default posteriors are numerical constructs arrived at by means of conventional computations, based on a prior which may be regarded either as primitive or as selected by a combination of pragmatic considerations and background knowledge, together with mathematical likelihoods given by a stipulated statistical model. The interpretation of the posterior probability will depend on the interpretation of the prior that went into the computation, and the priors are to be construed as conventions for obtaining the default posteriors.

*Bear #1:* I have had enough. Please go away now.

It’s a reasonable criticism, but this line of attack (asking whether a method is merely a convenient numerical construct or has a defensible “real-world” meaning) can easily end up throwing the entirety of statistics out the window, not just Bayesian statistics. You can reach similarly absurdist endpoints with 95% confidence intervals or null hypotheses. One might even say that all of statistics is a set of incorrect mental constructs that are hopefully useful.

rv: I disagree in the most vigorous way. I think the question of the relation to real-world checks is THE most important question. Fortunately, good methods need only link up to the real world on very specific, checkable properties, e.g., error probabilities, to evade the “let’s-throw-up-our-hands-because-nothing-is-100%-accurate” mindset. It reminds me that I’ll soon allude to E.S. Pearson’s “statistical concepts in their relation to reality,” as his birthday is coming up.

By the way, are the alleged “absurdist” endpoints those “trivial” intervals again? We’ve dealt with that many times on this blog.

Where did I say “trivial”?

I only started reading your blog recently, so sure, maybe there’s something for me to learn that you addressed before. I thought learning through engagement is kind of the purpose of having blog comments.

There’s that quote that everyone attributes to Box about models, which is basically the point I’m making. I don’t think it’s too controversial to say that all models are wrong, including all null models. So if I’m doing NHST, I’m calculating a test statistic with respect to a model that I know to be wrong. Likewise, if I’m calculating a bootstrap estimate of a confidence interval, it’s estimated by resampling in a way that’s not correct in reality. I don’t see these as fundamental flaws in using those methods, it’s just that if one is going to be a purist about the connection to reality, then these are absurd things to be calculating. However, from the standpoint of being useful, they are definitely useful “numerical constructs” for guiding decisions even if their relation to reality is imperfect.

RV: I’m no purist: from approximate models all kinds of true (and false) things are learned. With “trivial” I thought you were referring to cases that wind up definitely containing the true parameter value even though the confidence level is less than one. These are often called “trivial intervals.” If not, then what are the “absurdist” cases you had in mind?

When it comes to things Box wrote when waxing philosophical, quite aside from his statistical work, I’m sorry, but his “everything is subjective” and “all models are false” are quite misleading. I can be very specific in my charge here.

I don’t mind reviewing things on the blog, but it is all to the good to try the search as well. In fact I may return to some older posts deliberately, for this reason, now that this blog is nearing its one-year birthday. This is, after all, philosophy, and I’ve always found it valuable to ponder things many times over.

I’ve heard the word “Bayesian” pronounced two different ways. You can get xtranormal to produce something reasonable for both pronunciations with (i) “beige’an”, (ii) “bays’ee’n”.

Bear #1 sounds more like a frequentist statistician than any scientist I know…

“Bear #1: Well which one was used in this study?

Bear #2: I would need to find out”

!!!

Fire this bear immediately — he didn’t do his job.

“Bear #1: Please assure me at least that if I report a high posterior probability in the results being genuine there is no way it is the result of such shenanigans as hunting and searching until obtaining such an impressive effect.

Corey Bear: I’d recommend just looking at the estimated effect size. Hunting and searching can’t mislead you about that.”

Corey: I didn’t have the transcript, so I was working from memory. I recalled the brown bear saying he’d have to check, but now I see that referred to the other study which he seemed to know had a larger sample size.

But would the Bayesian bear #2 necessarily know what had been measured by a prior that seems so open and variable in its meaning? Can you recommend a replacement? As I said, this was just an off-the-cuff little exercise (I was mostly getting ideas about elicitation from J. Berger).

“Bear #1 sounds more like a frequentist statistician than any scientist I know.”

Really? Well bear #1 in the “what the p-value” video sounded Bayesian, and certainly was not informed about interpreting significance tests.

I only looked at the xtranormal site for a short time, and didn’t notice a way to adjust the pronunciation, although I’ve seen such a feature in programs from long ago. If they do have it, then why did “what the p” have the ridiculous pronunciation of “ka ching!”?

I’d need to work that into my clip if I actually make one, and of course all used suggestions would be acknowledged.

Corey: I just noticed your use of “Corey Bear”–cute.

“I’d recommend just looking at the estimated effect size. Hunting and searching can’t mislead you about that.”

Seems it would depend on how it is done. The Bayesian HPD interval under optional stopping allows the null value to be excluded from the interval with probability 1, even if that value is true.
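The optional-stopping phenomenon referred to here is easy to simulate. Below is a hedged sketch (a flat-prior normal-mean model with known standard deviation; the run cap and run count are arbitrary illustrative choices): keep sampling until the 95% HPD interval excludes zero, with the true mean exactly zero. The exclusion rate far exceeds the nominal 5%, and it keeps climbing as the cap is raised, consistent with the probability-1 claim for unbounded sampling.

```python
import math
import random

# Sketch: under optional stopping ("sample until zero is excluded"), the
# flat-prior 95% HPD interval for a normal mean excludes the true null
# value far more often than 5% of the time.
random.seed(42)

def excluded_by_stopping(max_n=2000):
    total, n = 0.0, 0
    while n < max_n:
        total += random.gauss(0.0, 1.0)   # true mean is exactly 0
        n += 1
        xbar = total / n
        half = 1.96 / math.sqrt(n)        # flat-prior 95% HPD half-width
        if abs(xbar) > half:              # interval excludes the true value
            return True
    return False

runs = 500
rate = sum(excluded_by_stopping() for _ in range(runs)) / runs
print(rate)  # well above the nominal 0.05; rises further as max_n grows
```

With a flat prior the HPD interval numerically coincides with the usual confidence interval, which is why "sampling to a foregone conclusion" bites the Bayesian interval here too, so long as no point mass sits on the null.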

Mayo: I was a bit alarmed when your critiques brought this fact to my attention (for which I thank you), and I gave it some thought. I stopped worrying about it when I noticed that this is one of those “sets of measure zero” phenomena that occasionally crop up. The null value is the only one that can be excluded with sampling probability 1, and it only happens for priors that don’t have a point mass on the null value. So this is not too alarming — if such a prior distribution is actually a good summary of the available prior information, it would be a *remarkable* coincidence (i.e., event of prior probability 0) if the boundary for optional stopping happened to coincide with the true value.

Corey: Huh? I don’t get your point. Excluding the true value can readily occur with optional stopping. See on my publication page, EGEK Chapter 10 Why You Cannot Be Just a Little Bit Bayesian, 350-5, beginning with “Bayesian Freedom, Bayesian Magic.”

http://www.phil.vt.edu/dmayo/personal_website/bibliography%20complete.htm

If your point is that the subjectivist does not believe he’s being misled, even though he is (essentially Berger and Wolpert’s point here), then that seems very unsatisfactory.

My point is — how can I put this? Hmm… I guess it comes down to considering the substantive reasons why a given parameter value is considered the null hypothesis.

In some cases, prior knowledge picks out a certain parameter value. For example, in some physics models, some specific parameter value corresponds to some simplification or symmetry. Extensions/generalizations of General Relativity often look like this. In such cases, I hold that one ought to use a prior that puts a point mass on the “special” value on algorithmic complexity grounds. And when there is a point mass on the null, optional stopping cannot be used to exclude a true null with sampling probability 1.

In other cases, we don’t expect the null to hold strictly; we care more about the magnitude and sign of the discrepancy from the null. Social science and biology provide the examples here. In a pharmaceutical clinical trial, it would be a fantastic coincidence if a new treatment happened to have *exactly* the same measured effect as the control treatment. Gene knockout experiments are like this too. In these contexts, to conceive of the null hypothesis *actually* holding is to reify the model in a way that cannot be justified scientifically. In practical terms, we’re not concerned about using optional stopping to exclude the null value from some posterior credible interval with sampling probability 1. Rather, we’re concerned about what Gelman calls Type M and Type S error.
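The Type M and Type S errors mentioned here can be illustrated with a quick simulation (a sketch in the style of Gelman and Carlin's exaggeration-ratio examples; the true effect, standard error, and draw count are illustrative assumptions, not figures from any study discussed above): in a low-powered setting, estimates that happen to reach "significance" drastically overstate the true effect and sometimes have the wrong sign.

```python
import random

# Sketch of Type M (magnitude) and Type S (sign) errors: in a low-powered
# study, the estimates that reach "significance" exaggerate the true effect,
# and a nontrivial fraction point in the wrong direction.
random.seed(7)
true_effect = 0.2
se = 1.0                          # standard error of the estimate (low power)

sig_estimates = []
for _ in range(200000):
    est = random.gauss(true_effect, se)
    if abs(est) > 1.96 * se:      # "statistically significant" at the 5% level
        sig_estimates.append(est)

# Type M: average exaggeration of the magnitude among significant results.
type_m = sum(abs(e) for e in sig_estimates) / len(sig_estimates) / true_effect
# Type S: share of significant results with the wrong sign.
type_s = sum(e < 0 for e in sig_estimates) / len(sig_estimates)
print(round(type_m, 1), round(type_s, 2))
# Significant estimates exaggerate the true effect roughly tenfold here,
# and a sizeable fraction of them get the sign wrong.
```

This is exactly the concern raised in the comment: with a small, nonzero true effect, filtering on significance distorts magnitude and sign far more than the "is the null exactly true" question ever could.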

Sorry for the wall of text. My paragraph breaks aren’t showing up for some reason.

Ok, well played…

… but to echo Corey’s point, isn’t this bad frequentist stats vs. bad Bayesian stats? I.e., isn’t the best thing to do to look at the effect size?

Also, to get beyond the preachiness that is sometimes a characteristic of stats: if the bears were having a constructive dialogue, how would it go?

David: Well, of course my response grew out of the large number of efforts along the lines of “what the p-value”, so as to have just ONE example (are there others?) that parodies and provokes on the side of error statistical reasoning, just for once. Aside from that little exercise, I would agree that constructive dialogue would be preferable.

I definitely think bear #2 from “what the p?” could easily have gone on to report the effect sizes or discrepancies that are/are not warranted with severity. In fact, now that you mention it, it would be fun to have bear #3 come onto the scene of “what the p?” and straighten them out. Like when bear #1 is horrified, …horrified, that the other study might have a different sample size and so (heavens to Murgatroyd!) he’d actually have to use an effect-size appraisal that took sample size into account!

But seriously, I’d really like to know what a constructive Bayesian critique (of mine) would look like.

(All of this started when I mentioned to Nathan Schachtman the other night, on e-mail, that I’d watched a Chiller movie called “Bear”* wherein all but one woman, of two couples whose car broke down, are mauled by a single bear, over several hours of failing to outsmart it. This evidently reminded him of a bear video he made about statistics in the law…I should post it, it’s very good, I think.)

*The reason I mentioned it is that he said he was in a cabin in the Canadian woods at the time.

Deborah:

Here is my film production of the “Dow-Bears,” a 90 second spoof on how many judges persuade themselves that anything goes in expert witness gatekeeping. The typical context is a health claim made on a few, inconsistent observational epidemiologic studies.

Nathan

Nathan: The “Dow-Bears” always remind me of those of us who are skeptical of surviving stock market gyrations.

Your clip says “removed by user”, let’s see if the one you sent the other day will work. Mayo

No, I guess not. Fire that bear (as Corey would say)!

I don’t know why the URL was removed. I pasted it in, but when I went to view the reply as posted, all I saw was “removed by user.” I will post the URL on my website at some point.

These posts game an idea… wuha.. wuhaha… wuhahaha… 😉

game = gave me

What?

I typed “game” too fast when I meant “gave me”; replies can’t be edited in WP.

I do wish WP would give us at least 5 minutes of edit time, though… one for my wish-list.