The search for an agreement on numbers across different statistical philosophies is an understandable pastime in foundations of statistics. Perhaps identifying matching or unified numbers, apart from what they might mean, would offer a glimpse as to shared underlying goals? Jim Berger (2003) assures us there is no sacrilege in agreeing on methodology without philosophy, claiming “while the debate over interpretation can be strident, statistical practice is little affected as long as the reported numbers are the same” (Berger, 2003, p. 1).

**Do readers agree?**

Neyman and Pearson (or perhaps it was mostly Neyman) set out to determine when tests of statistical hypotheses may be considered “independent of probabilities a priori” (p. 201). In such cases, frequentists and Bayesians may agree on a critical or rejection region.

The agreement between “default” Bayesians and frequentists in the case of one-sided Normal (IID) testing (known σ) is very familiar. As noted in Ghosh, Delampady, and Samanta (2006, p. 35), if we wish to reject a null value when “the posterior odds against it are 19:1 or more, i.e., if posterior probability of *H*_{0} is < .05” then the rejection region matches that of the corresponding frequentist test of *H*_{0} at the .05 level. By contrast, they go on to note the also familiar fact that there would be disagreement between the frequentist and the Bayesian if one were instead testing the two-sided hypotheses *H*_{0}: μ=μ_{0} vs. *H*_{1}: μ≠μ_{0} with known σ. In fact, the same outcome that would be regarded as evidence against the null in the one-sided test (for both the default Bayesian and the frequentist) can, in the two-sided test, be statistically significant for the frequentist and yet be construed by the Bayesian as no evidence against the null, or even as evidence *for* it (due to a spiked prior).[i]
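The one-sided match can be checked numerically. The following is only a minimal sketch, and it assumes that the “default” prior is the improper flat prior on μ, so that the posterior for μ is Normal with mean equal to the sample mean and variance σ²/n. The function names and the example numbers are hypothetical, chosen for illustration.

```python
from statistics import NormalDist


def one_sided_p_value(xbar, mu0, sigma, n):
    """Frequentist p-value for H0: mu <= mu0 vs. H1: mu > mu0 (known sigma)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1.0 - NormalDist().cdf(z)


def flat_prior_posterior_h0(xbar, mu0, sigma, n):
    """Posterior P(mu <= mu0 | data), ASSUMING an improper flat prior on mu.

    Under that assumption the posterior for mu is Normal(xbar, sigma^2 / n).
    """
    posterior = NormalDist(mu=xbar, sigma=sigma / n ** 0.5)
    return posterior.cdf(mu0)


# Illustrative (made-up) data summary: n = 25, sigma = 1, sample mean 0.4.
p = one_sided_p_value(xbar=0.4, mu0=0.0, sigma=1.0, n=25)
post = flat_prior_posterior_h0(xbar=0.4, mu0=0.0, sigma=1.0, n=25)
print(p, post)  # the two numbers coincide up to floating-point rounding
```

Under this assumption, rejecting when the posterior probability of *H*_{0} is below .05 picks out the same sample means as the .05-level one-sided test. Note that the prior doing the work here is improper.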

J. A. Hartigan (1971), commenting on David Bartholomew, gives a 5-line argument that while Bayes and frequency intervals may sometimes agree with improper priors, they never exactly agree with proper priors (see below [ii]). But improper priors are not considered to provide degrees of belief (not even being proper probabilities). *This would seem to suggest that when they (frequentists and Bayesians) agree on numbers, the prior cannot be construed as a proper degree of belief assignment.*

**What say you?**

Berger, J. (2003), “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?”, *Statistical Science* 18, 1–12.

Neyman and Pearson, “The Testing of Statistical Hypotheses in Relation to Probabilities a priori”, *Joint Statistical Papers of Neyman and Pearson*, 186-202.

Bartholomew, D. J., “A comparison of Frequentist and Bayesian Approaches to Inferences With Prior Knowledge,” in Godambe and Sprott, (1971), *Foundations of Statistical Inference*, 417-429.

Ghosh, Delampady, and Samanta (2006), *An Introduction to Bayesian Analysis, Theory and Methods,* Springer.

Mayo, D. G. (2003), Comment on J. O. Berger’s “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?”, *Statistical Science* 18, 19–24.

[i] But not all default Bayesians endorse the spiked priors here, meaning there is a lack of agreement on numbers even within the same philosophical school.

[ii] Here are the 5 lines:

We need P(θ < θ_{α}(x)| θ) = α all θ.

Assume θ_{α}(x) has positive density over line, all θ.

Then P(θ < θ_{α}(x)| θ, θ_{α}(x) >0) = α(θ) > α.

So P(θ < θ_{α}(x)| θ_{α}(x) > 0) > α averaging over θ.

So P(θ < θ_{α}(x)| x) = α is impossible.

(J. A. Hartigan, comment on D. J. Bartholomew (1971), “A Comparison of Frequentist and Bayesian Approaches to Inferences with Prior Knowledge”, in Godambe and Sprott (eds.), *Foundations of Statistical Inference*, p. 432)

I had to clarify a couple of referents in this post: in the last sentence of each of the last 2 paras.

Your post seems to say that the Bayesian and frequentist could both agree that a result rejects the null in a one-sided test (such as Ho: m ≤ 0), but disagree on the same result in the two-sided test (m = 0 versus m not equal to 0)? But rejecting the one-sided null gives accept m > 0. How can a Bayesian believe m > 0 to a high degree and not believe that either m is bigger than or smaller than 0 to a high degree?

@Eileen, for the one-sided problem, Bayesian posterior probabilities in support of the null and frequentist inference using p-values can agree very broadly – see G Casella and R Berger (1987, JASA). The reconciliation requires taking an infimum over a class of priors.

In the two-sided problem, the default approaches under the two paradigms don’t agree; for example, under the null, Bayesian support rapidly concentrates at the null, as we accrue data, whereas the default frequentist always rejects the null with probability alpha.
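A rough numerical sketch of the two-sided divergence follows. It is not a simulation of data under the null; instead it holds the test statistic fixed at z = 1.96 (just significant at the two-sided .05 level) and lets n grow. The setup is an assumption made for illustration: prior mass 0.5 spiked on μ = μ_{0}, and a Normal(μ_{0}, τ²) prior on μ under the alternative, with τ = σ. The function name and defaults are hypothetical.

```python
from math import exp, sqrt


def spiked_prior_posterior_h0(z, n, sigma=1.0, tau=1.0, prior_h0=0.5):
    """Posterior P(H0 | data) for H0: mu = mu0 vs. H1: mu != mu0, known sigma.

    ASSUMED setup (illustrative only): prior mass `prior_h0` on the point
    null, and mu ~ Normal(mu0, tau^2) under the alternative.
    `z` is the usual statistic sqrt(n) * (xbar - mu0) / sigma.
    """
    r = n * tau ** 2 / sigma ** 2
    # Bayes factor in favour of H0: ratio of the Normal(mu0, sigma^2/n)
    # density of xbar to its Normal(mu0, sigma^2/n + tau^2) marginal density.
    bf01 = sqrt(1.0 + r) * exp(-0.5 * z ** 2 * r / (1.0 + r))
    odds = prior_h0 / (1.0 - prior_h0) * bf01
    return odds / (1.0 + odds)


# Same "just significant" z at increasing sample sizes.
for n in (10, 1000, 100000):
    print(n, spiked_prior_posterior_h0(z=1.96, n=n))
```

With these assumed settings the posterior probability of the null rises with n even though the frequentist verdict (reject at the .05 level) is the same at every sample size. Different choices of τ or of the prior mass on the null change the numbers, which is part of the point of footnote [i].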

Eileen: That’s a good question, and the answer is basically the difference in the prior probability to the null in the latter case…dashing out now…my next post will go further on this case.

By the way, haven’t heard from you in a while…more soon.

Yes, I see a guest has replied, thanks! It is odd they would be prepared to essentially violate the “consequence condition” (with the 2-sided default Bayes test). Casella and R. Berger* have an excellent paper!

*It can be confusing here because Roger Berger and Jim Berger are involved in this debate, on opposite sides. I will note some other consequences in my next…

Well, I’d say that we shouldn’t just report numbers and ignore their meaning. If two numbers mean different things, nothing is gained if they are about the same. I’m not sure whether anybody except Berger would advocate forgetting about understanding what the numbers precisely mean… (probably the practice of quite a few statisticians suggests that they agree with Berger but they would rarely admit it).

A more interesting question would be to what extent substantial interpretations are the same from different analyses, based on (potentially similar) numbers with different meanings. One may find good agreement between high quality Bayesian and frequentist data analyses in this respect, but this shouldn’t be boiled down to the numbers alone.

Hennig: Thanks. You may be right that J. Berger is one of the few to say this out loud, but I find many others who quite earnestly and circuitously work to claim shared meanings for similar numbers between Bayesian and frequentist methods. I happen to just be rereading Kass’s interesting article (Statistical Science, Feb 2011) where he struggles mightily to try and explain that the confidence level of a frequentist interval estimate, not estimator, kind-of-sort-of-if-you-wink-one-eye means the same thing as a corresponding Bayesian probability to the estimate (4) [not that I’m clear on what the latter really means]. I trace this to a presupposition that whenever we assert that data x is good evidence for a claim about some aspect of an underlying process or entity, we must take the additional step of assigning a high probability to the assertion that is warranted. This I find at odds with assertions in science and day-to-day life (which may still be qualified as approximate). It is related to the idea that the truth of a scientific hypothesis is like the occurrence of an outcome of a trial (or a ‘unique event’). I am writing on this right now…or should be rather than blogging…