The search for an agreement on numbers across different statistical philosophies is an understandable pastime in foundations of statistics. Perhaps identifying matching or unified numbers, apart from what they might mean, would offer a glimpse of shared underlying goals? Jim Berger (2003) assures us there is no sacrilege in agreeing on methodology without philosophy, claiming “while the debate over interpretation can be strident, statistical practice is little affected as long as the reported numbers are the same” (Berger, 2003, p. 1).
Do readers agree?
Neyman and Pearson (or perhaps it was mostly Neyman) set out to determine when tests of statistical hypotheses may be considered “independent of probabilities a priori” (p. 201). In such cases, frequentists and Bayesians may agree on a critical or rejection region.
The agreement between “default” Bayesians and frequentists in the case of one-sided Normal (IID) testing (known σ) is very familiar. As noted in Ghosh, Delampady, and Samanta (2006, p. 35), if we wish to reject a null value when “the posterior odds against it are 19:1 or more, i.e., if posterior probability of H0 is < .05”, then the rejection region matches that of the corresponding frequentist test of H0 at the .05 level. By contrast, they go on to note the also-familiar fact that the frequentist and the Bayesian would disagree if one were instead testing the two-sided H0: μ = μ0 vs. H1: μ ≠ μ0 with known σ. In fact, the same outcome that would be regarded as evidence against the null in the one-sided test (for the default Bayesian and frequentist) can result in statistically significant results being construed by the Bayesian as no evidence against the null, or even as evidence for it (due to a spiked prior).[i]
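The one-sided agreement is easy to check numerically. A minimal sketch, with assumed numbers (μ0 = 0, σ = 1, n = 25, observed mean 0.4) and assuming the “default” Bayesian uses the improper uniform prior on μ, under which the posterior of μ given the sample mean is Normal:

```python
import math

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed numbers for illustration: one-sided H0: mu <= mu0 vs H1: mu > mu0,
# Normal data, known sigma = 1, n = 25, observed sample mean xbar = 0.4.
mu0, sigma, n, xbar = 0.0, 1.0, 25, 0.4
se = sigma / math.sqrt(n)
z = (xbar - mu0) / se

# Frequentist one-sided p-value
p_value = 1 - Phi(z)

# Default-Bayes posterior P(H0 | xbar) under the (improper) uniform prior
# on mu, which makes the posterior of mu equal to N(xbar, se^2)
post_H0 = Phi((mu0 - xbar) / se)

print(p_value, post_H0)  # the two numbers coincide
```

So “posterior probability of H0 < .05” picks out exactly the .05 rejection region. Note that this matching relies on the improper flat prior, which foreshadows Hartigan’s point below.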
J. A. Hartigan (1971), commenting on David Bartholomew, gives a five-line argument that while Bayesian and frequentist intervals may sometimes agree under improper priors, they never exactly agree under proper priors (see below [ii]). But improper priors are not considered to provide degrees of belief (not even being proper probabilities). This would seem to suggest that when frequentists and Bayesians agree on numbers, the prior cannot be construed as a proper degree-of-belief assignment.
What say you?
Berger, J. (2003), “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?”, Statistical Science 18, 1–12.
Neyman, J. and Pearson, E. S. (1967), “The Testing of Statistical Hypotheses in Relation to Probabilities a priori”, in Joint Statistical Papers of J. Neyman and E. S. Pearson, 186–202.
Bartholomew, D. J. (1971), “A Comparison of Frequentist and Bayesian Approaches to Inferences With Prior Knowledge,” in Godambe and Sprott (eds.), Foundations of Statistical Inference, 417–429.
Ghosh, J. K., Delampady, M., and Samanta, T. (2006), An Introduction to Bayesian Analysis: Theory and Methods, Springer.
Mayo, D. G. (2003), Comment on J. O. Berger’s “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?”, Statistical Science 18, 19–24.
[i] But not all default Bayesians endorse the spiked priors here, meaning there is a lack of agreement on numbers even within the same philosophical school.
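To see how a spiked prior drives the two-sided divergence, here is a sketch with assumed numbers (σ = 1, n = 100, P(H0) = 1/2, and μ ~ N(μ0, 1) under H1): a result just significant at the .05 level can leave the posterior probability of H0 above 1/2.

```python
import math

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Assumed numbers: two-sided H0: mu = 0 vs H1: mu != 0, known sigma = 1,
# n = 100; spiked prior P(H0) = 1/2, with mu ~ N(mu0, tau^2), tau = 1, under H1.
mu0, sigma, n, tau = 0.0, 1.0, 100, 1.0
se = sigma / math.sqrt(n)
z = 1.96                        # just significant at the .05 level (two-sided)
xbar = mu0 + z * se

p_value = 2 * (1 - Phi(z))      # about .05

# Marginal density of xbar under H0, and under H1 with mu integrated out
f0 = norm_pdf(xbar, mu0, se)
f1 = norm_pdf(xbar, mu0, math.sqrt(se**2 + tau**2))
bf01 = f0 / f1
post_H0 = bf01 / (1 + bf01)     # posterior probability of H0 at prior odds 1:1

print(round(p_value, 3), round(post_H0, 2))  # significant p, yet P(H0|x) > 1/2
```

With these numbers the statistically significant outcome is, for this Bayesian, evidence *for* the null; increasing n makes the conflict more extreme.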
[ii] Here are the five lines:
We need P(θ < θα(x) | θ) = α for all θ.
Assume θα(x) has positive density over the line, for all θ.
Then P(θ < θα(x) | θ, θα(x) > 0) = α(θ) > α.
So P(θ < θα(x) | θα(x) > 0) > α, averaging over θ.
So P(θ < θα(x) | x) = α is impossible.
(J. A. Hartigan, comment on D. J. Bartholomew (1971), “Comparison of Frequentist and Bayesian Approaches to Inference with Prior Knowledge”, in Godambe and Sprott (eds.), Foundations of Statistical Inference, p. 432)
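Hartigan’s conclusion can be checked numerically in a toy case of my own (not his notation): with X ~ N(θ, 1) and the proper prior θ ~ N(0, 1), the posterior is N(x/2, 1/2), so the upper 95% credible bound is x/2 + z.95/√2, and its frequentist coverage depends on θ rather than holding at .95 for every θ:

```python
import math

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed toy setup: X ~ N(theta, 1), proper prior theta ~ N(0, 1).
# Posterior: N(x/2, 1/2); upper 95% credible bound: x/2 + z95/sqrt(2).
z95 = 1.6449  # upper .95 standard Normal quantile

def coverage(theta):
    # Frequentist coverage P(theta < X/2 + z95/sqrt(2) | theta)
    #   = P(X > 2*theta - sqrt(2)*z95 | theta) = 1 - Phi(theta - sqrt(2)*z95)
    return 1 - Phi(theta - math.sqrt(2) * z95)

for theta in (0.0, 1.0, 2.0, 3.0):
    print(theta, round(coverage(theta), 3))
```

Coverage is near .99 at θ = 0 but falls to about .25 at θ = 3, so it cannot equal α for all θ, just as the five lines conclude for any proper prior.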