Steven McKinney, Ph.D.
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
On Bradley Efron’s: “Frequentist Accuracy of Bayesian Estimates”
Bradley Efron has produced another fine set of results, yielding a valuable estimate of variability for a Bayesian estimate derived from a Markov Chain Monte Carlo algorithm, in his latest paper “Frequentist accuracy of Bayesian estimates” (J. R. Statist. Soc. B (2015) 77, Part 3, pp. 617–646). I give a general overview of Efron’s brilliance via his Introduction discussion (his words “in double quotes”).
“The past two decades have witnessed a greatly increased use of Bayesian techniques in statistical applications. Objective Bayes methods, based on neutral or uninformative priors of the type pioneered by Jeffreys, dominate these applications, carried forward on a wave of popularity for Markov chain Monte Carlo (MCMC) algorithms. Good references include Ghosh (2011), Berger (2006) and Kass and Wasserman (1996).”
A nice concise summary, one that should bring joy to anyone interested in Bayesian methods after all the Bayesian-bashing of the middle 20th century. Efron himself has crafted many beautiful results in the Empirical Bayes arena. He has also reviewed important differences between Bayesian and frequentist outcomes, as in his scales of evidence work, that point to some as-yet unsettled issues in statistical theory and philosophy.
“Suppose then that, having observed data x from a known parametric family fμ(x), I wish to estimate t(μ), a parameter of particular interest. In the absence of relevant prior experience, I assign an uninformative prior π(μ), perhaps from the Jeffreys school. Applying Bayes rule yields θ̂, the posterior expectation of t(μ) given x:
θ̂ = E{t(μ) | x}.”
A simple setup that is of interest in many MCMC methods. This issue is certainly relevant in all of the bioinformatics algorithms that use MCMC methods to comb through mountains of genomic data these days, such as those being developed by the wunderkinds here at the British Columbia Cancer Research Centre and elsewhere. Efron’s gift is timely indeed.
“How accurate is θ̂? The obvious answer, and the one that is almost always employed, is to infer the accuracy of θ̂ according to the Bayes posterior distribution of t(μ) given x. This would obviously be correct if π(μ) were based on genuine past experience. It is not so obvious for uninformative priors. I might very well like θ̂ as a point estimate, based on considerations of convenience, coherence, smoothness, admissibility or aesthetic Bayesian preference, but not trust what is after all a self-selected choice of prior as determining θ̂’s accuracy. Berger (2006) made this point at the beginning of his section 4.”
Efron demonstrates his sensibilities here. He’s not dogmatic about any particular methodology and has a keen sense for and appreciation of aesthetics. That’s why I’m always excited when Efron produces new findings.
“As an alternative, this paper proposes computing the frequentist accuracy of θ̂, i.e. regardless of its Bayesian provenance, we consider θ̂ simply as a function of the data x and compute its frequentist variability.”
Efron adeptly dances between and within the Bayesian and frequentist ballrooms.
“Our main result, which is presented in Section 2, is a general accuracy formula for the delta method standard deviation of θ̂: general in the sense that it applies to all prior distributions, uninformative or not. Even in complicated situations the formula is computationally inexpensive: the same MCMC calculations that give the Bayes estimate also provide its frequentist standard deviation. A lasso-type example is used for illustration. Many of the examples that follow use Jeffreys priors; this is only for simplified exposition and is not a limitation of the theory.”
This is the brilliance here. Efron has identified an extremely useful general accuracy formula, using results readily at hand, detached from priors and all the handwringing that goes on as people argue endlessly about whose prior is better than whose.
The discovery that the very MCMC results that produced the point estimate can also yield an estimate of its variability gives us extremely valuable information at no extra charge. MCMC algorithms do indeed produce useful point estimates, after much computation. So what were previous ways to obtain an estimate of variability? One could bootstrap the process (again, an amazing statistical methodology from the mind of Efron), but this would involve thousands of repeats of the already computationally intensive process. One could assess the variability from the spread of the posterior distribution, but this depends on the prior, and the handwringing begins.
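To make the "no extra charge" point concrete, here is a minimal sketch of the general accuracy formula as I read it in the paper: the delta-method standard deviation is sd = [cov' V cov]^(1/2), where cov is the posterior covariance between t(μ) and α_x(μ) = ∇_x log fμ(x), and V is the covariance matrix of x. The toy model (x ~ N(μ, 1) with a N(0, A) prior), the values of A and x, and the function name freq_sd_from_draws are my own assumptions for illustration; this is not Efron's freqacc function, and direct draws from the conjugate posterior stand in for MCMC output.

```python
import numpy as np

def freq_sd_from_draws(t_draws, alpha_draws, V_x):
    """Delta-method frequentist sd of a posterior mean, from posterior draws.

    t_draws     : draws of t(mu) from the posterior, shape (N,)
    alpha_draws : matching values of alpha_x(mu) = grad_x log f_mu(x),
                  shape (N,) or (N, n)
    V_x         : covariance of the data x (scalar or n-by-n matrix)
    """
    t = np.asarray(t_draws, dtype=float)
    alpha = np.asarray(alpha_draws, dtype=float)
    if alpha.ndim == 1:
        alpha = alpha[:, None]
    t_c = t - t.mean()
    a_c = alpha - alpha.mean(axis=0)
    # Posterior covariance between t(mu) and each component of alpha_x(mu).
    cov_x = a_c.T @ t_c / (len(t) - 1)
    V = np.atleast_2d(np.asarray(V_x, dtype=float))
    return float(np.sqrt(cov_x @ V @ cov_x))

# Hypothetical toy problem: x ~ N(mu, 1), prior mu ~ N(0, A).
rng = np.random.default_rng(0)
A, x = 1.0, 1.5
B = A / (A + 1.0)                      # posterior is N(B * x, B)

# Stand-in for MCMC output: direct draws from the conjugate posterior.
mu_draws = rng.normal(B * x, np.sqrt(B), size=200_000)

alpha = mu_draws - x                   # d/dx log f_mu(x) for the N(mu, 1) model
sd_freq = freq_sd_from_draws(mu_draws, alpha, V_x=1.0)
sd_bayes = mu_draws.std(ddof=1)

print(f"frequentist sd ~ {sd_freq:.3f} (exact value B = {B})")
print(f"Bayesian sd    ~ {sd_bayes:.3f} (exact value sqrt(B) = {np.sqrt(B):.3f})")
```

In this toy case the posterior mean is Bx, so its exact frequentist standard deviation is B, while the posterior standard deviation is sqrt(B). The same draws supply both numbers, which is the point of the formula, and the two need not agree when the prior is informative.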
“In fact several of our examples will demonstrate near equality between Bayesian and frequentist standard deviations. That does not have to be so: remark 1 in Section 6 discusses a class of reasonable examples where the frequentist accuracy can be less than half of its Bayesian counterpart. Other examples will calculate frequentist standard deviations for situations where there is no obvious Bayesian counterpart, e.g. for the upper end point of a 95% credible interval.”
Efron demonstrates, once again, how sensible Bayesian and frequentist computations so often yield nearly equal answers. This is why so much of the silly bashing of frequentist results by Bayesian zealots, and Bayesian results by frequentist zealots, is pointless. Mostly, we arrive at the same conclusion regardless of which strategy we employ, and Efron is so adept at developing useful and practical solutions from methods in either camp without being a zealot. In those cases where frequentist and Bayesian solutions differ starkly, we still have something to learn.
“The general accuracy formula takes on a particularly simple form when fμ(x) represents a p-parameter exponential family: Section 3. Exponential family structure also allows us to substitute parametric bootstrap sampling for MCMC calculations, at least for uninformative priors. This has computational advantages. More importantly, it helps to connect Bayesian inference with the seemingly superfrequentist bootstrap world, which is a central theme of this paper.”
More bridge building between the Bayesian and frequentist paradigms. Always useful.
“The general accuracy formula provides frequentist standard deviations for Bayes estimators, but nothing more. Better inferences, in the form of second-order-accurate confidence intervals, are developed in Section 4, again in an exponential family bootstrap context. Section 5 uses the accuracy formula to compare hierarchical and empirical Bayes methods. The paper concludes with remarks, details and extensions in Section 6.”
Amazing. This paper is such a rich vein of statistical mathematical development.
“The frequentist properties of Bayes estimates is a venerable topic, that has been nicely reviewed in chapter 4 of Carlin and Louis (2000). Particular attention focuses on large sample behaviour, where ‘the data swamp the prior’ and θ̂ converges to the maximum likelihood estimator (see result 8 in section 4.7 of Berger (1985)), in which case the Bayes and frequentist standard deviations are nearly the same. Our accuracy formula provides some information about what happens before the data swamp the prior, though the present paper offers no proof of its superiority to standard asymptotic methods.”
Efron is always objective about judging superiority. He’s clear about the criteria upon which he does base his assessments of superiority when he does so, and it’s never ego-based.
“Some other important Bayesian-cum-frequentist topics are posterior and preposterior model checking as in Little (2006) or chapter 6 of Gelman et al. (1995), Bayesian consistency (Diaconis and Freedman, 1986), confidence matching priors, going back to Welch and Peers (1963), and empirical Bayes analysis as in Morris (1983). Johnstone and Silverman (2004) have provided, among much else, asymptotic bounds for the frequentist accuracy of empirical Bayes estimates.”
More interesting reading to do. Efron’s citations do not include zealots and crackpots, unless he is deconstructing their sillinesses. I appreciate Efron’s vetting of the literature.
“Sensitivity analysis—modifying the prior as a check on the stability of posterior inference—is a staple of Bayesian model selection. The methods of this paper amount to modifying the data as a posterior stability check (see lemma 1 of Section 2). The implied suggestion here is to consider both techniques when the prior is in doubt.”
More useful suggestions about sensitivity analysis methods, always important to assess to ensure results are not potentially misleading.
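The idea of "modifying the data as a posterior stability check" can be tried numerically. As I understand lemma 1, the gradient of the posterior expectation with respect to x equals the posterior covariance between t(μ) and α_x(μ) = ∇_x log fμ(x). The sketch below checks this on a grid for an assumed toy model, x ~ N(μ, 1) with a Laplace (lasso-type) prior on μ and t(μ) = μ. The grid, the prior, the value x0 and the function names are all hypothetical choices of mine, not taken from the paper.

```python
import numpy as np

# Grid approximation to the posterior of mu (hypothetical toy setup).
mu = np.linspace(-15.0, 15.0, 60001)
log_prior = -np.abs(mu)                # Laplace prior, unnormalised

def posterior_weights(x):
    """Normalised posterior weights on the grid for x ~ N(mu, 1)."""
    log_post = -0.5 * (x - mu) ** 2 + log_prior
    w = np.exp(log_post - log_post.max())
    return w / w.sum()

def post_mean(x):
    """Posterior expectation of t(mu) = mu given x."""
    return float(np.sum(posterior_weights(x) * mu))

def lemma1_gradient(x):
    """Posterior covariance of t(mu) and alpha_x(mu) = mu - x."""
    w = posterior_weights(x)
    alpha = mu - x
    t_bar = np.sum(w * mu)
    a_bar = np.sum(w * alpha)
    return float(np.sum(w * (mu - t_bar) * (alpha - a_bar)))

x0, h = 2.0, 1e-4
# Modify the data slightly and watch how the Bayes estimate moves.
finite_diff = (post_mean(x0 + h) - post_mean(x0 - h)) / (2 * h)
gradient = lemma1_gradient(x0)

print(f"finite-difference slope: {finite_diff:.6f}")
print(f"posterior covariance:    {gradient:.6f}")
```

Because the data variance is 1 in this toy model, the absolute value of this gradient is also the delta-method frequentist standard deviation of the posterior mean at x0.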
“The data sets and function freqacc are available from http://statweb.stanford.edu/~ckirby/brad/papers/jrss/.”
Efron is always generous about sharing his insights. His results are not squirreled away behind corporate doors, seeking to make vast profit from a software offering, or a consulting firm.
So, in a nutshell (okay it’s a big coconut shell) that is my summary of the brilliance I see in this paper. A beautiful presentation of a general result that gives very useful information at no extra charge. It was sitting there all along, and Efron saw it, crafted it and presented it to us succinctly, to use forevermore. This theory (Lemma 1 and Theorem 1) will be in future statistics textbooks, alongside many other Efron discoveries. It’s amazing to live during the times of such a statistical genius.
Note from Mayo: This began as a comment by Steven McKinney on a recent post on Efron. He agreed to let me put it up as a guest blog post. (There was also a comment by Greenland.) To galvanize some conversation, let me pose some queries: Is it true that “Mostly, we arrive at the same conclusion regardless of which strategy we employ”? and when we do, do they mean the same things? Should we really be striving for an agreement on numbers without an agreement on interpretation? I’m inclined to favor a philosophy of “different tools for different goals”, lest one side get shortchanged. Notably, there’s an essential difference between how believable or plausible claims are and how well tested or probed they are by dint of a given set of inquiries. Distinct aims occupy “science-wise screening” tasks, as in large-throughput testing of a “diagnostic” type, or jobs with a performance-oriented behavioristic flavor. Typically, when I see “matching numbers” used for different ends, one set of goals and philosophy is compromised and/or misunderstood. (Recall the “P-values exaggerate” meme.) Why not try to truly understand just what each can (and cannot) do?
I completely agree with McKinney on the brilliance of Efron’s many innovations over the years, particularly in relation to resampling, and concur about the value of checking error statistical properties of other methods, but we’ve also heard Lindley claim “there’s nothing less Bayesian than empirical Bayes” and even “conventional” Bayesians sometimes bristle at being cross-checked by error statistical means. I’m very grateful to McKinney for this guest post!
Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, 2nd edn. New York: Springer.
Berger, J. (2006) The case for objective Bayesian analysis. Baysn Anal., 1, 385–402.
Carlin, B. P. and Louis, T. A. (2000) Bayes and Empirical Bayes Methods for Data Analysis, 2nd edn. Boca Raton: Chapman and Hall–CRC.
Diaconis, P. and Freedman, D. (1986) On the consistency of Bayes estimates (with discussion). Ann. Statist., 14, 1–67.
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995) Bayesian Data Analysis. New York: Chapman and Hall.
Ghosh, M. (2011) Objective priors: an introduction for frequentists (with discussion). Statist. Sci., 26, 187–202.
Johnstone, I. M. and Silverman, B. W. (2004) Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Statist., 32, 1594–1649.
Kass, R. E. and Wasserman, L. (1996) The selection of prior distributions by formal rules. J. Am. Statist. Ass., 91, 1343–1370.
Little, R. J. (2006) Calibrated Bayes: a Bayes/frequentist roadmap. Am. Statistn, 60, 213–223.
Morris, C. N. (1983) Parametric empirical Bayes inference: theory and applications (with discussion). J. Am. Statist. Ass., 78, 47–65.
Welch, B. L. and Peers, H. W. (1963) On formulae for confidence points based on integrals of weighted likelihoods. J. R. Statist. Soc. B, 25, 318–329.