“Frequentist Accuracy of Bayesian Estimates” (Efron Webinar announcement)

[Image: Brad Efron]

The Royal Statistical Society sent me a letter announcing their latest Journal webinar next Wednesday 21 October:

…RSS Journal webinar on 21st October featuring Bradley Efron, Andrew Gelman and Peter Diggle. They will be in discussion about Bradley Efron’s recently published paper titled ‘Frequentist accuracy of Bayesian estimates’. The paper was published in June in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol 77 (3), 617-646.  It is free to access from October 7th to November 4th.

Webinar start time: 8 am in California (PDT); 11 am in New York (EDT); 4pm (UK time).

During the webinar, Bradley Efron will present his paper for about 30 minutes, followed by a Q&A session with the audience. Andrew Gelman is joining us as discussant and the event will be chaired by our President, Peter Diggle. Participation in the Q&A session by anyone who dials in is warmly welcomed and actively encouraged. Participants can ask the author a question over the phone or simply send a message using the web-based teleconference system. Questions can be emailed in advance and further information can be requested from journalwebinar@rss.org.uk.

More details about this journal webinar and how to join can be found in StatsLife and on the RSS website.  RSS Journal webinars are sponsored by Quintiles.

We’d be delighted if you were able to join us on the 21st and very grateful if you could let your colleagues and students know about the event.

I will definitely be tuning in!

Categories: Announcement, Statistics


6 thoughts on ““Frequentist Accuracy of Bayesian Estimates” (Efron Webinar announcement)”

  1. Steven McKinney

    Another amazing development of statistical methodology by Bradley Efron.

    I find it fascinating that there is very little discussion of this methodology – I think people are still picking up their jaws from the floor.

    There’s a bit of confused hand-waving by some religious Bayesians on this old discussion thread

    https://normaldeviate.wordpress.com/2013/03/19/shaking-the-bayesian-machine/

    and some in-depth Bayesian reflection offered at

    http://andrewgelman.com/2015/10/19/my-webinar-with-brad-efron/

    “For symmetry, we should do a ‘Bayesian properties of Frequentist answers’.

    How well does a Confidence Interval represent the range of possibilities compatible with the evidence? Well since some 95% CI’s contain values all of which are provably impossible from the same assumptions used to derive the CI, they suck.”

    I’m going to read and re-read Efron’s paper and computer code and work towards incorporating his beautiful insight into my analyses, where appropriate.

    • Steven: Great to hear from you (many of us have been stuck on the post https://errorstatistics.com/2015/10/18/statistical-reforms-without-philosophy-are-blind-ii-update/).
      Aside from the first 15 minutes (electricity outage), I listened to/watched the webinar, but didn’t really understand it, so I’d be delighted to hear your take, and why it’s so brilliant, etc. Not that I doubted it. I was hoping for some clarity by way of answers to Gelman’s question but didn’t get it. If it’s too much for a comment, I’d be glad to put an informal summary up in a post. But do explain a little bit in any event. (I remembered the “shaking the Bayesian machine” talk and assumed this related to it.) I will check your two links.

  2. Steven McKinney

    I can give a general overview of Efron’s brilliance via his Introduction discussion (his words appear “in double quotes”).

    ” 1. Introduction
    The past two decades have witnessed a greatly increased use of Bayesian techniques in statistical applications. Objective Bayes methods, based on neutral or uninformative priors of the type pioneered by Jeffreys, dominate these applications, carried forward on a wave of popularity for Markov chain Monte Carlo (MCMC) algorithms. Good references include Ghosh (2011), Berger (2006) and Kass and Wasserman (1996).”

    A nice concise summary, one that should bring joy to anyone interested in Bayesian methods after all the Bayesian-bashing of the middle 20th century. Efron himself has crafted many beautiful results in the Empirical Bayes arena. He has reviewed important differences between Bayesian and frequentist outcomes that point to some as-yet unsettled issues in statistical theory and philosophy such as his scales of evidence work.

    ” Suppose then that, having observed data x from a known parametric family f.mu(x), I wish to estimate t(mu), a parameter of particular interest. In the absence of relevant prior experience, I assign an uninformative prior pi(mu), perhaps from the Jeffreys school. Applying Bayes rule yields theta.hat, the posterior expectation of t(mu) given x:

    theta.hat = E{t(mu)|x} (1.1) ”

    A simple setup that is of interest in many MCMC applications. This issue is certainly relevant to all of the bioinformatics algorithms that use MCMC methods to comb through mountains of genomic data these days, such as those being developed by the wunderkinds here at the British Columbia Cancer Research Centre and elsewhere. Efron’s gift is timely indeed.
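
    To make equation (1.1) concrete, here is a minimal sketch in Python. The toy model, the choice of t(mu) and all the names are my own illustration, not the paper’s code: take x ~ N(mu, 1) with a flat prior, so the posterior of mu given x is N(x, 1), and approximate the posterior expectation of t(mu) by averaging t over posterior draws, exactly as one would with MCMC output.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy model (my choice, not from the paper): x ~ N(mu, 1) with a flat prior,
    # so the posterior of mu given x is N(x, 1).  Direct draws from that posterior
    # stand in for MCMC output here.
    x = 1.3                                   # observed data
    mu_draws = rng.normal(loc=x, scale=1.0, size=50_000)

    def t(mu):
        # Illustrative parameter of interest t(mu); any smooth function would do.
        return np.exp(mu / 2.0)

    theta_hat = t(mu_draws).mean()            # eq. (1.1): theta.hat = E{t(mu) | x}
    print(theta_hat)
    ```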

    ” How accurate is theta.hat? The obvious answer, and the one that is almost always employed, is to infer the accuracy of theta.hat according to the Bayes posterior distribution of t(mu) given x. This would obviously be correct if pi(mu) were based on genuine past experience. It is not so obvious for uninformative priors. I might very well like theta.hat as a point estimate, based on considerations of convenience, coherence, smoothness, admissibility or aesthetic Bayesian preference, but not trust what is after all a self-selected choice of prior as determining theta.hat’s accuracy. Berger (2006) made this point at the beginning of his section 4. ”

    Efron demonstrates his sensibilities here. He’s not dogmatic about any particular methodology and has a keen sense for and appreciation of aesthetics. That’s why I’m always excited when Efron produces new findings.

    ” As an alternative, this paper proposes computing the frequentist accuracy of theta.hat, i.e. regardless of its Bayesian provenance, we consider theta.hat simply as a function of the data x and compute its frequentist variability. ”

    Efron adeptly dances between and within the Bayesian and frequentist ballrooms.

    ” Our main result, which is presented in Section 2, is a general accuracy formula for the delta method standard deviation of theta.hat: general in the sense that it applies to all prior distributions, uninformative or not. Even in complicated situations the formula is computationally inexpensive: the same MCMC calculations that give the Bayes estimate theta.hat also provide its frequentist standard deviation. A lasso-type example is used for illustration. Many of the examples that follow use Jeffreys priors; this is only for simplified exposition and is not a limitation of the theory. ”

    This is the brilliance here. Efron has identified an extremely useful general accuracy formula, using results readily at hand, detached from priors and all the handwringing that goes on as people argue endlessly about whose prior is better than whose.

    The discovery that the very MCMC results that produced the point estimate can also yield an estimate of its variability gives us extremely valuable information at no extra charge. MCMC algorithms do indeed produce useful point estimates, after much computation. So what were previous ways to obtain an estimate of variability? One could bootstrap the process (again, an amazing statistical methodology from the mind of Efron), but this would involve thousands of repeats of the already computationally intensive process. One could assess the variability from the spread of the posterior distribution, but this depends on the prior, and the handwringing begins.
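
    To illustrate the “same MCMC calculations” point with a minimal sketch (again the toy normal model from above, my own illustration rather than the paper’s freqacc code): as I read Section 2, Lemma 1 gives the gradient of theta.hat with respect to the data x as the posterior covariance between t(mu) and alpha.x(mu) = d/dx log f.mu(x), and the general accuracy formula then feeds that gradient into the usual delta method together with V.x, an estimate of the sampling covariance of x. Consult the paper for the exact statement.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy model again: x ~ N(mu, 1), flat prior, so the posterior of mu given x is N(x, 1).
    x = 1.3
    mu_draws = rng.normal(loc=x, scale=1.0, size=50_000)
    t_draws = np.exp(mu_draws / 2.0)          # t(mu) evaluated at each posterior draw

    # alpha.x(mu) = d/dx log f.mu(x); for the N(mu, 1) likelihood this is (mu - x).
    alpha_draws = mu_draws - x

    # Lemma 1 (as I read it): grad_x theta.hat = posterior Cov{ t(mu), alpha.x(mu) }.
    grad_hat = np.cov(t_draws, alpha_draws)[0, 1]

    # Delta method: sd.hat = sqrt( grad' V.x grad ), with V.x = Var(x | mu) = 1 here.
    V_x = 1.0
    sd_freq = np.sqrt(grad_hat * V_x * grad_hat)

    # For comparison, the Bayesian answer: the posterior standard deviation of t(mu).
    sd_bayes = t_draws.std(ddof=1)
    print(sd_freq, sd_bayes)
    ```

    In this toy case the two standard deviations come out fairly close, which is the “near equality” phenomenon Efron mentions below, and no simulation beyond the original posterior draws was needed for the frequentist number.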

    ” In fact several of our examples will demonstrate near equality between Bayesian and frequentist standard deviations. That does not have to be so: remark 1 in Section 6 discusses a class of reasonable examples where the frequentist accuracy can be less than half of its Bayesian counterpart. Other examples will calculate frequentist standard deviations for situations where there is no obvious Bayesian counterpart, e.g. for the upper end point of a 95% credible interval. ”

    Efron demonstrates, once again, how sensible Bayesian and frequentist computations so often yield nearly equal answers. This is why so much of the silly bashing of frequentist results by Bayesian zealots, and Bayesian results by frequentist zealots, is pointless. Mostly, we arrive at the same conclusion regardless of which strategy we employ, and Efron is so adept at developing useful and practical solutions from methods in either camp without being a zealot. In those cases where frequentist and Bayesian solutions differ starkly, we still have something to learn.

    ” The general accuracy formula takes on a particularly simple form when f.mu(x) represents a p-parameter exponential family: Section 3. Exponential family structure also allows us to substitute parametric bootstrap sampling for MCMC calculations, at least for uninformative priors. This has computational advantages. More importantly, it helps to connect Bayesian inference with the seemingly superfrequentist bootstrap world, which is a central theme of this paper. ”

    More bridge building between the Bayesian and frequentist paradigms. Always useful.
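
    Section 3’s exact conversion is best read in the paper itself; purely as a generic illustration of how parametric bootstrap replications can stand in for posterior draws, one can importance-reweight the bootstrap sample toward the posterior pi(mu) f.mu(x). The sketch below is ordinary importance sampling under my own toy setup (the N(0, tau^2) prior, tau = 2 and the variable names are assumptions for illustration), not necessarily Efron’s formula.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Toy normal model once more: x ~ N(mu, 1), so the MLE is muhat = x.
    x = 1.3
    B = 50_000

    # Parametric bootstrap: draw x* ~ f_muhat, giving replications muhat* = x*.
    mu_boot = rng.normal(loc=x, scale=1.0, size=B)

    # Generic importance reweighting toward the posterior: w ∝ pi(mu*) f_mu*(x) / g(mu*),
    # where g is the bootstrap density of muhat*.  In this toy model f_mu*(x) and g(mu*)
    # are the same N(., 1) kernel, so the weights reduce to the prior: constant for a
    # flat prior, non-constant for an informative one such as the N(0, tau^2) used here.
    tau = 2.0
    w = np.exp(-0.5 * (mu_boot / tau) ** 2)
    w /= w.sum()

    t_boot = np.exp(mu_boot / 2.0)
    theta_hat_boot = np.sum(w * t_boot)       # weighted bootstrap average ~ E{t(mu) | x}
    print(theta_hat_boot)
    ```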

    ” The general accuracy formula provides frequentist standard deviations for Bayes estimators, but nothing more. Better inferences, in the form of second-order-accurate confidence intervals, are developed in Section 4, again in an exponential family bootstrap context. Section 5 uses the accuracy formula to compare hierarchical and empirical Bayes methods. The paper concludes with remarks, details and extensions in Section 6. ”

    Amazing. This paper is such a rich vein of statistical mathematical development.

    ” The frequentist properties of Bayes estimates is a venerable topic, that has been nicely reviewed in chapter 4 of Carlin and Louis (2000). Particular attention focuses on large sample behaviour, where ‘the data swamp the prior’ and theta.hat converges to the maximum likelihood estimator (see result 8 in section 4.7 of Berger (1985)), in which case the Bayes and frequentist standard deviations are nearly the same. Our accuracy formula provides some information about what happens before the data swamp the prior, though the present paper offers no proof of its superiority to standard asymptotic methods. ”

    Efron is always objective about judging superiority. He’s clear about the criteria on which he bases such assessments when he makes them, and it’s never ego-based.

    ” Some other important Bayesian-cum-frequentist topics are posterior and preposterior model checking as in Little (2006) or chapter 6 of Gelman et al. (1995), Bayesian consistency (Diaconis and Freedman, 1986), confidence matching priors, going back to Welch and Peers (1963), and empirical Bayes analysis as in Morris (1983). Johnstone and Silverman (2004) have provided, among much else, asymptotic bounds for the frequentist accuracy of empirical Bayes estimates. ”

    More interesting reading to do. Efron’s citations do not include zealots and crackpots, unless he is deconstructing their sillinesses. I appreciate Efron’s vetting of the literature.

    ” Sensitivity analysis — modifying the prior as a check on the stability of posterior inference — is a staple of Bayesian model selection. The methods of this paper amount to modifying the data as a posterior stability check (see lemma 1 of Section 2). The implied suggestion here is to consider both techniques when the prior is in doubt. ”

    More useful suggestions about sensitivity analysis methods, which are always important for ensuring that results are not misleading.
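
    A tiny sketch of what “modifying the data as a posterior stability check” might look like in practice, again in my toy normal setup rather than the paper’s own procedure: nudge x a little, recompute theta.hat, and see how much it moves. By Lemma 1, the finite-difference slope should roughly match the posterior covariance computed in the earlier sketch.

    ```python
    import numpy as np

    def theta_hat(x, n_draws=200_000, seed=0):
        """Posterior expectation E{t(mu) | x} for the toy normal model, flat prior."""
        mu = np.random.default_rng(seed).normal(loc=x, scale=1.0, size=n_draws)
        return np.exp(mu / 2.0).mean()

    # Nudge the data and watch the posterior estimate respond; a common random seed
    # keeps the Monte Carlo noise from swamping the small perturbation.
    x, eps = 1.3, 0.05
    slope = (theta_hat(x + eps) - theta_hat(x - eps)) / (2 * eps)
    print(slope)
    ```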

    ” The data sets and function freqacc are available from
    http://statweb.stanford.edu/~brad/papers/jrss. ”

    Efron is always generous about sharing his insights. His results are not squirreled away behind corporate doors in pursuit of vast profits from a software offering or a consulting firm.

    So, in a nutshell (okay it’s a big coconut shell) that is my summary of the brilliance I see in this paper. A beautiful presentation of a general result that gives very useful information at no extra charge. It was sitting there all along, and Efron saw it, crafted it and presented it to us succinctly, to use forevermore. This theory (Lemma 1 and Theorem 1) will be in future statistics textbooks, alongside many other Efron discoveries. It’s amazing to live during the times of such a statistical genius.

  3. Sander Greenland

    Steve: Great summary!
    Have you seen other recent work connecting frequentist and Bayesian concepts, especially with confidence distributions? e.g., Bickel, A frequentist framework of inductive reasoning. Sankhya 74-A, Part 2:141-169. I’d be curious about how this literature appears from the perspective of Efron’s results.

  4. Steven McKinney

    Sander:

    I don’t see enough work bridging the Bayesian-Frequentist arenas. Thanks for the pointer to Bickel’s paper.

    Bickel’s paper is a heavier lift, on a more obscure topic than Efron’s latest. Since I’m not much of a gambling man, I’ll need to read Bickel’s paper a few times to map his ideas to the type of applied statistics I do.

    One thing in common is Bickel’s and Efron’s adept use of measure theory to structure the bridges between Bayesian and frequentist realms. Efron always shows the development of his ideas with actual analysis of relevant data, which is hugely helpful in demonstrating the utility of the complex mathematics underlying it all.

    • I did a postdoc with David Bickel; it’s neat to see his work being cited here.

      The key to reading David’s stuff is that he likes to be very rigorous, so even simple concepts and setups are expressed in imposing-looking notation that specifies every particular. However, only a little effort is required to assimilate this simple stuff (even if it looks like a bear) and get to the meat of any given paper.
