The record number of hits on this blog goes to “When Bayesian Inference shatters,” where Houman Owhadi presents a “Plain Jane” explanation of results now published in “On the Brittleness of Bayesian Inference.” A follow-up post appeared a year ago. Here’s how their paper begins:
Professor of Applied and Computational Mathematics and Control and Dynamical Systems, Computing + Mathematical Sciences, California Institute of Technology, USA
Computing + Mathematical Sciences, California Institute of Technology, USA
“On the Brittleness of Bayesian Inference”
ABSTRACT: With the advent of high-performance computing, Bayesian methods are becoming increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods can impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is a pressing question to which there currently exist positive and negative answers. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they could be generically brittle when applied to continuous systems (and their discretizations) with finite information on the data-generating distribution. If closeness is defined in terms of the total variation (TV) metric or the matching of a finite system of generalized moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusion. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements, which raises the possibility of a missing stability condition when using Bayesian inference in a continuous world under finite information.
© 2015, Society for Industrial and Applied Mathematics
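The abstract measures closeness of models in the total variation (TV) metric. For two distributions on a finite set of outcomes, TV(μ, ν) = ½ Σ_i |μ_i − ν_i|, which equals the largest difference in probability the two can assign to a single event. The short Python sketch below assumes finitely supported distributions stored as lists of exact fractions (the helper names `total_variation` and `mix` are ours, for illustration). It checks one fact that matters for the brittleness mechanism: mixing a model μ with any other distribution ν at weight ε moves it by exactly ε·TV(μ, ν), hence by at most ε.

```python
from fractions import Fraction

def total_variation(p, q):
    """TV distance between two distributions on the same finite set of outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def mix(p, q, eps):
    """Return the contaminated model (1 - eps) * p + eps * q."""
    return [(1 - eps) * a + eps * b for a, b in zip(p, q)]

mu = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
nu = [Fraction(0), Fraction(0), Fraction(1)]  # all mass on an outcome mu never produces
eps = Fraction(1, 100)

print(total_variation(mu, [Fraction(9, 10), Fraction(1, 10), Fraction(0)]))
print(total_variation(mu, mix(mu, nu, eps)))  # equals eps * TV(mu, nu) = eps here
```

The mixture places its extra mass on an outcome that μ considers impossible, yet it remains ε-close to μ in TV.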
The application of Bayes’ theorem in the form of Bayesian inference has fueled an ongoing debate with practical consequences in science, industry, medicine, and law . One commonly-cited justification for the application of Bayesian reasoning is Cox’s theorem , which has been interpreted as stating that any “natural” extension of Aristotelian logic to uncertain contexts must be Bayesian . It has now been shown that Cox’s theorem as originally formulated is incomplete  and there is some debate about the “naturality” of the additional assumptions required for its validity [1, 20, 29, 31], e.g., the assumption that knowledge can be always represented in the form of a σ-additive probability measure that assigns to each measurable event a single real-valued probability.
However—and this is the topic of this article—regardless of the internal logic, elegance, and appealing simplicity of Bayesian reasoning, a critical question is that of the robustness of its posterior conclusions with respect to perturbations of the underlying models and priors.
For example, a frequentist statistician might ask, if the data happen to be a sequence of i.i.d. draws from a fixed data-generating distribution μ†, whether or not the Bayesian posterior will asymptotically assign full mass to a parameter value that corresponds to μ†. When it holds, this property is known as frequentist consistency of the Bayes procedure, or the Bernstein–von Mises property.
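Frequentist consistency is easiest to see in a conjugate toy model. The sketch below assumes i.i.d. Bernoulli(p†) data with p† = 0.3 and a uniform Beta(1, 1) prior (our choice of example, not one from the paper). After k successes in n trials the posterior is Beta(1 + k, 1 + n − k), so its mean should approach p† and its standard deviation should shrink roughly like 1/√n.

```python
import math
import random

def beta_posterior(a, b, data):
    """Conjugate update: Beta(a, b) prior plus Bernoulli data gives a Beta posterior."""
    k = sum(data)
    return a + k, b + len(data) - k

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

rng = random.Random(0)  # fixed seed so the run is reproducible
p_true = 0.3            # the data-generating parameter (playing the role of mu-dagger)
data = [1 if rng.random() < p_true else 0 for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    a_n, b_n = beta_posterior(1, 1, data[:n])  # uniform Beta(1, 1) prior
    mean, sd = beta_mean_sd(a_n, b_n)
    print(f"n={n:>6}  posterior mean={mean:.4f}  sd={sd:.4f}")
```

This is a well-specified one-parameter model, the favorable case; it says nothing about the continuous, finite-information setting the paper is concerned with.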
Alternatively, without resorting to a frequentist data-generating distribution μ†, a Bayesian statistician who is also a numerical analyst might ask questions about stability and conditioning: does the posterior distribution (or the posterior value of a particular quantity of interest) change only slightly when elements of the problem setup (namely, the prior distribution, the likelihood model, and the observed data) are perturbed, e.g., as a result of observational error, numerical discretization, or algorithmic implementation? When it holds, this property is known as robustness of the Bayes procedure.
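One way to probe this robustness question numerically is to perturb the prior and recompute the posterior value of a quantity of interest. The sketch below assumes a toy model of our own: a coin whose bias θ takes one of three values, a binomial likelihood L(θ) for k heads in n tosses, and the ε-contaminated prior (1 − ε)π + εδ_θ*, a standard device from the robust-Bayes literature. In this finite setting the contaminated posterior is a mixture of the original posterior and the point mass δ_θ*, with weight εL(θ*) / ((1 − ε)Z + εL(θ*)) on δ_θ*, where Z = Σ π(θ)L(θ) is the marginal likelihood. The test checks the code against that formula.

```python
from math import comb

def likelihood(theta, k, n):
    """Binomial probability of k heads in n tosses of a coin with bias theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def posterior(prior, k, n):
    """Bayes' rule on a finite parameter set; prior maps theta -> probability."""
    unnorm = {t: p * likelihood(t, k, n) for t, p in prior.items()}
    z = sum(unnorm.values())
    return {t: w / z for t, w in unnorm.items()}

def mean(dist):
    """Expected value of theta under a finite distribution."""
    return sum(t * p for t, p in dist.items())

def contaminate(prior, eps, theta_star):
    """Return (1 - eps) * prior + eps * (point mass at theta_star)."""
    out = {t: (1 - eps) * p for t, p in prior.items()}
    out[theta_star] = out.get(theta_star, 0.0) + eps
    return out

prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}  # hypothetical uniform prior
k, n, eps = 6, 10, 0.01                       # hypothetical data and perturbation size

base = mean(posterior(prior, k, n))
for theta_star in prior:
    shifted = mean(posterior(contaminate(prior, eps, theta_star), k, n))
    print(f"theta*={theta_star}: posterior mean {base:.4f} -> {shifted:.4f}")
```

The shift is governed by the ratio L(θ*)/Z, so it stays small here because no single parameter value explains the data vastly better than the prior average does.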
This paper summarizes recent results [46, 47] that give conditions under which Bayesian inference appears to be nonrobust in the most extreme fashion, in the sense that arbitrarily small changes of the prior and model class lead to arbitrarily large changes of the posterior value of a quantity of interest. We call this extreme nonrobustness “brittleness,” and it can be visualized as the smooth dependence of the value of the quantity of interest on the prior breaking into a fine patchwork, in which nearby priors are associated to diametrically opposed posterior values. Naturally, the notion of “nearby” plays an important role, and this point will be revisited later. Much as classical numerical analysis shows that there are “stable” and “unstable” ways to discretize a partial differential equation (PDE), these results and the wider literature of positive [8, 13, 19, 37, 38, 53, 56] and negative [3, 17, 23, 24, 35, 40] results on Bayesian inference contribute to an emerging understanding of “stable” and “unstable” ways to apply Bayes’ rule in practice.
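The mechanism behind brittleness can be seen in a deliberately small toy problem (our own illustration, not the construction used in [46, 47]). Assume two hypotheses θ ∈ {0, 1} with prior ½ each, an observation distributed as N(θ, 1), and a datum x recorded only to resolution δ, so that one conditions on the event B = [x − δ, x + δ]. Practitioner A models θ = 1 by N(1, 1); practitioner B uses (1 − ε)N(1, 1) + ε·Uniform(B), which is within ε of A’s model in TV. Because N(1, 1) assigns B a probability of order δ, the extra ε dominates B’s likelihood once δ is small. The values of x, δ, and ε below are hypothetical.

```python
import math

def normal_cdf(z, mean):
    """CDF of N(mean, 1) evaluated at z."""
    return 0.5 * (1.0 + math.erf((z - mean) / math.sqrt(2.0)))

def interval_prob(x, delta, mean):
    """P(|X - x| <= delta) for X ~ N(mean, 1): likelihood of data recorded to +/- delta."""
    return normal_cdf(x + delta, mean) - normal_cdf(x - delta, mean)

def posterior_theta1(like0, like1, prior1=0.5):
    """P(theta = 1 | data) for two hypotheses with likelihoods like0 and like1."""
    num = prior1 * like1
    return num / (num + (1.0 - prior1) * like0)

x, delta, eps = -0.5, 1e-5, 1e-3  # hypothetical datum, resolution, and TV budget

like0 = interval_prob(x, delta, 0.0)    # shared model for theta = 0: N(0, 1)
like1_a = interval_prob(x, delta, 1.0)  # practitioner A, theta = 1: N(1, 1)
# Practitioner B, theta = 1: (1 - eps) * N(1, 1) + eps * Uniform[x - delta, x + delta].
# The uniform component puts all of its mass inside the observed interval.
like1_b = (1.0 - eps) * like1_a + eps

post_a = posterior_theta1(like0, like1_a)
post_b = posterior_theta1(like0, like1_b)
print(f"A: P(theta=1 | data) = {post_a:.3f}")
print(f"B: P(theta=1 | data) = {post_b:.3f}")
```

A leans toward θ = 0 while B is nearly certain of θ = 1, although their models differ by at most 0.001 in TV. The perturbation here is chosen after seeing x, in the spirit of worst-case (supremum over a neighborhood) statements, so it should not be read as a claim about typical perturbations.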
The results reported in this article show that the process of Bayesian conditioning on data at finite enough resolution is unstable (or “sensitive” as defined in ) with respect to the underlying distributions (under the total variation (TV) and Prokhorov metrics) and is the source of negative results similar to those caused by tail properties in statistics [2, 18]. The mechanisms causing the stability/instability of posterior predictions suggest that learning and robustness are conflicting requirements and raise the possibility of a missing stability condition when using Bayesian inference for continuous systems with finite information (akin to the Courant–Friedrichs–Lewy (CFL) stability condition when using discrete schemes to approximate continuous PDEs). …
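The role of resolution can be made quantitative in the same kind of toy model (our assumptions: prior ½ on θ ∈ {0, 1}, models N(0, 1) and N(1, 1), datum x recorded to ±δ, and the θ = 1 model contaminated by ε·Uniform[x − δ, x + δ]). Writing m_θ for the probability of the observed interval under the unperturbed model θ, the perturbed posterior odds are ((1 − ε)m_1 + ε)/m_0. Requiring these odds to reach t/(1 − t) for a target posterior t gives the smallest sufficient budget ε = max(0, ((t/(1 − t))m_0 − m_1)/(1 − m_1)). Since m_0 and m_1 shrink roughly in proportion to δ, so should the required ε. The function name `min_eps` and the numbers are illustrative.

```python
import math

def normal_cdf(z, mean):
    """CDF of N(mean, 1) evaluated at z."""
    return 0.5 * (1.0 + math.erf((z - mean) / math.sqrt(2.0)))

def interval_prob(x, delta, mean):
    """P(|X - x| <= delta) for X ~ N(mean, 1)."""
    return normal_cdf(x + delta, mean) - normal_cdf(x - delta, mean)

def min_eps(x, delta, target):
    """Smallest contamination weight that pushes P(theta=1 | data) up to `target`.

    Returns math.inf when even eps = 1 cannot reach the target.
    """
    m0 = interval_prob(x, delta, 0.0)
    m1 = interval_prob(x, delta, 1.0)
    odds = target / (1.0 - target)  # required posterior odds; prior odds are 1
    eps = max(0.0, (odds * m0 - m1) / (1.0 - m1))
    return eps if eps <= 1.0 else math.inf

x, target = -0.5, 0.99  # hypothetical datum and desired posterior conclusion
for delta in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    print(f"delta={delta:.0e}  eps needed={min_eps(x, delta, target):.2e}")
```

In this toy model the required ε falls roughly tenfold with each tenfold refinement of δ, while at the coarsest resolution no contamination of this form reaches the target, which the sketch reports as `inf`.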
To keep reading the paper: http://epubs.siam.org/doi/10.1137/130938633
H. Owhadi, C. Scovel & T. J. Sullivan. “On the Brittleness of Bayesian Inference” SIAM Review 57(4):566–582, 2015. doi:10.1137/130938633