Gelman est effectivement une erreur statistician

A reader calls my attention to Andrew Gelman’s blog announcing a talk that he’s giving today in French: “Philosophie et pratique de la statistique bayésienne.” He blogs:

I’ll try to update the slides a bit since a few years ago, to add some thoughts I’ve had recently about problems with noninformative priors, even in simple settings.

The location of the talk will not be convenient for most of you, but anyone who comes to the trouble of showing up will have the opportunity to laugh at my accent.

P.S. For those of you who are interested in the topic but can’t make it to the talk, I recommend these two papers on my non-inductive Bayesian philosophy:

[2013] Philosophy and the practice of Bayesian statistics (with discussion). British Journal of Mathematical and Statistical Psychology, 8–18. (Andrew Gelman and Cosma Shalizi)
[2013] Rejoinder to discussion. (Andrew Gelman and Cosma Shalizi)

[2011] Induction and deduction in Bayesian data analysis. Rationality, Markets and Morals, special topic issue “Statistical Science and Philosophy of Science: Where Do (Should) They Meet In 2011 and Beyond?” (Andrew Gelman)

These papers, especially Gelman (2011), are discussed on this blog (in “U-Phils”). Comments by Senn, Wasserman, and Hennig may be found here, and here, with a response here (please use search for more).

As I say in my comments on Gelman and Shalizi, I think Gelman’s position is (or intends to be) inductive, in the sense of being ampliative (going beyond the data), but simply not probabilist, i.e., not a matter of updating priors. (A blog post is here.)[i] Here’s a snippet from my comments:

Although the subjective Bayesian philosophy, “strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory),” and while many practitioners perceive the “rising use of Bayesian methods in applied statistical work,” (2) as supporting this Bayesian philosophy, the authors [Gelman and Shalizi] flatly declare that “most of the standard philosophy of Bayes is wrong” (2 n2). Despite their qualification that “a statistical method can be useful even if its philosophical justification is in error”, their stance will rightly challenge many a Bayesian.

This will be especially so when one has reached their third thesis, which seeks a new foundation that uses non-Bayesian ideas. Although the authors at first profess that their “perspective is not new”, but rather follows many other statisticians who emphasize “the value of Bayesian inference as an approach for obtaining statistical methods with good frequency properties,” (3), they go on to announce they are “going beyond the evaluation of Bayesian methods based on their frequency properties as recommended by Rubin (1984), Wasserman (2006), among others, to emphasize the learning that comes from the discovery of systematic differences between model and data” (15). Moreover, they suggest that “implicit in the best Bayesian practice is a stance that has much in common with the error-statistical approach of Mayo (1996), despite the latter’s frequentist orientation. Indeed, crucial parts of Bayesian data analysis, such as model checking, can be understood as ‘error probes’ in Mayo’s sense” (2), which might be seen as using modern statistics to implement the Popperian criteria for severe tests.


The authors claim their statistical analysis is used “not for computing the posterior probability that any particular model was true—we never actually did that” (8), but rather “to fit rich enough models” and upon discerning that aspects of the model “did not fit our data” (8), to build a more complex, better fitting, model; which in turn called for alteration when faced with new data.

This cycle, they rightly note, involves a “non-Bayesian checking of Bayesian models” (11), but they should not describe it as purely deductive: it is not.  Nor should they wish to hold to that old distorted view of a Popperian test as “the rule of deduction which says that if p implies q, and q is false, then p must be false” (with p, q, the hypothesis, and data respectively) (22). Having thrown off one oversimplified picture, they should avoid slipping into another. 


My full comments are here.

[i] Some might view such a “non-inductive Bayesian philosophy” as an “inductive non-Bayesian philosophy”. Gelman is likely to scream at this, peut-être en français. I’ve forgotten what little I knew of French.

Some related papers:

Gelman, A., and C. Shalizi. (Article first published online: 24 Feb 2012). “Philosophy and the Practice of Bayesian Statistics (with discussion)”. British Journal of Mathematical and Statistical Psychology (BJMSP).

Mayo, D., and D. Cox. 2006. “Frequentist Statistics as a Theory of Inductive Inference”. In Optimality: The Second Erich L. Lehmann Symposium, edited by J. Rojo, 77–97. Vol. 49, Lecture Notes-Monograph Series, Institute of Mathematical Statistics (IMS). Reprinted in D. Mayo and A. Spanos, 2010: 247–275.

Mayo, D., and A. Spanos. 2011. “Error Statistics”. In Philosophy of Statistics, edited by P. S. Bandyopadhyay and M. R. Forster. Handbook of the Philosophy of Science. Oxford: Elsevier.

Senn, S. Comment on Gelman and Shalizi (pages 65–67).

Wasserman, L. 2006. “Frequentist Bayes is Objective”. Bayesian Analysis 1(3): 451–456.




17 thoughts on “Gelman est effectivement une erreur statistician”

  1. David Rohde

    What you are saying seems to imply you can be an error statistician without being a frequentist.

    I can see how you can compute a p-value conditional on an exchangeable model.

    At first glance it seems more problematic to use concepts such as coverage conditional on an exchangeable model (instead of the usual i.i.d)….

  2. David: I’m not sure I understand your points, even about being a “frequentist”.

    • David Rohde

      … If Gelman is effectively an error statistician but he uses exchangeable rather than i.i.d probability models then clearly i.i.d (i.e. frequentist probability) models are optional for an error statistician…

      • David: I don’t see why you identify iid with frequentist nor with error statistics–the key, to put it roughly, is appraising a conjecture H by how well probed H is rather than how probable it is (in any sense).

  3. Aside on something Gelman posted today.
    I haven’t read the paper discussed, but here are my two cents on the general research as to whether a high % of assertions made or published are false (usually looking at p-values versus… ?). While I do not question the value of some/much of this debunking, I think it is worthwhile asking in a general way whether that is really the issue. I mean, it is not as if our cumulative knowledge is all about how many discrete pieces are (allegedly found) true or false. Nor do they really know if these claims are false; at most they consider things like whether they were replicated in some sense. Did these hypotheses solve their problems while being false? Who knows? So, in general, I suspect some of these publicized results are misleading as to what’s relevant. Typically claims are known to be false and are used as stepping stones toward better understanding; admittedly, I can’t say if this particular paper reflects that. I applaud the concern with error rates, I guess, but the supposition that this is the way to evaluate scientific progress or the error-correcting properties of inductive-statistical inference strikes me as quite strange. Perrin found something like 20 substances apparently promising for Brownian motion experiments, but only one proved of great value in the end: gamboge. Is his error rate 19 of 20? Prusiner had apparent successes on prion experiments over 10+ years; all proved disappointing until, after 20 (or however many) years, he hit success. Does anyone think his crude error rate is relevant? The key thing is unearthing the error and learning from it. Where is that credit recorded in the error rate reports?

    • David Rohde

      I am just suggesting that the frequentist concept of probability (limiting relative frequency – surely this implies something is repeated in an i.i.d fashion..) and your error statistical notions such as well probed can, but don’t need to be used together … something I thought you were sympathetic with…

      FWIW I think it is a misunderstanding of Bayes to think the central idea is computing how probable H is.

      • On the last point, surely that is the central idea of the conception of inductive probabilism. It’s not a minor thing; in fact I’d say it’s the strongest driving factor for the vast majority of Bayesians: anything to get that posterior, regardless of what it means. It’s in sync with a certain logical empiricist image of evidence.

        On the first point, I’m not sure what you’re saying I’m sympathetic with. Do you mean a notion that doesn’t require iid? I don’t think it does require iid. So maybe I’m agreeing.

  4. David Rohde

    I think the posterior distribution (of the parameter) is just a mechanism for producing the predictive distribution in the exchangeable setting. At least _some_ Bayesians see it that way… I am sympathetic to your criticisms in cases where they don’t…

    In terms of inferential tools I think there are two broad classes: conditioning and falsificationist (which I take to involve error probabilities/posterior predictive checks). In terms of probabilistic specification it’s worth focusing on two cases: i.i.d. and exchangeable (of course there are more). So what combinations are possible?

    i) conditioning with exchangeable model – yes, usual Bayes set up

    ii) conditioning with i.i.d model – not really as conditionals and marginals are the same

    iii) falsificationist with i.i.d. – yes, usual set up when p-values are computed

    iv) falsificationist with exchangeable – yes, as Gelman shows us.

    It’s interesting that i) and iii) dominate when iv) makes at least some sense…

    I think the reason in part is that the interpretation of probability is not so clear under iv). Is it decision theoretic or long run frequency? I can see that it makes sense to say either the exchangeable model is wrong or something unusual happened …. but… can you say under repeated sampling the p-value will shrink to zero? or the interval will have advertised coverage? (I am not sure, the question mark is not rhetorical).

    I guess the other issue with iv) is that there seems to be a messiness to the methodology, which involves both falsification and conditioning. Gelman tells a nice story about this where model checking is related to Kuhn’s philosophy of paradigm shifts, whereas conditioning is a more incremental step. This seems fine as a “best practice” of statistics but it jars with my idealistic streak that there should be an overall clean philosophy. Probably both Bayesians and frequentists alike will see it as trying to fix something that isn’t broken.

    • David Rohde: I like your classification scheme with scenarios (i)-(iv). It really helps clarify my thinking on what Gelman’s approach.

    • David: I’ve already written a lot on this in my published commentary and on this blog; unlike corey, I don’t see your remarks crystalizing things for me (then again I’ve been flying all day, but still). Using frequentist statistical models does not require any literal long-run repetitions…. Model checking is nothing like Kuhnian paradigm shifts which are politico-socio “negotiations”, but never mind these issues just now. I don’t see how anything remotely like a “falsification” is doable in a statistical context (or any other non-trivial scientific context) without falsification rules. For example, in supplementing ordinary Bayesianism to get falsification, one possibility is for add-on rules that allow going from a low enough posterior in H (or M) to reject a hypothesis H (or M). Without some add-on rule (which then must be justified!), no hypothesis is ever gotten rid of Bayesianly (only “disconfirmed” to some degree).

      But what if the falsification rule goes further and allows, as output, not merely “something is wrong somewhere with M” but some rival model M* that “fits” better than M (whether you’ve adjusted the prior, the parameters, or whatever will do the trick)? The central problem is that this M and M* are scarcely exhaustive of the possibilities, and M* may thus have “passed” a test with low stringency. Someone else may tidy things up with M**, or M***, with no error control being evident at all. M* might be “better or even best tested” for the first person, while M** (which perhaps alters priors, and keeps everything else the same) may pass swimmingly for the second person. We don’t obviously get assurance that the particular conjectured model that this “method” regards as “better tested” is adequately capturing the source of the data….[i]
      Anyway, let me recommend Hennig as offering an especially useful take on Gelman (on this blog):

      [i] Error statisticians also need rules; in this case they reflect the rationale that inferred (or even merely preferred) models/hypotheses pass adequate error probes, etc.

      Sorry I had to correct this comment—I won’t write any more just now.

      • “unlike corey, I don’t see your remarks crystalizing things for me”

        You have to have a Bayesian perspective on exchangeability to get it. Also, I meant to write “…clarify my thinking on what Gelman’s approach aims to do.”

        • something ineffable?
          But I was really talking about the remarks as a whole.

          • Naw, just be thinking about exchangeability as characterizing states of information about elements of an assumed model, as opposed to characterizing a set of distributions of relative frequency.

  5. David Rohde

    Thanks Corey, (nice to see you have started a blog).

    You are right, the post by Christian is very good. I agree with most of it. My main doubt is that the meaning (and correctness) of “all models are wrong” depends on articulating probability, which is Hennig’s main point…

    Mayo.. we seem to talk past each other quite a bit, but thanks again for being generous with your time to respond (even when jet lagged).

    I think the bulk of your response is a critique of using falsificationism in a Bayesian approach, on the basis that Bayes is used to get a posterior probability of a parameter/hypothesis. I think maybe we agree but for very different reasons….

    Probably the most profound difference (and this comes up a lot in correspondence between us) is that I prefer to think about Bayesian models purely in terms of exchangeable sequences of observables with the parameters integrated out. If this is done directly for a model, this is the marginal likelihood, but if some notion of “or more extreme value” is used then it becomes a prior predictive Bayesian p-value. I think an accept/reject rule can be integrated into this context just fine.

    (FWIW Gelman uses instead posterior predictive p-values, which are messier philosophically but less prior-sensitive, and, more interestingly I think, graphical checks; but I don’t feel so strongly about rules for applied statistics.)
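
A minimal sketch of the two kinds of predictive p-value mentioned in this comment may help fix ideas. Everything specific here is an assumption made for illustration and is not taken from the discussion: the toy Beta-Binomial model, the function name `predictive_p_value`, the choice of "at least as many successes" as the meaning of "more extreme", and the data (18 successes in 20 trials).

```python
import random


def predictive_p_value(observed, n_trials, a, b, n_sims=20000, seed=0):
    """Monte Carlo estimate of P(y_rep >= observed) with theta integrated out.

    Assumed toy model (hypothetical, for illustration only):
      theta ~ Beta(a, b), y | theta ~ Binomial(n_trials, theta).
    "More extreme" is taken to mean "at least as many successes".
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        theta = rng.betavariate(a, b)  # draw the parameter
        # Simulate one replicated data set given theta.
        y_rep = sum(rng.random() < theta for _ in range(n_trials))
        if y_rep >= observed:
            extreme += 1
    return extreme / n_sims


# Prior predictive check: uniform Beta(1, 1) prior, 18 successes in 20 trials.
p_prior = predictive_p_value(observed=18, n_trials=20, a=1, b=1)

# Posterior predictive check: the same routine, with the conjugate update
# Beta(a + y, b + n - y) used in place of the prior.
p_post = predictive_p_value(observed=18, n_trials=20, a=1 + 18, b=1 + 2)

print(f"prior predictive p-value:     {p_prior:.3f}")
print(f"posterior predictive p-value: {p_post:.3f}")
```

Under the uniform prior the prior predictive distribution of the count is uniform on 0, …, 20, so the first estimate should land near 3/21. The posterior version reuses the data to fit theta before checking, which is one way to see why it is less sensitive to the prior and also harder to interpret as an ordinary tail-area probability.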

  6. Christian Hennig

    I just wanted to say thank you for bringing up my post again; as opposed to Mayo, I’m not present on this blog when I’m travelling but I’ll try to catch up.

  7. Sorry for catching up so late: the title is unintelligible in French… What about “Gelman est effectivement un statisticien de l’erreur”? (And even then it does not ring a bell in the French side of my brain…)

    • Christian:
      I was expecting a correction from someone who knew French. I sent it to Gelman, and he thanked me but didn’t correct the French. So what might ring a bell in the French side of your brain? I’m serious. Error probability must be used, and in fact, if I weren’t lazy, I’d try to check Neyman’s articles in French (of which there are many). So I count on you for the best way to express something akin to: one who uses error probabilities (in the service of stringent error probing?)
