Irony and Bad Faith: Deconstructing Bayesians-reblog

 The recent post by Normal Deviate, and my comments on it, remind me of why/how I got back into the Bayesian-frequentist debates in 2006, as described in my first “deconstruction” (and “U-Phil”) on this blog (Dec 11, 2012):

Some time in 2006 (shortly after my ERROR06 conference), the trickle of irony and sometime flood of family feuds issuing from Bayesian forums drew me back into the Bayesian-frequentist debates.[1][2]  Suddenly sparks were flying, mostly kept shrouded within Bayesian walls, but nothing can long be kept secret even there. Spontaneous combustion is looming. The true-blue subjectivists were accusing the increasingly popular “objective” and “reference” Bayesians of practicing in bad faith; the new O-Bayesians (and frequentist-Bayesian unificationists) were taking pains to show they were not subjective; and some were calling the new Bayesian kids on the block “pseudo Bayesian.” Then there were the Bayesians somewhere in the middle (or perhaps out in left field) who, though they still use the Bayesian umbrella, were flatly denying the very idea that Bayesian updating fits anything they actually do in statistics.[3] Obeisance to Bayesian reasoning remained, but on some kind of a priori philosophical grounds. Doesn’t the methodology used in practice really need a philosophy of its own? I say it does, and I want to provide this.

The result of my own interest here gave rise to a Kent-Virginia Tech workshop (in Kent)[4] followed by the 2010 conference at the LSE, from which grew the special volume (see Mayo 2010, RMM volume, for examples and references)….

Especially surprising to me, leaders of default Bayesianism, arguably the predominant current form, began claiming that “violation of principles such as the likelihood principle is the price that has to be paid for objectivity” (Berger 2006, 394). As such, the default Bayesian may welcome with relief my critique of Birnbaum’s famous LP argument. (See my December 6 and 7 posts, and our current U-Phil.)  Even though “objectivity” is used very differently, there is still this odd sort of agreement in phrases uttered. While for us the violation is fully in order and is picked up on through the sampling distribution, for Bayesians it is anything but expected, and is picked up through model-dependent changes of priors (introducing strict incoherence).

It is noteworthy that default Bayesians don’t agree with each other even with respect to standard applications, as they readily admit.  For instance, Bernardo, but not Berger, rejects the spiked prior that leads to pronounced conflicts between frequentist p-values and posteriors.  While renouncing the spikes makes the numbers agree (with frequentists), there is no evidence that the result is either an objective or rational degree of belief (as he intends) or an objective assessment of well-testedness (as our error statistician achieves). Bernardo wants to match the frequentist in the optional stopping case, but I take it Jim still adheres to the position of Berger and Wolpert 1988 on the SRP.

OK, so here’s an especially intriguing remark by Jim Berger that I think bears upon the current mindset. (Jim is aware of my efforts, so it will come as no surprise that I’m sharing my meandering here.)

Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. (Berger 2006, 463)

How might we deconstruct this fantastic remark of Berger’s?[5]  (Granted, this arises in his rejoinder to others, but this only heightens my interest in analyzing it.)

Here, “objective Bayesians” are understood as using (some scheme) of default or conventionally derived priors.  One aspect of his remark is fairly clear: pseudo-Bayesian practice allows “terrible” priors to be used, and it would be better for them to appeal to conventional “default” priors that at least will not be so terrible (but in what respect?). It is the claim he makes in his “more provocative moments” that really invites deconstruction. Why would using the recommended conventional priors make them more like “true subjectivists”?  I can think of several reasons—but none is really satisfactory, and all are (interestingly) perplexing. I am reminded of Sartre’s remarks in Being and Nothingness on bad faith and irony:

In irony a man annihilates what he posits within one and the same act; he leads us to believe in order not to be believed; he affirms to deny and denies to affirm; he creates a positive object but it has no being other than its nothingness.

So true!  (Of course I am being ironic!) Back to teasing out what’s behind Berger’s remarks.

Now, it would seem that if a Bayesian did use priors that correctly reflected her beliefs (call these priors “really informed by subjective opinions”, or riso?) and that satisfied the Bayesian formal coherency requirements, then that would be defensible for a subjective Bayesian. But Berger notices that, in actuality, many Bayesians (the pseudo-Bayesians) do not use riso priors. Rather, they use various priors (the origin of which they’re unsure of) as if these really reflected their subjective judgments. In doing so, the pseudo-Bayesian (thinks that she) doesn’t have to justify them: she claims that they reflect subjective judgments (and who can argue with them?).

According to Berger here, the Bayesian community (except for the pseudo-Bayesians?) knows that they’re terrible, according to a shared criterion (is it non-Bayesian? Frequentist?). But I wonder: if, as far as the agent knows, these priors really do reflect the person’s beliefs, then would they still be terrible? It seems not. Or, if they still would be terrible, doesn’t that suggest a distinct criterion other than using “really informed” (as far as the agent knows) opinions or beliefs?…

RMM 2011 refers to the special issue of the on-line journal, Rationality, Markets and Morals housing papers growing out of the LSE conference of June 2010: Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?

To read follow-up blogposts to this call for U-Phils:

Contributed deconstructions of J. Berger: https://errorstatistics.com/2011/12/26/contributed-deconstructions-irony-bad-faith-3/

J. Berger on J. Berger: https://errorstatistics.com/2011/12/29/jim-berger-on-jim-berger/

[1] It was David Cox who first alerted me.  Then there was Dongchu Sun, a statistician who was at Virginia Tech for a few years.

[2] Yes, I’d given up on them, and was happy to spend all my remaining exiled days on philosophy of experiment.

[3] I’m not here including things like “Bayes nets,” which use conditional probability (as do we all) but are not really Bayesian.

[4] J. Williamson, J. Corfield and others at Kent co-hosted the first.

[5] As noted, Jim Berger is aware that I’m discussing this on my blog.  I hope he will comment!

Berger, J. (2006), “The Case for Objective Bayesian Analysis”, and “Rejoinder”, Bayesian Analysis 1(3), 385–402; 457–464.

Mayo, D. (2011), “Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?”, RMM Vol. 2, 2011, 79–102.

Sartre, J.-P. (1943), Being and Nothingness: An Essay in Phenomenological Ontology, Gallimard; English 1956, Philosophical Library Inc.

Senn, S. (2011), “You May Believe You Are a Bayesian But You Are Probably Wrong”, Rationality, Markets and Morals (RMM) Vol. 2, 2011, 48–66.

Categories: Likelihood Principle, objective Bayesians, Statistics

33 thoughts on “Irony and Bad Faith: Deconstructing Bayesians-reblog”

  1. guest

    “Here, ‘objective Bayesians’ are understood as using (some scheme) of default or conventionally derived priors.”

    Jaynes was a diehard objective Bayesian and most certainly did not advocate the use of default priors (except in the sense that a given state of prior knowledge should lead to the same prior distribution).

    See for example his paper “Highly Informative Priors”. The techniques described in that paper revolutionized image reconstruction.

    At some point Frequentists are going to have to confront Jaynes’s arguments. This should be right up your alley since his book “Probability: the logic of science” is not a technical book, but rather a philosophy book supported by technical arguments. His habit of reducing differences of philosophy to technical matters which can be examined objectively should be welcomed by all who are interested in the philosophy of statistics.

    • Guest: On your first point, I am following J. Berger and others in calling their account “objective Bayesian”. On your other point, aside from questioning that we have and would want to use prior beliefs for some (non-conventional?) “objective” priors in science, I don’t understand what they might mean (except in certain very special cases). I read Jaynes long ago (e.g., EGEK 1996, p. 358).

      • guest

        “prior knowledge” not “prior beliefs”. Prior distributions are based on things which are true (or at least we are willing to assume they’re true for the problem). “Beliefs” are irrelevant.

        The misunderstanding on this point illustrates my point better than anything. Frequentists have ignored the Laplace-Jeffreys-Jaynes line of the theory but are endlessly arguing against Savage.

      • guest

        “I don’t understand what they might mean”

        A beginning of an answer (useful enough in blog comments but not in general), would be to say:

        A probability distribution “means” whatever predictions about the real world that can be extracted from it. If the distribution makes claims about the real world that are correct then it is good. Note “probabilities” themselves are not directly statements about the real world! So it’s not the distribution itself that is correct, but only real-world statements implied by the distribution that are relevant.

        This works for any kind of probability distribution: prior, sample, posterior. Everyone is used to judging sampling distributions this way. Bayesians of Gelman’s stripe are used to judging posteriors this way. The prior can be judged in exactly the same way too.

        For example suppose “f” is the frequency of heads in 10,000 coin flips. If P(f) is sharply peaked about f=.5, then we can predict “the actual frequency will be near .5”. If this is actually true then the prior P(f) is in some sense good.

        Another example would be: if the prior for my weight “m” is N(150,20) then I would predict that “my weight is between 110 and 190”. Again, if this is actually true then the prior P(m) is in some sense good.

        There are some obvious objections to this, which is why it can’t be taken as fundamental. I only offer it to those who have absolutely no conception of what an objective prior might be.
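The weight example in the comment above can be checked numerically. The following is only a sketch, assuming that N(150, 20) means a normal prior with mean 150 and standard deviation 20 (the comment does not say whether 20 is a standard deviation or a variance); the helper names `interval_mass` and `prediction_holds` are made up for illustration. It computes how much prior probability sits on the predicted interval and then applies the commenter's informal criterion to a hypothetical observed weight.

```python
from statistics import NormalDist

# Assumption: N(150, 20) is read as mean 150, standard deviation 20.
prior = NormalDist(mu=150, sigma=20)


def interval_mass(dist, low, high):
    """Probability the distribution assigns to the interval [low, high]."""
    return dist.cdf(high) - dist.cdf(low)


def prediction_holds(observed, low, high):
    """The commenter's informal check: did the real value land in the predicted range?"""
    return low <= observed <= high


# The interval 110 to 190 is the prior mean plus or minus two standard deviations.
mass = interval_mass(prior, 110, 190)
print(f"Prior mass on [110, 190]: {mass:.4f}")

# Hypothetical observed weight, used only to illustrate the check.
print("Prediction holds for 165:", prediction_holds(165, 110, 190))
```

Under that reading, the interval carries roughly 95% of the prior mass, which is presumably why the commenter treats "between 110 and 190" as the prior's prediction. Whether passing such a check makes a prior "good" is exactly what the replies go on to dispute.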

        • Nicole Jinn

          To: guest: How do you know that the distribution makes claims (relevant) about the real world that are correct? It does not seem obvious to me, and (I think) one needs to supply good reasons for why we should think that the distribution makes those true claims, which more often than not are lacking.

          • Anon

            “How do you know that the distribution makes claims (relevant) about the real world that are correct?”

            Ideally because you designed (or chose) the distribution so that it squared with what you know to be true, but was neutral with respect to everything else.

    • original_guest

      “Guest”; could you *please* use another name, as requested by the board managers?

      • guest

        “original”-guest:

        You will never usurp my throne.

        • original_guest

          Nor am I trying to. But you might prefer that people debating what you’ve said above (and presumably will on future posts) don’t get confused when they search previous posts and comments; my statements do not agree with yours.

          Please, use another name. It’s a minor request, no?

  2. Christian Hennig

    guest: I’m currently reading Jaynes but haven’t yet found a discussion of the issue that the prior knowledge that typically exists in real problems hardly ever translates into a prior distribution in a straightforward manner.

    • Christian: That is typical of people committed to reducing science to a deductive logic of probabilities. I realize it has been an influential philosophy…

      • Anon

        Science is not being reduced to deductive logic. Only one tiny part of the scientific process is being reduced to a kind of logic, namely the following:

        “from a given state of knowledge (assumed to be true) what is the best guess that can be made?”

        This is a necessary part of science, since before you can judge a theory you have to know what the theory implies, but it is far from being the entire story.

        • And this can be done by deductive logic? In any event one needs not only given knowledge (itself questionable especially for theories) but a rule. Does Bayes’s rule tell you how to deduce testable predictions? It does not.

          • Anon

            Probability is a generalization of deductive logic (hence the title “Probability: the logic of science”)

            To answer the rest it’s important to distinguish two different types of problems. One which I’ll call “Engineering” in which you just want to make correct predictions. The second which I’ll call “Physics” in which you want to discover the knowledge needed to make correct predictions.

            The “Engineer” wants to take a set of true and relevant facts and use them to accurately predict other facts. The “Physicist” wants to discover the relevant facts the Engineer is going to need.

            The “Engineer”, if they really do know all the relevant factors, can use Bayes rule to select a hypothesis/model. Here you would need to know that you had a truly exhaustive set of hypotheses and that one of them really was the right answer.

            The “Physicist” is using Bayes rule in a very different way. They start out with a given state of knowledge and then use Bayes rule to make the best predictions they can with it. If the predictions are good then that gives them confidence that they know all the relevant details and the Engineers can go about their work. If the prediction fails, then that is strong evidence that they needed to know something else to make accurate predictions and they need to go find out what it is. As things currently stand, Bayes rule isn’t going to tell them what that “something else” is, but then that’s where things get interesting.

            There was a notion going back to Fisher (see Fisher’s Information) that it was possible to consider the “amount of information” you can learn from a measurement or experiment. This has been exploited quite a bit in design of experiment type situations. This tradition seems to have been taken up by Bayesians more than Frequentists. It’s natural for an objective Bayesian to think in terms of “information” or “knowledge”. Furthermore Fisher’s information is related to Entropy (associated with Bayesians like Jaynes) which is more general and useful.

            A very interesting paper related directly to this subject is titled “What is the Question?”:

            (PDF link: what.question.pdf)

            There are almost certainly significant undiscovered results along these lines. In particular it’s worth following up the Cox reference concerning “a logic of questions”.

    • Anon

      In Jaynes’s book and papers (which are readily available online) there is example after example of translating prior knowledge into priors. The Maximum entropy principle was one, and group theoretic invariance principles were another.

      See for example the paper “Well Posed Problems” in which he used transformation groups to derive the probability distribution needed to solve Buffon’s Needle problem (you can perform the experiment yourself pretty easily to see if he gets the right answer). Note the discussion at the end about why these “transformation group” methods, which seem to have no frequency connection at all, do turn out to have a frequency connection.

      Having said that, Jaynes himself believed that finding more and more practical principles for encoding knowledge in probability distributions was the major unsolved problem of prob & stat.

      In practice, if you’re really stuck, you can use uniform priors like Laplace did which will typically yield the same numerical results you get from confidence intervals.

      The paper can be found here:

      (PDF link: well.pdf)

  3. Anon

    I mean “Bertrand” not “Buffon”. Really, that solution should send shivers down the spine of any thoughtful Frequentist. Basically he encodes his “ignorance” into the probability distribution by saying “the distribution has to be invariant with respect to anything we’re ignorant about” or alternatively “if two states of knowledge are the same then they should imply the same probability distribution”.
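For readers who have not met Bertrand's problem: it asks for the probability that a "random" chord of a circle is longer than the side of the inscribed equilateral triangle, and the answer depends on how the chord is drawn. The sketch below is an illustration only, not a reproduction of Jaynes's derivation. It simulates the three textbook constructions, which give values close to 1/3, 1/2 and 1/4. The function names are invented for this example, and identifying the random-radius construction with the answer singled out by the invariance argument is stated here as an assumption about the paper under discussion.

```python
import math
import random


def chord_from_endpoints(rng, r=1.0):
    """Two independent uniform points on the circumference."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    b = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * r * abs(math.sin((a - b) / 2.0))


def chord_from_radius(rng, r=1.0):
    """Chord perpendicular to a radius, at a uniform distance from the centre."""
    d = rng.uniform(0.0, r)
    return 2.0 * math.sqrt(r * r - d * d)


def chord_from_midpoint(rng, r=1.0):
    """Chord whose midpoint is uniform over the area of the disc."""
    d = r * math.sqrt(rng.random())
    return 2.0 * math.sqrt(r * r - d * d)


def fraction_longer(sampler, n=100_000, seed=0, r=1.0):
    """Monte Carlo share of chords longer than the inscribed triangle's side."""
    rng = random.Random(seed)
    side = math.sqrt(3.0) * r
    return sum(sampler(rng, r) > side for _ in range(n)) / n


for name, sampler in [
    ("random endpoints", chord_from_endpoints),
    ("random radius", chord_from_radius),
    ("random midpoint", chord_from_midpoint),
]:
    print(f"{name}: {fraction_longer(sampler):.3f}")
```

The simulation only shows that the three constructions disagree. Which of them, if any, describes a physical experiment such as tossing straws onto a circle is the empirical question the commenter says one can test.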

    • Anon: Yes, it always comes back to some kind of principle of indifference, modeling ignorance or lack of information (not knowledge as the first commentator had said). It doesn’t send shivers down frequentist spines, but it does make us shake our heads that this very old move (w/tinkering) is regularly proffered as a new and improved solution to scientific inference.

      • Anon

        Respectfully: if you don’t see how devastating that paper is to the Frequentist world view, you haven’t read it.

        • Christian Hennig

          The paper is nice but there is nothing devastating about it for frequentists in general apart from showing that something that von Mises said about a specific problem may not have taken a very elegant argument into account.
          “Devastating to the Frequentist world view”… how dramatic…

      • Anon

        A state of knowledge can be specified by conveying what is known or by what is not known. Just like you can describe a cup of water by how full it is or by how empty.

        • Anon: No. The exhaustion you describe is precisely what is not available in science, which is essentially open-ended. It is a major reason for rejecting accounts that depend upon Bayesian catchalls.

  4. Christian Hennig

    Anon: You’re right that Jaynes gives examples and I’m fine with many of them. However, it is an essential characteristic of such examples that the available knowledge can be stated in very limited space and is of an easily formalisable type, which is different from most real applications. Every “example” of a serious application you see in lectures or books is necessarily cut short to some extent; that’s not only Jaynes’s problem.
    However, it is more of a problem to Jaynes’s approach than to some others because
    a) Jaynes doesn’t want subjective assessments of evidence and
    b) one of his basic principles states ((1.39b) in his book) “the robot always takes into account all the evidence it has relevant to the question. It does not arbitrarily ignore some of the information.”

    Jaynes is well aware that this is not always possible in reality but he certainly wouldn’t advertise something like “if the available information is too complex, just use Laplace”…

    • Christian: Thanks for keeping up the sagacious replies, as I haven’t had time. One thing has to do with what even counts as relevant information. Jaynes says the sampling distribution is irrelevant once x is known, whereas it’s of crucial relevance for sampling theorists. So even alluding to relevant info involves essential foundational issues about the nature of statistical inference and the goals of probability in inference.

  5. Anon

    Hennig:

    You can always whittle a problem down by throwing away prior information (or assumptions) until the problem can be handled by some cookie cutter method. Whether you then use cookie cutter Classical Frequentist statistics or “default” Bayesian methods doesn’t seem to make much difference in practice.

    If you need to do better than that, then you’re going to have to include new information which is both true and relevant in the analysis somehow. This is true for Bayesians and Frequentists, and there’s no way around it. No one has ever claimed this was easy in general, or had been figured out in every case, or was easier for Frequentists to do than for Bayesians. Some claim it is easier for Bayesians though.

  6. I noticed that Normal Deviate had “‘create’ procedures with frequency guarantees” in his chart. I like the word “create” here, and hopefully it will help correct an impression some may have that frequentists appraise but never actually provide/create methods.

  7. Looking at Gelman’s deconstruction of Berger from before*:

    “I think subjectivity and objectivity both are necessary parts of research. Science is objective in that it aims for reproducible findings that exist independent of the observer, and it’s subjective in that the process of science involves many individual choices. And I think the statistics I do (mostly, but not always, using Bayesian methods) is both objective and subjective in that way.”

    Thus Gelman argues that Berger’s criticism doesn’t apply to the likes of him. Gelman rejects subjective/objective as a simple dichotomy; moreover, he aims for objectivity in a sense but says that he must use subjective personal experience to do this.

    “… more and more, I’m coming across applied problems where I wouldn’t want to be noninformative even if I could, problems where some weak prior information regularizes my inferences and keeps them sane and under control.”

    So Gelman uses subjective information to make what he does objective, and it seems to me he argues the opposite of Berger: that the true objectivists are the subjectivists!

    Another form of irony?

    * This one: https://errorstatistics.com/2011/12/26/contributed-deconstructions-irony-bad-faith-3/

    • I should say too, I should perhaps be less glib about eliding “subjective (objective) Bayesianism” with “subjectivism (objectivism)”.

      If we take the idea of deconstructing the text more seriously, we should be paying attention to the historical context, “reading between the lines” a bit; failing to do so might be another kind of “bad faith”.

      From that perspective, I think Berger and Gelman are both pointing out that the terms “subjective Bayesian” and “objective Bayesian” are both little more than historical accidents — in fact, all science is both subjective and objective, and hence the apparent paradox of Berger’s “provocative” claim and my interpretation of Gelman is not in fact a paradox at all, simply a warning against overly literal interpretation of terminology.

      • James: Berger advances a “sociological” use of “objective” Bayesian–as he says in his article, it’s an attractive term that makes the approach more acceptable to scientists. I think the distinction between objective and subjective accounts of inference is a serious one, and remarks about there being both subjective and objective components to everything we do overlook what is central about scientific objectivity. Such remarks have had a tendency to simply blur the issue and downplay the possibility of responsible, objective scrutiny in science. Berger’s remark is “paradoxical” in the sense I describe: it’s essentially saying people do or must do Bayesian inference in bad faith. Objective Bayesianism tends to be schizophrenic: use conventional priors, but if you have information use it (if you want to).

        • Ah, ok, I think I understand better in the light of both your comments. Gelman and Berger are I think quite at odds with each other. For starters, Gelman does seem to prefer “non-informative” as a term over “objective”, and this is precisely for the reasons you give. I don’t think Gelman regards “objective” as at all meaning that the procedure is “automatic”! On the other hand, for Berger it’s about what people (meaning businesses and governments, it seems) “want”:

          “In a different world, clinical trials for a new drug would incorporate the subjective opinions of all scientists involved but, since most of the scientists would be from the interested pharmaceutical company, regulatory agencies instead prefer to base their analyses on objective methodology.” (Berger)

          Objective here meaning, I think, “automatic”. A related term is “systematic”, leading to such table and flow-chart infused joys as this: http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000100 (I imagine this is more or less what you mean by “automaticity”).

    • James:

      The question of objectivity arises quite a lot on this blog. The top two posts below are likely to be most relevant. People often overlook that the key to objectivity in science is not automaticity or freedom from human interest, but linking up with the real world and subjecting hypotheses and claims to the test of experiment.
      https://errorstatistics.com/2012/03/14/2752/
      https://errorstatistics.com/2012/03/18/2837/

      https://errorstatistics.com/2012/09/15/more-on-using-background-info/
      https://errorstatistics.com/2011/10/13/objectivity-2-the-dirty-hands-argument-for-ethics-in-evidence/
      https://errorstatistics.com/2011/10/16/objectivity-3-cleaner-hands-with-metastatistics/