Posts Tagged With: Andrew Gelman

U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof

Defending Birnbaum’s Proof

Greg Gandenberger
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh

In her 1996 Error and the Growth of Experimental Knowledge, Professor Mayo argued against the Likelihood Principle on the grounds that it does not allow one to control long-run error rates in the way that frequentist methods do.  This argument seems to me the kind of response a frequentist should give to Birnbaum’s proof.  It does not require arguing that Birnbaum’s proof is unsound: a frequentist can accommodate Birnbaum’s conclusion (two experimental outcomes are evidentially equivalent if they have the same likelihood function) by claiming that respecting evidential equivalence is less important than achieving certain goals for which frequentist methods are well suited.
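The contrast between the Likelihood Principle and error-rate control can be made concrete with a textbook-style toy case. The sketch below is an illustration only: the coin-tossing setup, the counts (9 heads, 3 tails), the null value 0.5, and all function names are assumptions chosen for the example and are not taken from the post. It compares a binomial experiment (12 tosses fixed in advance) with a negative binomial experiment (toss until the 3rd tail).

```python
from math import comb


def binomial_likelihood(theta, heads=9, tails=3):
    """Probability of `heads` heads in a fixed number of heads + tails tosses."""
    n = heads + tails
    return comb(n, heads) * theta**heads * (1 - theta)**tails


def negative_binomial_likelihood(theta, heads=9, tails=3):
    """Probability of seeing `heads` heads before the `tails`-th tail ends the experiment."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta)**tails


def binomial_p_value(heads=9, tails=3, theta0=0.5):
    """One-sided tail area: P(at least `heads` heads in heads + tails tosses)."""
    n = heads + tails
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(heads, n + 1))


def negative_binomial_p_value(heads=9, tails=3, theta0=0.5):
    """One-sided tail area: P(at least `heads` heads before the `tails`-th tail).

    This equals the probability of at most tails - 1 tails
    in the first heads + tails - 1 tosses.
    """
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - theta0)**j * theta0**(m - j)
               for j in range(tails))


if __name__ == "__main__":
    for theta in (0.2, 0.5, 0.8):
        ratio = binomial_likelihood(theta) / negative_binomial_likelihood(theta)
        print(f"theta={theta}: likelihood ratio = {ratio:.6f}")
    print("binomial p-value:", binomial_p_value())
    print("negative binomial p-value:", negative_binomial_p_value())
```

Under these assumed numbers the two likelihood functions differ only by the constant factor 4, so the Likelihood Principle treats the outcomes as evidentially equivalent, while the tail areas are 299/4096 (about 0.073) and 67/2048 (about 0.033), which is the kind of difference a frequentist concerned with error rates would regard as relevant.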

More recently, Mayo has shown that Birnbaum’s premises cannot be reformulated as claims about what sampling distribution should be used for inference while retaining the soundness of his proof.  It does not follow that Birnbaum’s proof is unsound, because Birnbaum’s original premises are not claims about what sampling distribution should be used for inference; they are instead sufficient conditions for experimental outcomes to be evidentially equivalent.

In a recent blog post, Mayo acknowledges that the premises she uses in her argument against Birnbaum’s proof differ from Birnbaum’s original premises: she distinguishes between “the Sufficiency Principle (general)” and “the Sufficiency Principle applied in sampling theory.”  One could make a similar distinction for the Weak Conditionality Principle.  There is indeed no way to formulate Sufficiency and Weak Conditionality Principles “applied in sampling theory” that are consistent and imply the Likelihood Principle.  This fact is not surprising: sampling theory is incompatible with the Likelihood Principle!

Birnbaum himself insisted that his premises were to be understood as “equivalence relations” rather than as “substitution rules” (i.e., rules about what sampling distribution should be used for inference) and recognized the fact that understanding them in this way was necessary for his proof.  As he put it in his 1975 rejoinder to Kalbfleisch’s response to his proof, “It was the adoption of an unqualified equivalence formulation of conditionality, and related concepts, which led, in my 1972 paper, to the monster of the likelihood axiom” (263).

Because Mayo’s argument against Birnbaum’s proof requires reformulating Birnbaum’s premises, it is best understood as an argument not for the claim that Birnbaum’s original proof is invalid, but rather for the claim that Birnbaum’s proof is valid only when formulated in a way that is irrelevant to a sampling theorist.  Reformulating Birnbaum’s premises as claims about what sampling distribution should be used for inference is the only way for a fully committed sampling theorist to understand them.  Any other formulation of those premises is either false or question-begging.

Mayo’s argument makes good sense when understood in this way, but it requires a strong prior commitment to sampling theory. Whether various arguments for sampling theory such as those Mayo gives in Error and the Growth of Experimental Knowledge are sufficient to warrant such a commitment is a topic for another day.  To those who lack such a commitment, Birnbaum’s original premises may seem quite compelling.  Mayo has not refuted the widespread view that those premises do in fact entail the Likelihood Principle.

Mayo has objected to this line of argument by claiming that her reformulations of Birnbaum’s principles are just instantiations of Birnbaum’s principles in the context of frequentist methods. But they cannot be instantiations in a literal sense because they are imperatives, whereas Birnbaum’s original premises are declaratives.  They are instead instructions that a frequentist would have to follow in order to avoid violating Birnbaum’s principles. The fact that one cannot follow them both is only an objection to Birnbaum’s principles on the question-begging assumption that evidential meaning depends on sampling distributions.


Birnbaum’s proof is not wrong but error statisticians don’t need to bother

Christian Hennig
Department of Statistical Science
University College London

I was impressed by Mayo’s arguments in “Error and Inference” when I came across them for the first time. To some extent, I still am. However, I have also seen versions of Birnbaum’s theorem and proof presented in a mathematically sound fashion with which I as a mathematician had no issue.

After having discussed this a bit with Phil Dawid, and having thought and read more on the issue, my conclusion is that
1) Birnbaum’s theorem and proof are correct (apart from small mathematical issues resolved later in the literature), and they are not vacuous (i.e., there are evidence functions that fulfill them without any contradiction in the premises),
2) however, Mayo’s arguments actually do raise an important problem with Birnbaum’s reasoning.

Here is why. Note that Mayo’s arguments are based on the implicit (error statistical) assumption that the sampling distribution of an inference method is relevant. In that case, application of the sufficiency principle to Birnbaum’s mixture distribution enforces the use of the sampling distribution under the mixture distribution as it is, whereas application of the conditionality principle enforces the use of the sampling distribution under the experiment that actually produced the data, which is different in the usual examples. So the problem is not that Birnbaum’s proof is wrong, but that enforcing both principles at the same time in the mixture experiment is in contradiction to the relevance of the sampling distribution (and therefore to error statistical inference). It is a case in which the sufficiency principle suppresses information that is clearly relevant under the conditionality principle. This means that the justification of the sufficiency principle (namely that all relevant information is in the sufficient statistic) breaks down in this case.
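A small numerical sketch may help show why the two principles pull in different directions once sampling distributions are taken to matter. Everything specific here is an assumption made for illustration and does not come from the post: a fair coin chooses between E1 (12 tosses, fixed in advance) and E2 (toss until the 3rd tail), the observed outcome in either case is 9 heads and 3 tails, and the function names are hypothetical. Because the two outcomes have proportional likelihoods, a sufficient statistic for the mixture can lump them into a single point.

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3  # assumed observed counts, for illustration only


def prob_in_e1(theta):
    """P(9 heads, 3 tails) when the number of tosses (12) is fixed in advance."""
    return comb(HEADS + TAILS, HEADS) * theta**HEADS * (1 - theta)**TAILS


def prob_in_e2(theta):
    """P(9 heads before the 3rd tail) when tossing stops at the 3rd tail."""
    return comb(HEADS + TAILS - 1, HEADS) * theta**HEADS * (1 - theta)**TAILS


def prob_of_lumped_point_in_mixture(theta):
    """P(lumped outcome) when a fair coin picks E1 or E2 and the two
    likelihood-equivalent outcomes are treated as one point."""
    half = Fraction(1, 2)
    return half * prob_in_e1(theta) + half * prob_in_e2(theta)


if __name__ == "__main__":
    theta0 = Fraction(1, 2)
    print("conditional on E1:      ", prob_in_e1(theta0))
    print("conditional on E2:      ", prob_in_e2(theta0))
    print("lumped, under mixture:  ", prob_of_lumped_point_in_mixture(theta0))
```

At theta = 1/2 the three probabilities are 55/1024, 55/4096 and 275/8192. The sampling distribution attached to the lumped point under the mixture therefore matches neither of the distributions obtained by conditioning on the experiment actually performed, which is the sense in which the information about which experiment produced the data is lost.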

Frequentists/error statisticians therefore don’t need to worry about the likelihood principle because they shouldn’t accept the sufficiency principle in the generality that is required for Birnbaum’s proof.

Having understood this, I toyed around with the idea of writing this down as a publishable paper, but I have now come across a paper in which this argument can already be found (although in a less straightforward and more mathematical manner), namely:
M. J. Evans, D. A. S. Fraser and G. Monette (1986) On Principles and Arguments to Likelihood. Canadian Journal of Statistics 14, 181-194, particularly Section 7 (the rest is interesting, too).

NOTE: This is the last of this group of U-Phils. Mayo will issue a brief response tomorrow. Background to these U-Phils may be found here.

Categories: Philosophy of Statistics, Statistics, U-Phil

U-PHIL: Hennig and Gelman on Wasserman (2011)

Two further contributions in relation to

“Low Assumptions, High Dimensions” (2011)

Please also see: “Deconstructing Larry Wasserman” by Mayo, and Comments by Spanos

Christian Hennig:  Some comments on Larry Wasserman, “Low Assumptions, High Dimensions”

I enjoyed reading this stimulating paper. These are very important issues indeed. I’ll comment on both main concepts in the text.

1) Low Assumptions. I think that the term “assumption” is routinely misused and misunderstood in statistics. In Wasserman’s paper I can’t see such misuse explicitly, but I think that the “message” of the paper may be easily misunderstood because Wasserman doesn’t do much to stop people from this kind of misunderstanding.

Here is what I mean. The arithmetic mean can be derived as optimal estimator under an i.i.d. Gaussian model, which is often interpreted as the “model assumption” behind it. However, we don’t really need the Gaussian distribution to be true for the mean to do a good job. Sometimes the mean will do a bad job in a non-Gaussian situation (for example in the presence of gross outliers), but sometimes not. The median has nice robustness properties and is seen as admissible for ordinal data. It is therefore usually associated with “weaker assumptions”. However, the median may be worse than the mean in a situation where the Gaussian “assumption” of the mean is grossly violated. At UCL we ask students on a -2/-1/0/1/2 Likert scale for their general opinion about our courses. The distributions that we get here are strongly discrete and the scale is usually interpreted as being of ordinal type. Still, for ranking courses, the median is fairly useless (pretty much all courses end up with a median of 0 or 1), whereas the arithmetic mean can still detect statistically significant, meaningful differences between courses.

Why? Because it’s not only the “official” model assumptions that matter but also whether a statistic uses all the data in an appropriate manner for the given application. Here it’s fatal that the median ignores all differences among observations north and south of it.
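The point about the median on a five-point scale can be sketched with a pair of made-up rating sets. The responses below are invented for illustration and are not UCL data; the variable names are likewise hypothetical. Both courses share the same median, yet their means are clearly different.

```python
from statistics import mean, median

# Invented responses on the -2/-1/0/1/2 scale (illustrative only).
course_a = [-2, -1, 0, 1, 1, 1, 1, 1, 2]
course_b = [0, 1, 1, 1, 1, 2, 2, 2, 2]


def summarize(ratings):
    """Return (median, mean) for a list of Likert responses."""
    return median(ratings), mean(ratings)


if __name__ == "__main__":
    for name, ratings in (("A", course_a), ("B", course_b)):
        med, avg = summarize(ratings)
        print(f"course {name}: median={med}, mean={avg:.3f}")
```

In this toy case both medians equal 1, so the median cannot rank the courses, while the means (4/9 versus 4/3) separate them because the mean responds to how the ratings are spread on either side of the middle value. Whether such a difference is statistically significant would need a proper test on real data, which this sketch does not attempt.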

Categories: Philosophy of Statistics, Statistics, U-Phil

What’s in a Name? (Gelman’s blog)

I just noticed Andrew Gelman’s blog today... too good to let pass without quick comment. He asks:

What is a Bayesian?

Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, for example, in section 1 of this article).

I replied that I agree on Bayesian, I never liked the term and always wanted something better, but I couldn’t think of any convenient alternative. Also, I was finding that Bayesians (even the Bayesians I disagreed with) were reading my research articles, while non-Bayesians were simply ignoring them. So I thought it was best to identify with, and communicate with, those people who were willing to engage with me.

More formally, I’m happy defining “Bayesian” as “using inference from the posterior distribution, p(theta|y)”. This says nothing about where the probability distributions come from (thus, no requirement to be “subjective” or “objective”) and it says nothing about the models (thus, no requirement to use the discrete models that have been favored by the Bayesian model selection crew). Based on my minimal definition, I’m as Bayesian as anyone else.

He may be “as Bayesian as anyone else,” but does he really want to be as Bayesian as anyone? (slight, deliberate equivocation). As a good Popperian, I concur (with Popper), that names should not matter, but Gelman’s remarks suggest he should distinguish himself, at least philosophically[i].

As in note [iv] of my Wasserman deconstruction: “Even where Bayesian methods are usefully applied, some say ‘most of the standard philosophy of Bayes is wrong’ (Gelman and Shalizi 2012, 2 n2)”.

In the paper Gelman today cites (from our RMM collection):

… we see science—and applied statistics—as resolving anomalies via the creation of improved models which often include their predecessors as special cases. This view corresponds closely to the error-statistics idea of Mayo (1996). (Gelman 2011, 70)

If the foundations for these methods are error statistical, then shouldn’t that come out in the description? (error-statistical Bayes?) It seems sufficiently novel to warrant some greater gesture than ‘this too is Bayesian’.

In that spirit I ended my deconstruction with the passage:

Ironically many seem prepared to allow that Bayesianism still gets it right for epistemology, even as statistical practice calls for methods more closely aligned with frequentist principles. What I would like the reader to consider is that what is right for epistemology is also what is right for statistical learning in practice. That is, statistical inference in practice deserves its own epistemology. (Mayo 2011, p. 100)

What do people think?

[i] To Gelman’s credit, he is one of the few contemporary statisticians to (openly) recognize the potential value of philosophy of statistics for statistical practice!

Categories: Statistics

The Error Statistical Philosophy and The Practice of Bayesian Statistics: Comments on Gelman and Shalizi

The following is my commentary on a paper by Gelman and Shalizi, forthcoming (some time in 2013) in the British Journal of Mathematical and Statistical Psychology* (submitted February 14, 2012).

The Error Statistical Philosophy and the Practice of Bayesian Statistics: Comments on A. Gelman and C. Shalizi: Philosophy and the Practice of Bayesian Statistics**
Deborah G. Mayo

  1. Introduction

I am pleased to have the opportunity to comment on this interesting and provocative paper. I shall begin by citing three points at which the authors happily depart from existing work on statistical foundations.

First, there is the authors’ recognition that methodology is ineluctably bound up with philosophy. If nothing else “strictures derived from philosophy can inhibit research progress” (p. 4). They note, for example, the reluctance of some Bayesians to test their models because of their belief that “Bayesian models were by definition subjective,” or perhaps because checking involves non-Bayesian methods (4, n4).

Second, they recognize that Bayesian methods need a new foundation. Although the subjective Bayesian philosophy, “strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory),” and while many practitioners perceive the “rising use of Bayesian methods in applied statistical work” (2) as supporting this Bayesian philosophy, the authors flatly declare that “most of the standard philosophy of Bayes is wrong” (2 n2). Despite their qualification that “a statistical method can be useful even if its philosophical justification is in error”, their stance will rightly challenge many a Bayesian.


Categories: Statistics

Betting, Bookies and Bayes: Does it Not Matter?

On Gelman’s blog today he offers a simple rejection of Dutch Book arguments for Bayesian inference:

“I have never found this argument appealing, because a bet is a game not a decision. A bet requires 2 players, and one player has to offer the bets.”

But what about dynamic Bayesian Dutch book arguments which are thought to be the basis for advocating updating by Bayes’s theorem?  Betting scenarios, even if hypothetical, are often offered as the basis for making Bayesian measurements operational, and for claiming Bayes’s rule is a warranted representation of updating “uncertainty”. The question I had asked in an earlier (April 15) post (and then placed on hold) is: Does it not matter that Bayesians increasingly seem to debunk  betting representations?

Categories: Statistics


Gelman responds on his blog today: “Gelman on Hennig on Gelman on Bayes”.

I invite comments here….

*An ongoing exchange among a group of blogs that remain distinct (just coined)

Categories: Philosophy of Statistics, Statistics, U-Phil

U-PHIL: A Further Comment on Gelman by Christian Hennig (UCL, Statistics)

Comment on Gelman’s “Induction and Deduction in Bayesian Data Analysis” (RMM)

Dr. Christian Hennig (Senior Lecturer, Department of Statistical Science, University College London)

I have read quite a bit of what Andrew Gelman has written in recent years, including some of his blog. One thing that I find particularly refreshing and important about his approach is that he contrasts the Bayesian and frequentist philosophical conceptions honestly with what happens in the practice of data analysis, which often cannot (or does better not to) proceed according to an inflexible dogmatic book of rules.

I also like the emphasis on the fact that all models are wrong. I personally believe that a good philosophy of statistics should consistently take into account that models are rather tools for thinking than able to “match” reality, and in the vast majority of cases we know clearly that they are wrong (all continuous models are wrong because all observed data are discrete, for a start).

There is, however, one issue on which I find his approach unsatisfactory (or at least not well enough explained), and on which both frequentism and subjective Bayesianism seem superior to me.


Categories: Philosophy of Statistics, Statistics, U-Phil

Mayo, Senn, and Wasserman on Gelman’s RMM** Contribution

Picking up the pieces…

Continuing with our discussion of contributions to the special topic,  Statistical Science and Philosophy of Science in Rationality, Markets and Morals (RMM),* I am pleased to post some comments on Andrew **Gelman’s paper “Induction and Deduction in Bayesian Data Analysis”.  (More comments to follow—as always, feel free to comment.)

Note (March 9, 2012): Gelman has responded to some of our comments on his blog today.

D. Mayo

For now, I will limit my own comments to two. First, a fairly uncontroversial point: while Gelman writes that “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive,” a main point of my series (Parts 1, 2, 3) of “No-Pain” philosophy was that “deductive” falsification involves inductively inferring a “falsifying hypothesis”.

More importantly, and more challengingly, Gelman claims the view he recommends “corresponds closely to the error-statistics idea of Mayo (1996)”.  Now the idea that non-Bayesian ideas might afford a foundation for strands of Bayesianism is not as implausible as it may seem. On the face of it, any inference to a claim, whether to the adequacy of a model (for a given purpose), or even to a posterior probability, can be said to be warranted just to the extent that the claim has withstood a severe test (i.e., a test that would, at least with reasonable probability, have discerned a flaw with the claim, were it false).  The question is: How well do Gelman’s methods for inferring statistical models satisfy severity criteria?  (I’m not sufficiently familiar with his intended applications to say.)


Categories: Philosophy of Statistics, Statistics, U-Phil

Senn Again (Gelman)

Senn will be glad to see that we haven’t forgotten him!  (See this blog, Jan. 14, Jan. 15, Jan. 23, and Jan. 24, 2012.)  He’s back on Gelman’s blog today.

I hope to hear some reflections this time around on the issue often noted but not discussed: updating and downdating (see this blog, Jan. 26, 2012).

Categories: Philosophy of Statistics, Statistics

Updating & Downdating: One of the Pieces to Pick up on

pieces to pick up on (later)

Before moving on to a couple of rather different areas, there’s an issue that, while mentioned by both Senn and Gelman, did not come up for discussion; so let me just note it here as one of the pieces to pick up on later.

“It is hard to see what exactly a Bayesian statistician is doing when interacting with a client. There is an initial period in which the subjective beliefs of the client are established. These prior probabilities are taken to be valuable enough to be incorporated in subsequent calculation. However, in subsequent steps the client is not trusted to reason. The reasoning is carried out by the statistician. As an exercise in mathematics it is not superior to showing the client the data, eliciting a posterior distribution and then calculating the prior distribution; as an exercise in inference Bayesian updating does not appear to have greater claims than ‘downdating’ and indeed sometimes this point is made by Bayesians when discussing what their theory implies. (59)…..” Stephen Senn

“As I wrote in 2008, if you could really construct a subjective prior you believe in, why not just look at the data and write down your subjective posterior.” Andrew Gelman commenting on Senn

I’ve even heard subjective Bayesians concur on essentially this identical point, but I would think that many would take issue with it…no?  
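The symmetry Senn describes between updating and “downdating” is easy to see in a conjugate toy model. The sketch below assumes a Beta prior with binomial data, which is a choice made purely for illustration; the `Beta` type and the `update` and `downdate` functions are hypothetical names, not anything used by Senn or Gelman.

```python
from typing import NamedTuple


class Beta(NamedTuple):
    """Parameters of a Beta(a, b) distribution for a success probability."""
    a: float
    b: float


def update(prior, successes, failures):
    """Prior plus data gives posterior: Beta(a + s, b + f)."""
    return Beta(prior.a + successes, prior.b + failures)


def downdate(posterior, successes, failures):
    """Posterior minus data gives the prior that would have produced it."""
    a = posterior.a - successes
    b = posterior.b - failures
    if a <= 0 or b <= 0:
        raise ValueError("no proper Beta prior is consistent with this posterior and data")
    return Beta(a, b)


if __name__ == "__main__":
    prior = Beta(2, 2)
    posterior = update(prior, successes=7, failures=3)
    print("posterior:", posterior)
    print("recovered prior:", downdate(posterior, successes=7, failures=3))
```

As a piece of arithmetic the two directions are mirror images: in this assumed model, eliciting Beta(9, 5) as a posterior after 7 successes and 3 failures pins down Beta(2, 2) as the implied prior. The check in `downdate` also shows that some elicited posteriors would not correspond to any proper prior of this form.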

Categories: Statistics
