Posts Tagged With: Christian Hennig

U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof

Defending Birnbaum’s Proof

Greg Gandenberger
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh

In her 1996 Error and the Growth of Experimental Knowledge, Professor Mayo argued against the Likelihood Principle on the grounds that it does not allow one to control long-run error rates in the way that frequentist methods do.  This argument seems to me the kind of response a frequentist should give to Birnbaum’s proof.  It does not require arguing that Birnbaum’s proof is unsound: a frequentist can accommodate Birnbaum’s conclusion (two experimental outcomes are evidentially equivalent if they have the same likelihood function) by claiming that respecting evidential equivalence is less important than achieving certain goals for which frequentist methods are well suited.

More recently, Mayo has shown that Birnbaum’s premises cannot be reformulated as claims about what sampling distribution should be used for inference while retaining the soundness of his proof.  It does not follow that Birnbaum’s proof is unsound, because Birnbaum’s original premises are not claims about what sampling distribution should be used for inference; they are instead sufficient conditions for experimental outcomes to be evidentially equivalent.

Mayo acknowledges that the premises she uses in her argument against Birnbaum’s proof differ from Birnbaum’s original premises in a recent blog post in which she distinguishes between “the Sufficient Principle (general)” and “the Sufficiency Principle applied in sampling theory.”  One could make a similar distinction for the Weak Conditionality Principle.  There is indeed no way to formulate Sufficiency and Weak Conditionality Principles “applied in sampling theory” that are consistent and imply the Likelihood Principle.  This fact is not surprising: sampling theory is incompatible with the Likelihood Principle!
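The incompatibility noted above can be made concrete with the textbook coin-tossing example (it is not taken from Gandenberger’s text, and the function names below are made up for illustration). Nine heads and three tails give proportional likelihood functions whether the number of tosses was fixed at 12 or tossing stopped at the third tail, yet the one-sided p-values for a fair coin differ between the two designs. A minimal sketch, assuming a test of theta = 0.5 against theta > 0.5:

```python
from math import comb


def binomial_p_value(heads: int, n: int, theta: float = 0.5) -> float:
    """P(at least `heads` heads) when the number of tosses n is fixed in advance."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(heads, n + 1)
    )


def negative_binomial_p_value(heads: int, tails: int, theta: float = 0.5) -> float:
    """P(at least `heads` heads before the `tails`-th tail) when tossing stops at that tail."""
    # At least `heads` heads precede the tails-th tail exactly when the first
    # heads + tails - 1 tosses contain at most tails - 1 tails.
    m = heads + tails - 1
    return sum(
        comb(m, j) * (1 - theta) ** j * theta ** (m - j)
        for j in range(tails)
    )


def likelihood_ratio(theta_a: float, theta_b: float, heads: int, tails: int) -> float:
    """Likelihood ratio for two values of theta; identical under both designs,
    because the design-specific combinatorial constants cancel."""
    return (theta_a**heads * (1 - theta_a) ** tails) / (
        theta_b**heads * (1 - theta_b) ** tails
    )


p_fixed_n = binomial_p_value(heads=9, n=12)            # 299/4096, about 0.073
p_stop_at_third_tail = negative_binomial_p_value(9, 3)  # 67/2048, about 0.033

print(f"fixed n = 12:         p = {p_fixed_n:.4f}")
print(f"stop at third tail:   p = {p_stop_at_third_tail:.4f}")
print(f"likelihood ratio 0.75 vs 0.5: {likelihood_ratio(0.75, 0.5, 9, 3):.3f}")
```

The Likelihood Principle treats the two outcomes as evidentially equivalent, while a sampling theorist using a 0.05 threshold would reach different verdicts in the two designs.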

Birnbaum himself insisted that his premises were to be understood as “equivalence relations” rather than as “substitution rules” (i.e., rules about what sampling distribution should be used for inference) and recognized the fact that understanding them in this way was necessary for his proof.  As he put it in his 1975 rejoinder to Kalbfleisch’s response to his proof, “It was the adoption of an unqualified equivalence formulation of conditionality, and related concepts, which led, in my 1972 paper, to the monster of the likelihood axiom” (263).

Because Mayo’s argument against Birnbaum’s proof requires reformulating Birnbaum’s premises, it is best understood as an argument not for the claim that Birnbaum’s original proof is invalid, but rather for the claim that Birnbaum’s proof is valid only when formulated in a way that is irrelevant to a sampling theorist.  Reformulating Birnbaum’s premises as claims about what sampling distribution should be used for inference is the only way for a fully committed sampling theorist to understand them.  Any other formulation of those premises is either false or question-begging.

Mayo’s argument makes good sense when understood in this way, but it requires a strong prior commitment to sampling theory. Whether various arguments for sampling theory such as those Mayo gives in Error and the Growth of Experimental Knowledge are sufficient to warrant such a commitment is a topic for another day.  To those who lack such a commitment, Birnbaum’s original premises may seem quite compelling.  Mayo has not refuted the widespread view that those premises do in fact entail the Likelihood Principle.

Mayo has objected to this line of argument by claiming that her reformulations of Birnbaum’s principles are just instantiations of Birnbaum’s principles in the context of frequentist methods. But they cannot be instantiations in a literal sense because they are imperatives, whereas Birnbaum’s original premises are declaratives.  They are instead instructions that a frequentist would have to follow in order to avoid violating Birnbaum’s principles. The fact that one cannot follow them both is only an objection to Birnbaum’s principles on the question-begging assumption that evidential meaning depends on sampling distributions.

 ********

Birnbaum’s proof is not wrong but error statisticians don’t need to bother

Christian Hennig
Department of Statistical Science
University College London

I was impressed by Mayo’s arguments in “Error and Inference” when I came across them for the first time. To some extent, I still am. However, I have also seen versions of Birnbaum’s theorem and proof presented in a mathematically sound fashion with which I as a mathematician had no issue.

After having discussed this a bit with Phil Dawid, and having thought and read more on the issue, my conclusion is that
1) Birnbaum’s theorem and proof are correct (apart from small mathematical issues resolved later in the literature), and they are not vacuous (i.e., there are evidence functions that fulfill them without any contradiction in the premises),
2) however, Mayo’s arguments actually do raise an important problem with Birnbaum’s reasoning.

Here is why. Note that Mayo’s arguments are based on the implicit (error statistical) assumption that the sampling distribution of an inference method is relevant. In that case, application of the sufficiency principle to Birnbaum’s mixture distribution enforces the use of the sampling distribution under the mixture distribution as it is, whereas application of the conditionality principle enforces the use of the sampling distribution under the experiment that actually produced the data, which is different in the usual examples. So the problem is not that Birnbaum’s proof is wrong, but that enforcing both principles at the same time in the mixture experiment is in contradiction to the relevance of the sampling distribution (and therefore to error statistical inference). It is a case in which the sufficiency principle suppresses information that is clearly relevant under the conditionality principle. This means that the justification of the sufficiency principle (namely that all relevant information is in the sufficient statistic) breaks down in this case.
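The difference between the two sampling distributions can be illustrated by simulation. The sketch below does not reproduce Birnbaum’s own mixture construction; it assumes the simpler and well-known two-instrument setup usually attributed to Cox, in which a fair coin decides whether a measurement is taken with a precise or an imprecise instrument. The standard deviations (1 and 10), the seed and the function name are illustrative choices.

```python
import random
import statistics


def simulate_mixture(n_runs: int = 20000, sigmas=(1.0, 10.0), seed: int = 0):
    """Simulate measurement errors from a 50/50 mixture of two instruments.

    Returns a dict mapping instrument index to the list of errors it produced.
    """
    rng = random.Random(seed)
    errors_by_instrument = {0: [], 1: []}
    for _ in range(n_runs):
        which = rng.randrange(2)  # fair coin: which instrument is used
        errors_by_instrument[which].append(rng.gauss(0.0, sigmas[which]))
    return errors_by_instrument


errors = simulate_mixture()

# Conditional view: spread of the error given the instrument actually used.
sd_precise = statistics.pstdev(errors[0])  # theoretical value 1
sd_noisy = statistics.pstdev(errors[1])    # theoretical value 10

# Unconditional view: spread of the error over the whole mixture experiment.
# Theoretical value sqrt((1 + 100) / 2), roughly 7.1.
sd_mixture = statistics.pstdev(errors[0] + errors[1])

print(f"given precise instrument: {sd_precise:.2f}")
print(f"given noisy instrument:   {sd_noisy:.2f}")
print(f"whole mixture:            {sd_mixture:.2f}")
```

The mixture-wide figure describes neither instrument, which is the sense in which averaging over the mixture and conditioning on the experiment actually performed pull in different directions.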

Frequentists/error statisticians therefore don’t need to worry about the likelihood principle because they shouldn’t accept the sufficiency principle in the generality that is required for Birnbaum’s proof.

Having understood this, I toyed around with the idea of writing this down as a publishable paper, but I have now come across a paper in which this argument can already be found (although in a less straightforward and more mathematical manner), namely:
M. J. Evans, D. A. S. Fraser and G. Monette (1986) On Principles and Arguments to Likelihood. Canadian Journal of Statistics 14, 181-194, http://www.jstor.org/stable/3314794, particularly Section 7 (the rest is interesting, too).

NOTE: This is the last of this group of U-Phils. Mayo will issue a brief response tomorrow. Background to these U-Phils may be found here.

Categories: Philosophy of Statistics, Statistics, U-Phil | 12 Comments

Mayo Responds to U-Phils on Background Information

Thanks to Emrah Aktunc and Christian Hennig for their U-Phils on my September 12 post: “How should ‘prior information’ enter in statistical inference?” and my subsequent deconstruction of Gelman[i] (starting here, and ending with part 3).  I’ll begin with some remarks on Emrah Aktunc’s contribution.

First, we need to avoid an ambiguity that clouds prior information and prior probability. In a given experiment, prior information may be stronger than the data: to take but one example, say that we’ve already falsified Newton’s theory of gravity in several domains, but in our experiment the data (e.g., one of the sets of eclipse data from 1919) accords with the Newtonian prediction (of half the amount of deflection as that predicted by Einstein’s general theory of relativity [GTR]). The pro-Newton data, in and of itself, would be rejected because of all that we already know.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 4 Comments

U-PHIL: Wasserman Replies to Spanos and Hennig

Wasserman on Spanos and Hennig on “Low Assumptions, High Dimensions” (2011)

(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)

________

Thanks to Aris and others for comments.

Response to Aris Spanos:

1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.

2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite sample, distribution free methods. As an example, consider the usual finite sample (order statistics based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?

3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?

4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data.
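The order-statistics interval for the median mentioned in point 2 can be sketched as follows. This is a standard textbook construction rather than code from Wasserman’s paper, and the function name is made up for illustration. It assumes only an iid sample: the number of observations below the true median is then Binomial(n, 1/2), so the coverage of an interval between two symmetric order statistics can be computed exactly, with no asymptotics.

```python
from math import comb


def median_confidence_interval(sample, confidence: float = 0.95):
    """Distribution-free confidence interval for the median.

    Returns (lower, upper, coverage), where lower and upper are the j-th
    smallest and j-th largest observations and coverage is the guaranteed
    coverage probability (exact for continuous distributions).
    """
    xs = sorted(sample)
    n = len(xs)
    alpha = 1.0 - confidence

    tail = 0.0  # P(B <= j - 1) for B ~ Binomial(n, 1/2)
    j = 0       # rank of the order statistic used at each end
    for i in range(n // 2):
        new_tail = tail + comb(n, i) / 2**n
        if new_tail > alpha / 2:
            break
        tail = new_tail
        j = i + 1

    if j == 0:
        raise ValueError("sample too small for the requested confidence level")

    coverage = 1.0 - 2.0 * tail
    return xs[j - 1], xs[n - j], coverage


lower, upper, coverage = median_confidence_interval(range(1, 11))
print(lower, upper, round(coverage, 4))
```

With ten observations the interval runs from the second smallest to the second largest value, and its coverage is 1 - 22/1024, somewhat above the requested 95% because only finitely many coverage levels are attainable.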

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

U-PHIL: Hennig and Gelman on Wasserman (2011)

Two further contributions in relation to

“Low Assumptions, High Dimensions” (2011)

Please also see: “Deconstructing Larry Wasserman” by Mayo, and Comments by Spanos

Christian Hennig:  Some comments on Larry Wasserman, “Low Assumptions, High Dimensions”

I enjoyed reading this stimulating paper. These are very important issues indeed. I’ll comment on both main concepts in the text.

1) Low Assumptions. I think that the term “assumption” is routinely misused and misunderstood in statistics. In Wasserman’s paper I can’t see such misuse explicitly, but I think that the “message” of the paper may be easily misunderstood because Wasserman doesn’t do much to stop people from this kind of misunderstanding.

Here is what I mean. The arithmetic mean can be derived as the optimal estimator under an i.i.d. Gaussian model, which is often interpreted as the “model assumption” behind it. However, we don’t really need the Gaussian distribution to be true for the mean to do a good job. Sometimes the mean will do a bad job in a non-Gaussian situation (for example in the presence of gross outliers), but sometimes not. The median has nice robustness properties and is seen as admissible for ordinal data. It is therefore usually associated with “weaker assumptions”. However, the median may be worse than the mean in a situation where the Gaussian “assumption” of the mean is grossly violated. At UCL we ask students on a -2/-1/0/1/2 Likert scale for their general opinion about our courses. The distributions that we get here are strongly discrete and the scale is usually interpreted as of ordinal type. Still, for ranking courses, the median is fairly useless (pretty much all courses end up with a median of 0 or 1), whereas the arithmetic mean can still detect statistically significant meaningful differences between courses.

Why? Because it’s not only the “official” model assumptions that matter but also whether a statistic uses all the data in an appropriate manner for the given application. Here it’s fatal that the median ignores all differences among observations north and south of it.
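The course-ranking point can be seen in a small example. The response counts below are invented for illustration and are not UCL data. Both hypothetical courses have the same median on the -2 to 2 scale, while their means differ clearly. The sketch shows only that the median discards the difference; it does not run a significance test.

```python
from statistics import mean, median


def expand(counts: dict) -> list:
    """Turn {score: number_of_students} into a flat list of responses."""
    return [score for score, n in counts.items() for _ in range(n)]


# Hypothetical response counts for two courses, 40 students each.
course_a = expand({-2: 2, -1: 5, 0: 8, 1: 20, 2: 5})
course_b = expand({-2: 0, -1: 1, 0: 4, 1: 20, 2: 15})

for name, responses in (("A", course_a), ("B", course_b)):
    print(f"course {name}: median = {median(responses)}, mean = {mean(responses):.3f}")
```

Both medians equal 1, because the median only registers which category the middle student chose. The means are 0.525 and 1.225, reflecting how many students sit on either side of that category.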

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

Review of Error and Inference by C. Hennig

Theoria just sent me this review by Hennig* of Error and Inference, in THEORIA 74 (2012): 245-247.

(Open access)

Deborah G. Mayo and Aris Spanos, eds. 2009. Error and Inference. Cambridge: Cambridge University Press.

Error and Inference focuses on the error-statistical philosophy of science (ESP) put forward by Deborah Mayo and Aris Spanos (MS). Chapters 1, 6 and 7 are mainly written by MS (partly with the statistician David Cox), whereas Chapters 2-5, 8, and 9 are driven by the contributions of other authors. There are responses to all these contributions at the end of the chapters, usually written by Mayo.

The structure of the book with the responses at the end of each chapter is a striking feature. The critical contributions enable a very lively discussion of ESP. On the other hand, always having the last word puts Mayo and Spanos in a quite advantageous position. Some of the contributors may have underestimated Mayo’s ability to make the most of this advantage.

Central to ESP are the issues of probing scientific theories objectively by data, and Mayo’s concept of “severe testing” (ST). ST is based on a frequentist interpretation of probability, on conventional hypothesis testing and the associated error probabilities. ESP advertises a “piecemeal” approach to testing a scientific theory, in which various different aspects, which can be used to make predictions about data, are subjected to hypothesis tests. A statistical problem with such an approach is that failure to reject a null hypothesis H0 does not necessarily constitute evidence in favour of H0. The space of probability models is so rich that it is impossible to rule out all other probability models.


Categories: philosophy of science, Statistics | Leave a comment

Blogologue*

Gelman responds on his blog today: “Gelman on Hennig on Gelman on Bayes”.

http://andrewgelman.com/2012/03/gelman-on-hennig-on-gelman-on-bayes/

I invite comments here….

*An ongoing exchange among a group of blogs that remain distinct (just coined)

Categories: Philosophy of Statistics, Statistics, U-Phil | Leave a comment

U-PHIL: A Further Comment on Gelman by Christian Hennig (UCL, Statistics)

Comment on Gelman’s “Induction and Deduction in Bayesian Data Analysis” (RMM)

Dr. Christian Hennig (Senior Lecturer, Department of Statistical Science, University College London)

I have read quite a bit of what Andrew Gelman has written in recent years, including some of his blog. One thing that I find particularly refreshing and important about his approach is that he contrasts the Bayesian and frequentist philosophical conceptions honestly with what happens in the practice of data analysis, which often cannot (or does better not to) proceed according to an inflexible dogmatic book of rules.

I also like the emphasis on the fact that all models are wrong. I personally believe that a good philosophy of statistics should consistently take into account that models are tools for thinking rather than things able to “match” reality, and in the vast majority of cases we know clearly that they are wrong (all continuous models are wrong because all observed data are discrete, for a start).

There is, however, one issue on which I find his approach unsatisfactory (or at least not well enough explained), and on which both frequentism and subjective Bayesianism seem superior to me.


Categories: Philosophy of Statistics, Statistics, U-Phil | 5 Comments
