U-Phil

U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof

Defending Birnbaum’s Proof

Greg Gandenberger
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh

In her 1996 Error and the Growth of Experimental Knowledge, Professor Mayo argued against the Likelihood Principle on the grounds that it does not allow one to control long-run error rates in the way that frequentist methods do.  This argument seems to me the kind of response a frequentist should give to Birnbaum’s proof.  It does not require arguing that Birnbaum’s proof is unsound: a frequentist can accommodate Birnbaum’s conclusion (two experimental outcomes are evidentially equivalent if they have the same likelihood function) by claiming that respecting evidential equivalence is less important than achieving certain goals for which frequentist methods are well suited.

More recently, Mayo has shown that Birnbaum’s premises cannot be reformulated as claims about what sampling distribution should be used for inference while retaining the soundness of his proof.  It does not follow that Birnbaum’s proof is unsound, because Birnbaum’s original premises are not claims about what sampling distribution should be used for inference but are instead sufficient conditions for experimental outcomes to be evidentially equivalent.

Mayo acknowledges that the premises she uses in her argument against Birnbaum’s proof differ from Birnbaum’s original premises in a recent blog post in which she distinguishes between “the Sufficiency Principle (general)” and “the Sufficiency Principle applied in sampling theory.”  One could make a similar distinction for the Weak Conditionality Principle.  There is indeed no way to formulate Sufficiency and Weak Conditionality Principles “applied in sampling theory” that are consistent and imply the Likelihood Principle.  This fact is not surprising: sampling theory is incompatible with the Likelihood Principle!

Birnbaum himself insisted that his premises were to be understood as “equivalence relations” rather than as “substitution rules” (i.e., rules about what sampling distribution should be used for inference) and recognized the fact that understanding them in this way was necessary for his proof.  As he put it in his 1975 rejoinder to Kalbfleisch’s response to his proof, “It was the adoption of an unqualified equivalence formulation of conditionality, and related concepts, which led, in my 1972 paper, to the monster of the likelihood axiom” (263).

Because Mayo’s argument against Birnbaum’s proof requires reformulating Birnbaum’s premises, it is best understood as an argument not for the claim that Birnbaum’s original proof is invalid, but rather for the claim that Birnbaum’s proof is valid only when formulated in a way that is irrelevant to a sampling theorist.  Reformulating Birnbaum’s premises as claims about what sampling distribution should be used for inference is the only way for a fully committed sampling theorist to understand them.  Any other formulation of those premises is either false or question-begging.

Mayo’s argument makes good sense when understood in this way, but it requires a strong prior commitment to sampling theory. Whether various arguments for sampling theory such as those Mayo gives in Error and the Growth of Experimental Knowledge are sufficient to warrant such a commitment is a topic for another day.  To those who lack such a commitment, Birnbaum’s original premises may seem quite compelling.  Mayo has not refuted the widespread view that those premises do in fact entail the Likelihood Principle.

Mayo has objected to this line of argument by claiming that her reformulations of Birnbaum’s principles are just instantiations of Birnbaum’s principles in the context of frequentist methods. But they cannot be instantiations in a literal sense because they are imperatives, whereas Birnbaum’s original premises are declaratives.  They are instead instructions that a frequentist would have to follow in order to avoid violating Birnbaum’s principles. The fact that one cannot follow them both is only an objection to Birnbaum’s principles on the question-begging assumption that evidential meaning depends on sampling distributions.

 ********

Birnbaum’s proof is not wrong but error statisticians don’t need to bother

Christian Hennig
Department of Statistical Science
University College London

I was impressed by Mayo’s arguments in “Error and Inference” when I came across them for the first time. To some extent, I still am. However, I have also seen versions of Birnbaum’s theorem and proof presented in a mathematically sound fashion with which I as a mathematician had no issue.

After having discussed this a bit with Phil Dawid, and having thought and read more on the issue, my conclusion is that
1) Birnbaum’s theorem and proof are correct (apart from small mathematical issues resolved later in the literature), and they are not vacuous (i.e., there are evidence functions that fulfill them without any contradiction in the premises),
2) however, Mayo’s arguments actually do raise an important problem with Birnbaum’s reasoning.

Here is why. Note that Mayo’s arguments are based on the implicit (error statistical) assumption that the sampling distribution of an inference method is relevant. In that case, application of the sufficiency principle to Birnbaum’s mixture distribution enforces the use of the sampling distribution under the mixture distribution as it is, whereas application of the conditionality principle enforces the use of the sampling distribution under the experiment that actually produced the data, which is different in the usual examples. So the problem is not that Birnbaum’s proof is wrong, but that enforcing both principles at the same time in the mixture experiment is in contradiction to the relevance of the sampling distribution (and therefore to error statistical inference). It is a case in which the sufficiency principle suppresses information that is clearly relevant under the conditionality principle. This means that the justification of the sufficiency principle (namely that all relevant information is in the sufficient statistic) breaks down in this case.

Frequentists/error statisticians therefore don’t need to worry about the likelihood principle because they shouldn’t accept the sufficiency principle in the generality that is required for Birnbaum’s proof.

Having understood this, I toyed around with the idea of writing this down as a publishable paper, but I have now come across a paper in which this argument can already be found (although in a less straightforward and more mathematical manner), namely:
M. J. Evans, D. A. S. Fraser and G. Monette (1986) On Principles and Arguments to Likelihood. Canadian Journal of Statistics 14, 181-194, http://www.jstor.org/stable/3314794, particularly Section 7 (the rest is interesting, too).

NOTE: This is the last of this group of U-Phils. Mayo will issue a brief response tomorrow. Background to these U-Phils may be found here.

Categories: Philosophy of Statistics, Statistics, U-Phil | 12 Comments

Mark Chang (now) gets it right about circularity

Mark Chang wrote a comment this evening, but it is buried back on my Nov. 31 post in relation to the current U-Phil. Given all he has written on my attempt to “break through the breakthrough”, I thought to bring it up to the top. Chang ends off his comment with the sagacious and entirely correct claim that so many people have missed:

“What Birnbaum actually did was use the SLP to prove the SLP – as simple as that!” (Mark Chang)

It is just too bad that readers of his (2013) book will not have been told this*!  Mark: Can you issue a correction?  I definitely think you should!  If only you’d written to me, I could have pointed this out pre-pub.

That Birnbaum’s argument assumes what it claims to prove is just what I have been arguing all along. It is called a begging-the-question fallacy: An argument that boils down to:

A/therefore A

Such an argument is logically valid, and that is why formal validity does not mean much for getting conclusions accepted. Why? Well, even though such circular arguments are usually dressed up so that the premises do not so obviously repeat the conclusion, they are similarly fallacious: the truth of the premises already assumes the truth of the conclusion. If we are allowed to argue that way, you can argue anything you like! To not-A as well. That is not what the Great “Breakthrough” was supposed to be doing.
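The point about formal validity can be checked mechanically. The following is a minimal sketch in Lean 4 (the proposition name A and hypothesis name h are just placeholders): once the conclusion is supplied as a premise, the "proof" is immediate, and exactly the same pattern goes through for not-A.

```lean
-- "A, therefore A" is valid for any proposition A:
-- the premise h is itself the proof of the conclusion.
example (A : Prop) (h : A) : A := h

-- The same pattern works equally well for not-A,
-- provided not-A is what gets assumed.
example (A : Prop) (h : ¬A) : ¬A := h
```

Neither check says anything about whether A is true; each only confirms that the conclusion follows from a premise that already contains it.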

Chang’s comment (which is the same one he posted on Xi’an’s og here) also includes his other points, but fortunately, Jean Miller has recently gone through those in depth. In neither of my (generous) construals of Birnbaum do I claim his premises are inconsistent, by the way.

*But instead his readers are led to believe my criticism is flawed because of something about sufficiency having to do with a FAMILY of distributions (his caps on “family”, p. 138). This all came up as well in Xi’an’s og.

Chang, M. (2013) Paradoxes in Scientific Inference.

 

Categories: strong likelihood principle, U-Phil | 2 Comments

U-Phil: Ton o’ Bricks

by Deborah Mayo

Birnbaum’s argument for the SLP involves some equivocations that are at once subtle and blatant. The subtlety makes it hard to translate into symbolic logic (I only partially translated it). Philosophers should have a field day with this, and I should be hearing more reports that it has suddenly hit them between the eyes like a ton of bricks, to use a mixture metaphor. Here are the key bricks. References can be found here, background to the U-Phil here.

Famous (mixture) weighing machine example and the WCP

The main principle of evidence on which Birnbaum’s argument rests is the weak conditionality principle (WCP).  This principle, Birnbaum notes, follows not from mathematics alone but from intuitively plausible views of “evidential meaning.” To understand the interpretation of the WCP that gives it its plausible ring, we consider its development in “what is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992).

The basis for the WCP 

Example 3. Two measuring instruments of different precisions. We flip a fair coin to decide which of two instruments, E’ or E”, to use in observing a normally distributed random sample X to make inferences about mean θ. E’ has a known variance of 10⁻⁴, while that of E” is known to be 10⁴. The experiment is a mixture: E-mix. The fair coin or other randomizer may be characterized as observing an indicator statistic J, taking values 1 or 2 with probabilities .5, independent of the process under investigation. The full data indicates first the result of the coin toss, and then the measurement: (Ej, xj).[i]

The sample space of E-mix with components Ej, j = 1, 2, consists of the union of

{(j, x’): j = 1, possible values of X’} and {(j, x”): j = 2, possible values of X”}.

In testing a null hypothesis such as θ = 0, the same x measurement would correspond to a much smaller p-value were it to have come from E′ than if it had come from E”: denote them as p′(x) and p′′(x), respectively. However, the overall significance level of the mixture, the convex combination of the p-values: [p′(x) + p′′(x)]/2, would give a misleading report of the precision or severity of the actual experimental measurement (see Cox and Mayo 2010, 296).

Suppose that we know we have observed a measurement from E” with its much larger variance:

The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution [with the larger variance] (Cox 1958, 361).

In effect, an individual unlucky enough to use the imprecise tool gains a more informative assessment because he might have been lucky enough to use the more precise tool! (Birnbaum 1962, 491; Cox and Mayo 2010, 296). Once it is known whether E′ or E′′ has produced x, the p-value or other inferential assessment should be made conditional on the experiment actually run.
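The contrast between the two conditional p-values and their convex combination can be sketched numerically. This is only an illustration under stated assumptions: a single measurement, a two-sided test of the null that the mean is 0, and a made-up observed value x = 0.05. The function name and the observed value are hypothetical; only the two variances come from the example.

```python
import math

def two_sided_p(x, sigma, theta0=0.0):
    """Two-sided p-value for H0: mean = theta0, from one normal measurement x
    with known standard deviation sigma."""
    z = abs(x - theta0) / sigma
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi
    return math.erfc(z / math.sqrt(2.0))

SIGMA_PRECISE = math.sqrt(1e-4)    # instrument E': variance 10^-4
SIGMA_IMPRECISE = math.sqrt(1e4)   # instrument E'': variance 10^4

x = 0.05  # hypothetical measurement, chosen only for illustration

p_precise = two_sided_p(x, SIGMA_PRECISE)      # p'(x): conditional on E'
p_imprecise = two_sided_p(x, SIGMA_IMPRECISE)  # p''(x): conditional on E''
p_mixture = 0.5 * (p_precise + p_imprecise)    # [p'(x) + p''(x)] / 2

print(f"p'(x)   = {p_precise:.3g}")
print(f"p''(x)  = {p_imprecise:.3g}")
print(f"average = {p_mixture:.3g}")
```

With these inputs the same x is overwhelming evidence against the null if it came from the precise instrument and essentially no evidence if it came from the imprecise one, while the averaged figure describes neither instrument.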

Weak Conditionality Principle (WCP): If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’, x’) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

Understanding the WCP

The WCP includes a prescription and a proscription for the proper evidential interpretation of x’, once it is known to have come from E’:

The evidential meaning of any outcome (E’, x’) of any experiment E having a mixture structure is the same as: the evidential meaning of the corresponding outcome x’ of the corresponding component experiment E’, ignoring otherwise the over-all structure of the original experiment E (Birnbaum 1962, 489; Eh and xh replaced with E’ and x’ for consistency).

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we spell out its three assertions.

First, it applies once we know which component of the mixture has been observed, and what the outcome was (Ej, xj). (Birnbaum considers mixtures with just two components).

Second, there is the prescription about evidential equivalence. Once it is known that Ej has generated the data, given that our inference is about a parameter of Ej, inferences are appropriately drawn in terms of the distribution in Ej —the experiment known to have been performed.

Third, there is the proscription. In the case of informative inferences about the parameter of Ej our inference should not be influenced by whether the decision to perform Ej was determined by a coin flip or fixed all along. Misleading informative inferences might result from averaging over the convex combination of Ej and an experiment known not to have given rise to the data. The latter may be called the unconditional (sampling) distribution. ….

______________________________________________

One crucial equivocation:

 Casella and R. Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data x, the information about θ depends only on the experiment performed. . . . The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of θ. (p. 293, emphasis added)

I have emphasized the last line in order to underscore a possible equivocation. Casella and Berger’s intended meaning is the correct claim:

(i) Given that it is known that measurement x’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or had been fixed all along.

Of course we do not know what measurement would have resulted had the unperformed measuring tool been used.

Compare (i) to a false and unintended reading:

(ii) If some measurement x is observed, then it does not matter (and it need not be reported) whether it came from a precise tool E’ or imprecise tool E”.

The idea of detaching x, and reporting that “x came from somewhere I know not where,” will not do. For one thing, we need to know the experiment in order to compute the sampling inference. For another, E’ and E” may be like our weighing procedures with very different precisions. It is analogous to being given the likelihood of the result in Example 1 (here), withholding whether it came from a negative binomial or a binomial.

Claim (i), by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when the WCP applies. The essential difference in claim (i) is that it is known that (E’, x’), enabling its inferential import to be determined.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.
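The binomial versus negative binomial contrast mentioned above can be made concrete with a small sketch. The numbers here are illustrative assumptions, not taken from Example 1: 9 heads and 3 tails, reached either by fixing 12 tosses in advance (binomial) or by tossing until the 3rd tail (negative binomial), with a one-sided test of θ = 0.5 against larger values. The two likelihood functions differ only by a constant factor, yet the sampling-theory p-values differ.

```python
from math import comb

HEADS, TAILS = 9, 3   # hypothetical data, for illustration only
N = HEADS + TAILS     # total number of tosses observed

def binomial_likelihood(theta):
    """Likelihood when the number of tosses N was fixed in advance."""
    return comb(N, HEADS) * theta**HEADS * (1 - theta)**TAILS

def negative_binomial_likelihood(theta):
    """Likelihood when tossing stopped at the TAILS-th tail (last toss is a tail)."""
    return comb(N - 1, HEADS) * theta**HEADS * (1 - theta)**TAILS

# The ratio of the two likelihoods does not depend on theta.
ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t)
          for t in (0.2, 0.5, 0.8)]

# One-sided p-values for H0: theta = 0.5 against theta > 0.5.
# Binomial: probability of HEADS or more heads in N tosses.
p_binomial = sum(comb(N, k) for k in range(HEADS, N + 1)) / 2**N
# Negative binomial: HEADS or more heads before the TAILS-th tail, i.e.
# at most TAILS - 1 tails among the first N - 1 tosses.
p_negative_binomial = sum(comb(N - 1, k) for k in range(TAILS)) / 2**(N - 1)

print("likelihood ratios:", ratios)
print("binomial p-value:", p_binomial)
print("negative binomial p-value:", p_negative_binomial)
```

Reporting only the likelihood function would hide which of the two sampling schemes was used, and with it the information needed to compute either p-value.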


Now go back and skim 3 short pages of notes here, pp 11-14, and it should hit you like a ton of bricks!  If so, reward yourself with a double Elba Grease, else try again. Report your results in the comments.

Categories: Birnbaum Brakes, Statistics, strong likelihood principle, U-Phil | 7 Comments

U-Phil: J. A. Miller: Blogging the SLP

Jean Miller

Jean A. Miller, PhD
Department of Philosophy
Virginia Tech

MIX & MATCH MESS: A NOTE ON A MISLEADING DISCUSSION OF MAYO’S BIRNBAUM PAPER

Mayo in her “rejected” post (12/27/12) briefly points out how Mark Chang, in his book Paradoxes in Scientific Inference (2012, pp. 137-139), took pieces from the two distinct variations she gives of Birnbaum’s arguments, either of which shows the unsoundness of Birnbaum’s purported proof, and illegitimately combines them. He then mistakenly maintains that it is Mayo’s conclusions that are “faulty” rather than Birnbaum’s argument. In this note, I just want to fill in some of the missing pieces of what is going on here, so that others will not be misled. I put together some screen shots so you can read exactly what he wrote pp. 137-139. (See also Mayo’s note to Chang on Xi’an’s blog here.)

Categories: Statistics, strong likelihood principle, U-Phil | 5 Comments

U-Phil: S. Fletcher & N. Jinn

“Model Verification and the Likelihood Principle” by Samuel C. Fletcher
Department of Logic & Philosophy of Science (PhD Student)
University of California, Irvine

I’d like to sketch an idea concerning the applicability of the Likelihood Principle (LP) to non-trivial statistical problems.  What I mean by “non-trivial statistical problems” are those involving substantive modeling assumptions, where there could be any doubt that the probability model faithfully represents the mechanism generating the data.  (Understanding exactly how scientific models represent phenomena is subtle and important, but it will not be my focus here.  For more, see http://plato.stanford.edu/entries/models-science/.) In such cases, it is crucial for the modeler to verify, inasmuch as it is possible, the sufficient faithfulness of those assumptions.

But the techniques used to verify these statistical assumptions are themselves statistical. One can then ask: do techniques of model verification fall under the purview of the LP?  That is: are such techniques a part of the inferential procedure constrained by the LP?  I will argue the following:

(1) If they are—what I’ll call the inferential view of model verification—then there will be in general no inferential procedures that satisfy the LP.

(2) If they are not—what I’ll call the non-inferential view—then there are aspects of any evidential evaluation that inferential techniques bound by the LP do not capture.

Categories: Statistics, strong likelihood principle, U-Phil | 17 Comments

Coming up: December U-Phil Contributions….

Dear Reader: You were probably* wondering about the December U-Phils (blogging the strong likelihood principle (SLP)). They will be posted, singly or in pairs, over the next few blog entries. Here is the initial call, and the extension. The details of the specific U-Phil may be found here, but also look at the post from my 28 Nov. seminar at the London School of Economics (LSE), which was on the SLP. Posts were to be in relation to either the guest graduate student post by Gandenberger, and/or my discussion/argument and reactions to it. Earlier U-Phils may be found here; and more by searching this blog. “U-Phil” is short for “you ‘philosophize’”.

If you have ideas for future “U-Phils,” post them as comments to this blog or send them to error@vt.edu.

*This is how I see “probability” mainly used in ordinary English, namely as expressing something like “here’s a pure guess made without evidence or with little evidence,” be it sarcastic or quite genuine.

 

Categories: Announcement, Likelihood Principle, U-Phil | Leave a comment

Mayo on S. Senn: “How Can We Cultivate Senn’s-Ability?”–reblogs

Since Stephen Senn will be leading our seminar at the LSE tomorrow morning (see PH500 page), I’m reblogging my deconstruction of his paper (“You May Believe You Are a Bayesian But You Probably Are Wrong”) from Jan. 15, 2012 (though not his main topic tomorrow). At the end I link to other “U-Phils” on Senn’s paper (by Andrew Gelman, Andrew Jaffe, Christian Robert), Senn’s response, and my response to them. Queries, write me at: error@vt.edu

Mayo Philosophizes on Stephen Senn: “How Can We Cultivate Senn’s-Ability?”

Where’s Mayo?

Although, in one sense, Senn’s remarks echo the passage of Jim Berger’s that we deconstructed a few weeks ago, Senn at the same time seems to reach an opposite conclusion. He points out how, in practice, people who claim to have carried out a (subjective) Bayesian analysis have actually done something very different—but that then they heap credit on the Bayesian ideal. (See also “Who Is Doing the Work?”)

“A very standard form of argument I do object to is the one frequently encountered in many applied Bayesian papers where the first paragraphs laud the Bayesian approach on various grounds, in particular its ability to synthesize all sources of information, and in the rest of the paper the authors assume that because they have used the Bayesian machinery of prior distributions and Bayes theorem they have therefore done a good analysis. It is this sort of author who believes that he or she is Bayesian but in practice is wrong.” (Senn 58)

Categories: Bayesian/frequentist, U-Phil | 11 Comments

Announcement: U-Phil Extension: Blogging the Likelihood Principle

U-Phil: I am extending to Dec. 19, 2012 the date for sending me responses to the “U-Phil” call, see initial call, given some requests for more time. The details of the specific U-Phil may be found here, but you might also look at the post relating to my 28 Nov. seminar at the LSE, which is directly on the topic: the infamous (strong) likelihood principle (SLP). “U-Phil,” which is short for “you ‘philosophize’”, is really just an opportunity to write something .5-1 notch above an ordinary comment (focussed on one or more specific posts/papers, as described in each call): it can be longer (~500-1000 words), and it appears in the regular blog area rather than as a comment.  Your remarks can relate to the guest graduate student post by Gregory Gandenberger, and/or my discussion/argument. Graduate student posts (e.g., attendees of my 28 Nov. LSE seminar?) are especially welcome*. Earlier exemplars of U-Phils may be found here; and more by searching this blog.

Thanks to everyone who sent me names of vintage typewriter repair shops in London, after the airline damage: the “x” is fixed, but the “z” key is still misbehaving.

*Another post of possible relevance to graduate students comes up when searching this blog for  “sex”.

Categories: Announcement, Likelihood Principle, U-Phil | Leave a comment

Blogging Birnbaum: on Statistical Methods in Scientific Inference

I said I’d make some comments on Birnbaum’s letter (to Nature), (linked in my last post), which is relevant to today’s Seminar session (at the LSE*), as well as to (Normal Deviate’s) recent discussion of frequentist inference–in terms of constructing procedures with good long-run “coverage”. (Also to the current U-Phil).

NATURE VOL. 225 MARCH 14, 1970 (1033)

LETTERS TO THE EDITOR

Statistical methods in Scientific Inference

It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref.2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood).  I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised.

Categories: Likelihood Principle, Statistics, U-Phil | 5 Comments

Likelihood Links [for 28 Nov. Seminar and Current U-Phil]

Dear Reader: We just arrived in London[i][ii]. Jean Miller has put together some materials for Birnbaum LP aficionados in connection with my 28 November seminar. Great to have ready links to some of the early comments and replies by Birnbaum, Durbin, Kalbfleisch and others, possibly of interest to those planning contributions to the current “U-Phil”.  I will try to make some remarks on Birnbaum’s 1970 letter to the editor tomorrow.

November 28th reading

Categories: Birnbaum Brakes, Likelihood Principle, U-Phil | Leave a comment

U-Phil: Blogging the Likelihood Principle: New Summary

U-Phil: I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader U-Phils, from December 6-19 (< 1000 words) for posting on this blog (please see # at bottom of post).  Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid.  If construed as formally valid, I argue, the premises contradict each other. Who is right?  Gandenberger doesn’t wrestle with my critique of Birnbaum, but I invite you (and Greg!) to do so. I’m pasting a new summary of my argument below.

 The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec 7, 2011.

Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant?  And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?

Graduate students and others invited!

______________________________________________________

New Summary of Mayo Critique of Birnbaum’s Argument for the SLP
Deborah Mayo
See also a (draft) of the full PAPER corresponding to this summary. Yet other links to the Strong Likelihood Principle SLP: Mayo 2010; Cox & Mayo 2011 (appendix).

Categories: Likelihood Principle, Statistics, U-Phil | 19 Comments

Mayo Responds to U-Phils on Background Information

Thanks to Emrah Aktunc and Christian Hennig for their U-Phils on my September 12 post: “How should ‘prior information’ enter in statistical inference?” and my subsequent deconstruction of Gelman[i] (starting here, and ending with part 3).  I’ll begin with some remarks on Emrah Aktunc’s contribution.

First, we need to avoid an ambiguity that clouds prior information and prior probability. In a given experiment, prior information may be stronger than the data: to take but one example, say that we’ve already falsified Newton’s theory of gravity in several domains, but in our experiment the data (e.g., one of the sets of eclipse data from 1919) accords with the Newtonian prediction (of half the amount of deflection as that predicted by Einstein’s general theory of relativity [GTR]). The pro-Newton data, in and of itself, would be rejected because of all that we already know.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 4 Comments

U-Phils: Hennig and Aktunc on Gelman 2012

I am posting two U-Phils I received in relation to the 9/12 call on Andrew Gelman’s (2012): “Ethics and the statistical use of prior information”

A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using prior Information
(10/9/12) Part 3: beauty and the background knowledge

Comments on “How should prior information enter in statistical inference”

Christian Hennig 
Department of Statistical Science
University College London

Reading the blog entries on this topic, the Cox-Mayo Conversation and the linked paper by Gelman, I appreciate the valuable thoughts in both, which to me all make sense, specifying situations in which prior information should preferably not enter, or should enter in the Bayesian way.

Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).

A difference between the approaches seems to be that Cox/Mayo rather look at the analysis of data in an isolated situation whereas Gelman rather writes about conclusions from not only analysing a particular data set, but from aggregating all the information available.

Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together in order to get an overall picture.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, Testing Assumptions, U-Phil | 11 Comments

Last part (3) of the deconstruction: beauty and background knowledge

Please see parts 1 and 2 and links therein. The background began in my Sept 12 post.

Gelman (2012) considers a case where the overall available evidence, E, is at odds with the indication of the results x from a given study:

Consider the notorious study in which a random sample of a few thousand people was analyzed, and it was found that the most beautiful parents were 8 percentage points more likely to have girls, compared to less attractive parents. The result was statistically significant (p<.05) and published in a reputable journal. But in this case we have good prior information suggesting that the difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point. A (non-Bayesian) design analysis reveals that, with this level of true difference, any statistically-significant observed difference in the sample is likely to be noise. At this point, you might well say that the original analysis should never have been done at all—but, given that it has been done, it is essential to use prior information (even if not in any formal Bayesian way) to interpret the data and generalize from sample to population.

Where did Fisher’s principle go wrong here? The answer is simple—and I think Cox would agree with me here. We’re in a setting where the prior information is much stronger than the data. (p. 3)
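The "design analysis" Gelman mentions can be sketched with a normal approximation. The following is a rough illustration, not his calculation: the true difference is set at the 1-percentage-point bound from the quoted prior information, and the standard error of 3.3 points is an assumed value (chosen only because it is compatible with an 8-point estimate reaching p < .05). All variable names are hypothetical.

```python
from statistics import NormalDist

TRUE_DIFF = 1.0   # percentage points: upper bound cited in the quoted passage
STD_ERROR = 3.3   # percentage points: ASSUMED for illustration, not from the post

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)   # two-sided 5% critical value (about 1.96)
shift = TRUE_DIFF / STD_ERROR        # true effect in standard-error units

# Probability that the estimate is significant and positive / significant and negative.
p_sig_positive = 1.0 - std_normal.cdf(z_crit - shift)
p_sig_negative = std_normal.cdf(-z_crit - shift)

power = p_sig_positive + p_sig_negative      # chance of reaching p < .05 at all
wrong_sign_share = p_sig_negative / power    # share of significant results with the wrong sign
min_exaggeration = z_crit * STD_ERROR / TRUE_DIFF  # smallest significant estimate / true effect

print(f"power                      = {power:.3f}")
print(f"wrong-sign share           = {wrong_sign_share:.3f}")
print(f"minimum exaggeration ratio = {min_exaggeration:.1f}")
```

Under these assumptions a significant result is rare, has a substantial chance of pointing in the wrong direction, and must overstate the true difference several times over, which is one way to read the claim that any significant observed difference is likely to be noise.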

Let me simply grant Gelman that this prior information warrants (with severity) the hypothesis H:

H: “difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point,” (ibid.)

especially given my suspicions of the well-testedness of claims to show the effects of “beautiful to less-beautiful” on anything. I will simply take it as a given that it is well-tested background “knowledge.” Presumably, the well-tested claim goes beyond those individuals observed, and is generalizing at least to some degree. So we are given that the hypothesis H is one for which there is strong evidence.

Categories: Background knowledge, Error Statistics, Philosophy of Statistics, Statistics, U-Phil | 14 Comments

U-Phil (9/25/12) How should “prior information” enter in statistical inference?

Andrew Gelman sent me an interesting note of his, “Ethics and the statistical use of prior information” [i]. In section 3 he comments on some of David Cox’s remarks in a conversation we recorded:

“A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo,” published in Rationality, Markets and Morals [iii]. (Section 2 has some remarks on L. Wasserman.)

This was a part of a highly informal, frank, and entirely unscripted conversation, with minimal editing from the tape-recording [ii]. It was first posted on this blog on Oct. 19, 2011. A related, earlier discussion on Gelman’s blog is here.

I want to open this for your informal comments (“U-Phil”, ~750 words, by September 25) [iv]. (Send to error@vt.edu.)

Before I give my own “deconstruction” of Gelman on the relevant section, I will post a bit of background to the question of background. For starters, here’s the relevant portion of the conversation:

COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?

MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.

COX: One aspect of it is that it forces us to say what it is that we really want to know when we analyze a situation statistically. Do we want to put in a lot of information external to the data, or as little as possible. It forces us to think about questions of that sort.

MAYO: But key questions, I think, are not so much a matter of putting in a lot or a little information. …What matters is the kind of information, and how to use it to learn. This gets to the question of how we manage to be so successful in learning about the world, despite knowledge gaps, uncertainties and errors. To me that’s one of the deepest questions and it’s the main one I care about. I don’t think a (deductive) Bayesian computation can adequately answer it. Read more »

Categories: Background knowledge, Philosophy of Statistics, U-Phil | 2 Comments

U-PHIL: Wasserman Replies to Spanos and Hennig

Wasserman on Spanos and Hennig on  “Low Assumptions, High Dimensions” (2011)

(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)

________

Thanks to Aris and others for comments.

Response to Aris Spanos:

1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes, such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.

2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite sample, distribution free methods. As an example, consider the usual finite sample (order statistics based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?

3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?

4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data. Read more »
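The finite-sample, order-statistics-based interval for the median mentioned in point 2 can be sketched as follows. This is a minimal illustration of the standard textbook construction, not code from Wasserman's paper; the function name `median_confidence_interval` is hypothetical. The only assumption is an iid sample: the number of observations below the true median is then Binomial(n, 1/2), which yields a coverage guarantee without any appeal to asymptotics.

```python
from math import comb


def median_confidence_interval(data, conf_level=0.95):
    """Distribution-free confidence interval [X_(k), X_(n-k+1)] for the median.

    Returns (lower, upper, guaranteed_coverage). For iid data the coverage is
    at least 1 - 2 * P(Binomial(n, 1/2) <= k - 1), so k is taken as large as
    possible while keeping that bound at or above conf_level.
    """
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - conf_level

    tail = 0.0  # P(Binomial(n, 1/2) <= k - 1) for the currently accepted k
    k = 0       # rank (1-indexed) of the lower endpoint; 0 means none found yet
    for i in range(n // 2 + 1):
        next_tail = tail + comb(n, i) / 2**n
        if 2 * next_tail > alpha:
            break
        tail = next_tail
        k = i + 1

    if k == 0:
        raise ValueError("sample too small for the requested confidence level")

    # Convert 1-indexed ranks k and n - k + 1 to 0-indexed positions.
    return xs[k - 1], xs[n - k], 1 - 2 * tail


if __name__ == "__main__":
    sample = [3.1, 0.4, 2.2, 5.9, 1.7, 4.3, 2.8, 0.9, 3.6, 7.5]  # made-up data
    lower, upper, coverage = median_confidence_interval(sample)
    print(f"[{lower}, {upper}] with guaranteed coverage {coverage:.4f}")
```

Because the interval endpoints must be observed values, the guaranteed coverage is usually somewhat above the nominal level rather than exactly equal to it.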

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

U-PHIL: Hennig and Gelman on Wasserman (2011)

Two further contributions in relation to

“Low Assumptions, High Dimensions” (2011)

Please also see: “Deconstructing Larry Wasserman” by Mayo, and Comments by Spanos

Christian Hennig:  Some comments on Larry Wasserman, “Low Assumptions, High Dimensions”

I enjoyed reading this stimulating paper. These are very important issues indeed. I’ll comment on both main concepts in the text.

1) Low Assumptions. I think that the term “assumption” is routinely misused and misunderstood in statistics. In Wasserman’s paper I can’t see such misuse explicitly, but I think that the “message” of the paper may be easily misunderstood because Wasserman doesn’t do much to stop people from this kind of misunderstanding.

Here is what I mean. The arithmetic mean can be derived as an optimal estimator under an i.i.d. Gaussian model, which is often interpreted as the “model assumption” behind it. However, we don’t really need the Gaussian distribution to be true for the mean to do a good job. Sometimes the mean will do a bad job in a non-Gaussian situation (for example in the presence of gross outliers), but sometimes not. The median has nice robustness properties and is seen as admissible for ordinal data. It is therefore usually associated with “weaker assumptions”. However, the median may be worse than the mean in a situation where the Gaussian “assumption” of the mean is grossly violated. At UCL we ask students on a -2/-1/0/1/2 Likert scale for their general opinion about our courses. The distributions that we get here are strongly discrete and the scale is usually interpreted as of ordinal type. Still, for ranking courses, the median is fairly useless (pretty much all courses end up with a median of 0 or 1), whereas the arithmetic mean can still detect statistically significant meaningful differences between courses.

Why? Because it’s not only the “official” model assumptions that matter but also whether a statistic uses all the data in an appropriate manner for the given application. Here it’s fatal that the median ignores all differences among observations north and south of it. Read more »
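A toy example makes the point about course rankings concrete. The ratings below are invented for illustration and are not UCL data; they only show how two courses with visibly different rating profiles can share the same median on a -2 to 2 scale while their means differ. Whether such a difference is statistically significant is a separate question that this sketch does not address.

```python
from statistics import mean, median

# Hypothetical ratings on the -2/-1/0/1/2 scale for two courses.
course_a = [-1, 0, 1, 1, 1, 1, 2, 2, 2]
course_b = [-2, -2, -1, 0, 1, 1, 1, 1, 2]

for name, ratings in (("A", course_a), ("B", course_b)):
    # The median only looks at the middle value; the mean uses every rating.
    print(f"course {name}: median = {median(ratings)}, mean = {mean(ratings):.2f}")
```

Both courses have the same middle rating, so the median cannot rank them, but the mean separates them because it also registers how far the ratings on either side of the median lie from it.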

Categories: Philosophy of Statistics, Statistics, U-Phil | 3 Comments

U-PHIL: Aris Spanos on Larry Wasserman

Our first outgrowth of “Deconstructing Larry Wasserman”. 

Aris Spanos – Comments on:

“Low Assumptions, High Dimensions” (2011)

by Larry Wasserman*

I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.

The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:

“1. The number of parameters is larger than the number of data points.

2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.

3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)

In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145,  is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.

My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied with theory-restricted and substantively meaningful parameters.

In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Read more »

Categories: Philosophy of Statistics, Statistics, U-Phil | 7 Comments

U-PHIL: Deconstructing Larry Wasserman

Deconstructing [i] Larry Wasserman

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011) in his contribution to Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken:
‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… .   We have to be vigilant.  And we have to be more than vigilant.  We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii]

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction.  (I will invite your “U-Phils” at the end.) I will allude to passages from my contribution to  RMM (2011)  http://www.rmm-journal.de/htdocs/st01.html  (in red).

A. What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens.

Let us examine the urgency of reconciling the need to make methods assumption-free and that of making them work in complex high dimensions. The problem of assumptions of course arises when they are made about unknowns that can introduce threats of error and/or misuse of methods. Read more »

Categories: Philosophy of Statistics, Statistics, U-Phil | 21 Comments

Clark Glymour: The Theory of Search Is the Economics of Discovery (part 2)

“Some Thoughts Prompted by David Hendry’s Essay * (RMM) Special Topic: Statistical Science and Philosophy of Science,” by  Professor Clark Glymour

Part 2 (of 2) (Please begin with part 1)

The first thing one wants to know about a search method is what it is searching for, what would count as getting it right. One might want to estimate a probability distribution, or get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions. Secondly, one wants to know about what decision theorists call a loss function, but less precisely, what is the comparative importance of various errors of measurement, or, in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency—pointwise versus uniform for example—and one wants to know which of those, if any, hold for a search method under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often for search methods only simulation results are possible, but if so, one wants them to explore the bounds of failure, not just easy cases. And, of course, one wants a rationale for limiting the search space, as well as some sense of how wrong the search can be if those limits are violated in various ways.

There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints—vanishing partial correlations for example, or inequalities of correlations. Suppose the hypothesis space delimits some big class of probability distributions. Suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule). There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B. Other things equal, a weakest class might be preferred because it requires fewer tests. But more important is what the test of a constraint does in efficiently guiding the search. A test that eliminates a particular hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.

Other factors: the power of the requisite tests; the numbers of tests (or posterior probability assessments) required; the computational requirements of individual tests (or posterior probability assessments). And so on. And, finally, search algorithms have varying degrees of generality. For example, there are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schema: stick in whatever decision procedure for conditional independence and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see the generality.

I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks.  I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.

David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition. Kids don’t need priors to learn a language. I am not sure of Hendry’s logic.  Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But one way or the other, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not, there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.

Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up. Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down” or “forward” and “backward” search can be intricate, and in some cases one may need both for consistency. Sometimes either will do. Graphical models, for example, can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding. There are pointwise consistent searches in both directions. The real difference is in complexity.

Read more »

Categories: Philosophy of Statistics, Statistics, U-Phil | 11 Comments
