U-Phil: I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader U-Phils, from December 6-19 (< 1000 words) for posting on this blog (please see # at bottom of post). Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid. If construed as formally valid, I argue, the premises contradict each other. Who is right? Gandenberger doesn’t wrestle with my critique of Birnbaum, but I invite you (and Greg!) to do so. I’m pasting a new summary of my argument below.
The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec. 7, 2011.
Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant? And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?
Graduate students and others invited!
New Summary of Mayo Critique of Birnbaum’s Argument for the SLP
See also a (draft) of the full PAPER corresponding to this summary. Other links on the Strong Likelihood Principle (SLP): Mayo 2010; Cox & Mayo 2011 (appendix).
Evidential Meaning and Methods of Inference
PhD student, History and Philosophy of Science
Master’s student, Statistics
University of Pittsburgh
Bayesian methods conform to the Likelihood Principle, while frequentist methods do not. Thus, proofs of the Likelihood Principle* such as Birnbaum’s (1962) appear to be threats to frequentist positions. Deborah Mayo has recently argued that Birnbaum’s proof is no threat to frequentist positions because it is invalid (Ch. 7(III) in Mayo and Spanos 2010). In my view, Birnbaum’s proof is valid and his premises are intuitively compelling. Nevertheless, I agree with Professor Mayo that the proof, properly understood, does not imply that frequentist methods should not be used.
There are actually at least two different Likelihood Principles: one, which I call the Evidential Likelihood Principle, says that the evidential meaning of an experimental outcome with respect to a set of hypotheses depends only on its likelihood function for those hypotheses (i.e., the function that maps each of those hypotheses to the probability it assigns to that outcome, defined up to a constant of proportionality); the other, which I call the Methodological Likelihood Principle, says that a statistical method should not be used if it can generate different conclusions from outcomes that have the same likelihood function, without a relevant difference in utilities or prior probabilities.
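The Evidential Likelihood Principle can be illustrated with the standard binomial versus negative-binomial example (a textbook illustration, not drawn from Gandenberger's post; the numbers are hypothetical): two experiments with different stopping rules produce likelihood functions for θ that differ only by a constant factor, so the principle says they carry the same evidential meaning.

```python
from math import comb

def binomial_lik(theta, n=12, k=3):
    # P(k successes in n fixed trials) as a function of theta
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def neg_binomial_lik(theta, r=3, n=12):
    # P(the r-th success arrives exactly on trial n)
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# The two likelihood functions differ only by a constant of
# proportionality, so under the Evidential Likelihood Principle
# "3 successes in 12 trials" means the same thing evidentially
# under either stopping rule.
for theta in (0.1, 0.25, 0.5, 0.75):
    ratio = binomial_lik(theta) / neg_binomial_lik(theta)
    print(round(ratio, 6))  # constant: comb(12,3)/comb(11,2) = 4.0
```

The frequentist p-values for these two experiments differ, since they sum over different sample spaces, which is why the principle is a challenge to error-statistical methods.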
I escaped (to Virginia) from New York just in the nick of time before the threat of Hurricane Sandy led Bloomberg to completely shut things down (a whole day in advance!) in expectation of the looming “Frankenstorm”. Searching for the latest update on the extent of Sandy’s impacts, I noticed an interesting post on statblogs by Dr. Nic: “Which type of error do you prefer?”. She begins:
Mayor Bloomberg is avoiding a Type 2 error
As I write this, Hurricane Sandy is bearing down on the east coast of the United States. Mayor Bloomberg has ordered evacuations from various parts of New York City. All over the region people are stocking up on food and other essentials and waiting for Sandy to arrive. And if Sandy doesn’t turn out to be the worst storm ever, will people be relieved or disappointed? Either way there is a lot of money involved. And more importantly, risk of human injury and death. Will the forecasters be blamed for over-predicting?
Given that my son’s ability to travel back here is on-hold until planes fly again—not to mention that snow is beginning to swirl outside my window—I definitely hope Bloomberg was erring on the side of caution. However, I think that type 1 and 2 errors should generally be put in terms of the extent and/or direction of errors that are or are not indicated or ruled out by test data. Criticisms of tests very often harp on the dichotomous type 1 and 2 errors, as if a user of tests does not have latitude to infer the extent of discrepancies that are/are not likely. At times, attacks on the “culture of dichotomy” reach fever pitch, leading some to call for the overthrow of tests altogether (often in favor of confidence intervals), as well as for the creation of task forces seeking to reform if not “ban” statistical tests (which I spoof here).
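The point that type 1 and 2 errors concern the extent and direction of discrepancies, not a bare accept/reject dichotomy, can be sketched for a one-sided test of a normal mean (all numbers below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Hypothetical setup: test H0: mu <= 0 vs H1: mu > 0 with n iid
# N(mu, sigma^2) observations; reject when the sample mean exceeds c.
sigma, n = 1.0, 25
se = sigma / sqrt(n)
c = 1.645 * se               # cutoff giving a type 1 error rate of ~0.05

alpha = 1 - Z.cdf(c / se)    # type 1 error probability
print(round(alpha, 3))       # ~0.05

# The type 2 error probability is not one number: it depends on the
# size of the discrepancy from the null, which is why one can assess
# which discrepancies a test does or does not rule out.
for mu in (0.1, 0.3, 0.5):
    beta = Z.cdf((c - mu) / se)  # P(fail to reject | true mean = mu)
    print(mu, round(beta, 3))
```

The loop shows the latitude a test user has: small discrepancies (mu = 0.1) are very likely to be missed, larger ones (mu = 0.5) rarely are.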
Reblogging 1 year ago in Oxford: Oxford Jail is an entirely fitting place to be on Halloween!
Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should I think be shelved, not that names should matter.
In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).
“Are you butter off now? Deconstructing the butter bust of the President” http://rejectedpostsofdmayo.com/
Ontology & Methodology 2013
4-5 May, 2013
Special Invited Speakers:
David Danks (CMU)
Peter Godfrey-Smith (CUNY)
Kevin Hoover (Duke)
Laura Ruetsche (U. Mich)
James Woodward (Pitt)
Virginia Tech Speakers:
Benjamin Jantzen, Deborah Mayo, Lydia Patton
- How do scientists’ initial conjectures about the entities and processes under their scrutiny influence the choice of variables, the structure of mature scientific theories, and methods of interpretation of those theories?
- How do methods of data generation, statistical modeling, and analysis interact with the construction and appraisal of theories at multiple levels?
- How does historical analysis of theory development illuminate the interplay between scientific methodology, theory building, and theory interpretation?
This conference brings together prominent philosophers of biology, cognitive science, economics, and physics engaged in research into these interconnected methodological and ontological questions.
We invite contributed papers that illuminate these issues as they arise in general philosophy of science, in causal explanation and modeling, in the philosophy of experiment and statistics, and in the history and philosophy of science. We anticipate covering accommodation costs for accepted contributed papers. website
Deadline for submissions: January 15, 2013
Organizers: Benjamin Jantzen, Deborah Mayo, Lydia Patton
Sponsors: The Virginia Tech Department of Philosophy and the Fund for Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (E.R.R.O.R.)
Here is the final section (7) of my new paper: “Statistical Science Meets Philosophy of Science Part 2” SS & POS 2.* Section 6 is in my last post.
7. Can/Should Bayesian and Error Statistical Philosophies Be Reconciled?
Stephen Senn makes a rather startling but doubtlessly true remark:
The late and great George Barnard, through his promotion of the likelihood principle, probably did as much as any statistician in the second half of the last century to undermine the foundations of the then dominant Neyman-Pearson framework and hence prepare the way for the complete acceptance of Bayesian ideas that has been predicted will be achieved by the De Finetti-Lindley limit of 2020. (Senn 2008, 459)
Many do view Barnard as having that effect, even though he himself rejected the likelihood principle (LP). One can only imagine Savage’s shock at hearing that contemporary Bayesians (save true subjectivists) are lukewarm about the LP! The 2020 prediction could come to pass, only to find Bayesians practicing in bad faith. Kadane, one of the last of the true Savage Bayesians, is left to wonder at what can only be seen as a Pyrrhic victory for Bayesians.
Here is section 6 of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Section 5 is in my last post.
6. Some Knock-Down Criticisms of Frequentist Error Statistics
With the error-statistical philosophy of inference under our belts, it is easy to run through the classic and allegedly damning criticisms of frequentist error-statistical methods. Open up Bayesian textbooks and you will find, endlessly reprised, the handful of ‘counterexamples’ and ‘paradoxes’ that make up the charges leveled against frequentist statistics, after which the Bayesian account is proffered as coming to the rescue. There is nothing about how frequentists have responded to these charges; nor evidence that frequentist theory endorses the applications or interpretations around which these ‘chestnuts’ revolve.
If frequentist and Bayesian philosophies are to find common ground, this should stop. The value of a generous interpretation of rival views should cut both ways. A key purpose of the forum out of which this paper arises is to encourage reciprocity.
Here is section 5 of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. Sections 1 and 2 are in my last post.*
5. The Error-Statistical Philosophy
I recommend moving away, once and for all, from the idea that frequentists must ‘sign up’ for either Neyman and Pearson, or Fisherian paradigms. As a philosopher of statistics I am prepared to admit to supplying the tools with an interpretation and an associated philosophy of inference. I am not concerned to prove this is what any of the founders ‘really meant’.
Fisherian simple-significance tests, with their single null hypothesis and at most an idea of a directional alternative (and a corresponding notion of the ‘sensitivity’ of a test), are commonly distinguished from Neyman and Pearson tests, where the null and alternative exhaust the parameter space, and the corresponding notion of power is explicit. On the interpretation of tests that I am proposing, these are just two of the various types of testing contexts appropriate for different questions of interest. My use of a distinct term, ‘error statistics’, frees us from the bogeymen and bogeywomen often associated with ‘classical’ statistics, and it is to be hoped that that term is shelved. (Even ‘sampling theory’, technically correct, does not seem to represent the key point: the sampling distribution matters in order to evaluate error probabilities, and thereby assess corroboration or severity associated with claims of interest.) Nor do I see that my comments turn on whether one replaces frequencies with ‘propensities’ (whatever they are).
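The use of the sampling distribution to assess severity can be sketched for the one-sided normal test T+ (the numbers are hypothetical; the severity computation follows Mayo's published treatment of test T+, which the reader should consult for the full account):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

# Hypothetical numbers: test T+ of H0: mu <= 0 vs H1: mu > 0,
# sigma = 1, n = 25, and an observed sample mean xbar = 0.4
# (statistically significant at conventional levels).
sigma, n, xbar = 1.0, 25, 0.4
se = sigma / sqrt(n)

def severity(mu1):
    # Severity with which the claim "mu > mu1" passes: the probability
    # of observing a sample mean as small as or smaller than the one
    # actually observed, were mu exactly mu1.
    return Z.cdf((xbar - mu1) / se)

for mu1 in (0.0, 0.2, 0.4):
    print(mu1, round(severity(mu1), 3))
```

The output shows how the same significant result warrants "mu > 0" with high severity (about .977) while warranting "mu > 0.4" only poorly (.5): the inference concerns the extent of the discrepancy, not a dichotomous verdict.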
Here are the first two sections of my new paper: “Statistical Science Meets Philosophy of Science Part 2: Shallow versus Deep Explorations” SS & POS 2. (Alternatively, go to the RMM page and scroll down to the Sept 26, 2012 entry.)
1. Comedy Hour at the Bayesian Retreat[i]
Overheard at the comedy hour at the Bayesian retreat: Did you hear the one about the frequentist…
“who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”
“who claimed that observing ‘heads’ on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”
Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of ‘straw-men’ fallacies, they form the basis of why some statisticians and philosophers reject frequentist methods, then they are not such a laughing matter. But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it? I invite the reader to stay and find out.
I was reviewing blog comments and various links people have sent me. I have noticed a kind of comment often arises about a type of (subjective?) Bayesian who does not assign probabilities to a general hypothesis H but only to observable events. In this way, it is claimed, one can avoid various criticisms but retain the Bayesian position, label it (A):
(A) the warrant accorded to an uncertain claim is in terms of probability assignments (to events).
But what happens when H’s predictions are repeatedly and impressively borne out in a variety of experiments? Either one can say nothing about the warrant for H (having assumed A), or else one seeks a warrant for H other than a probability assignment to H*.
Take the former. In that case what good is it to have passed many of H’s predictions? We cannot say we have grounds to accept H in some non-probabilistic sense (since that’s been ruled out by (A)). We also cannot say that the impressive successes in the past warrant predicting that future successes are probable because events do not warrant other events. It is only through some general claim or statistical hypothesis that we may deduce predicted probabilities of events.
A new article of mine, “Statistical Science and Philosophy of Science Part 2: Shallow versus Deep Explorations” has been published in the on-line journal, Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”).
The contributions to this special volume began with the conference we ran in June 2010. (See web poster.) My first article in this collection was essentially just my introduction to the volume, whereas this new one discusses my work. If you are a reader of this blog, you will recognize portions from early posts, as I’d been revising it then.
The sections are listed below. I will be posting portions in the next few days. We invite comments for this blog, and for possible publication in this special volume of RMM, if received before the end of this year.
This is the 8th RMM announcement. Many thanks to Sailor for digging up the previous 7, and listing them at the end*. (The paper’s title stemmed from the Deepwater Horizon oil spill of spring 2010**).
Inability to clearly defend against the criticisms of frequentist methods has turned many a frequentist away from venturing into foundational battlegrounds. Conceding the distorted perspectives drawn from overly literal and radical expositions of what Fisher, Neyman, and Pearson ‘really thought’, some deny they matter to current practice. The goal of this paper is not merely to call attention to the howlers that pass as legitimate criticisms of frequentist error statistics, but also to sketch the main lines of an alternative statistical philosophy within which to better articulate the roles and value of frequentist tools.
Statistical Science Meets Philosophy of Science Part 2:
Shallow versus Deep Explorations
1. Comedy Hour at the Bayesian Retreat
2. Popperians Are to Frequentists as Carnapians Are to Bayesians
2.1 Severe Tests
2.2 Another Egregious Violation of the Severity Requirement
2.3 The Rationale for Severity is to Find Things Out Reliably
2.4 What Can Be Learned from Popper; What Can Popperians Be Taught?
3. Frequentist Error-Statistical Tests
3.1 Probability in Statistical Models of Experiments
3.2 Statistical Test Ingredients
3.3 Hypotheses and Events
3.4 Hypotheses Inferred Need Not Be Predesignated
Thanks to Emrah Aktunc and Christian Hennig for their U-Phils on my September 12 post: “How should ‘prior information’ enter in statistical inference?” and my subsequent deconstruction of Gelman[i] (starting here, and ending with part 3). I’ll begin with some remarks on Emrah Aktunc’s contribution.
First, we need to avoid an ambiguity that clouds prior information and prior probability. In a given experiment, prior information may be stronger than the data: to take but one example, say that we’ve already falsified Newton’s theory of gravity in several domains, but in our experiment the data (e.g., one of the sets of eclipse data from 1919) accords with the Newtonian prediction (of half the amount of deflection as that predicted by Einstein’s general theory of relativity [GTR]). The pro-Newton data, in and of itself, would be rejected because of all that we already know.
I am posting two U-Phils I received in relation to the 9/12 call on Andrew Gelman’s (2012): “Ethics and the statistical use of prior information”
A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using prior Information
(10/9/12) Part 3: beauty and the background knowledge
Comments on “How should prior information enter in statistical inference”
Department of Statistical Science
University College London
Reading the blog entries on this topic, the Cox-Mayo Conversation, and the linked paper by Gelman, I appreciate the valuable thoughts in both, which to me all make sense, specifying situations where prior information either should not enter, or should enter in the Bayesian way.
Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).
A difference between the approaches seems to be that Cox/Mayo rather look at the analysis of data in an isolated situation whereas Gelman rather writes about conclusions from not only analysing a particular data set, but from aggregating all the information available.
Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together in order to get an overall picture.
Please see parts 1 and 2 and links therein. The background began in my Sept 12 post.
Gelman (2012) considers a case where the overall available evidence, E, is at odds with the indication of the results x from a given study:
Consider the notorious study in which a random sample of a few thousand people was analyzed, and it was found that the most beautiful parents were 8 percentage points more likely to have girls, compared to less attractive parents. The result was statistically significant (p<.05) and published in a reputable journal. But in this case we have good prior information suggesting that the difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point. A (non-Bayesian) design analysis reveals that, with this level of true difference, any statistically-significant observed difference in the sample is likely to be noise. At this point, you might well say that the original analysis should never have been done at all—but, given that it has been done, it is essential to use prior information (even if not in any formal Bayesian way) to interpret the data and generalize from sample to population.
Where did Fisher’s principle go wrong here? The answer is simple—and I think Cox would agree with me here. We’re in a setting where the prior information is much stronger than the data. (p. 3)
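The “(non-Bayesian) design analysis” Gelman mentions can be sketched with a small simulation (the sample sizes and effect size below are hypothetical stand-ins for the numbers in the quote, not Gelman's actual calculation):

```python
import random

random.seed(1)

# Hypothetical numbers loosely matching the example: about 3000
# subjects split into two groups, and a true difference in the
# proportion of girls of under 1 percentage point.
n_per_group, true_diff = 1500, 0.01
se = (0.5 * 0.5 * (1 / n_per_group + 1 / n_per_group)) ** 0.5
threshold = 1.96 * se  # an estimate counts as "significant" beyond this

significant = []
for _ in range(100_000):
    est = random.gauss(true_diff, se)  # sampling dist. of the estimate
    if abs(est) > threshold:
        significant.append(est)

power = len(significant) / 100_000
exaggeration = sum(abs(e) for e in significant) / len(significant) / true_diff
print(round(power, 3))        # significance is rare at this true effect...
print(round(exaggeration, 1)) # ...and a significant estimate overstates
                              # the true difference severalfold
```

With a true difference this small relative to the standard error, any estimate large enough to clear the significance threshold must exaggerate the true difference several times over, which is the sense in which a significant observed difference here "is likely to be noise."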
Let me simply grant Gelman that this prior information warrants (with severity) the hypothesis H:
H: “difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point,” (ibid.)
especially given my suspicions of the well-testedness of claims to show the effects of “beautiful to less-beautiful” on anything. I will simply take it as a given that it is well-tested background “knowledge.” Presumably, the well-tested claim goes beyond those individuals observed, and generalizes at least to some degree. So we are given that the hypothesis H is one for which there is strong evidence.
(Please see part 1 for links and references):
A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether or not “a Bayesian” refers to all Bayesians or only non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out with his own inquiry, he doesn’t want your favorite priors (be that beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article) is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would only have to subtract out all of this “funny business” to get at your likelihoods. He would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5.), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves texts open to some charge of disingenuity, I leave entirely to one side.
Now at this point I wonder: do Bayesian reports provide the ingredients for such “dividing away”? I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).
I was to have philosophically deconstructed a few paragraphs from (the last couple of sections) in a column Andrew Gelman sent me on “Ethics and the statistical use of prior information”[i]. The discussion begins with my Sept 12 post, and follows through several posts over the second half of September (see [ii]), all by way of background. But I got called away before finishing the promised deconstruction, and it was only this evening that I tried to wade through a circuitous swamp of remarks. I will just post the first part (of 2 or perhaps 3?), which is already too long.
Since I have a tendency to read articles from back to front, on a first read at least, let me begin with his last section titled: “A Bayesian wants everybody else to be a non-Bayesian.” Surely that calls for philosophical deconstruction, if anything does. It seems at the very least an exceptional view. Whether it’s widely held I can’t say (please advise). But suppose it’s true: Bayesians are publicly calling on everybody to use Bayesian methods, even though, deep down, they really, really hope everybody else won’t blend everything together before they can use the valid parts from the data—and they really, really hope that everybody else will provide the full panoply of information about what happened in other experiments, and what background theories are well corroborated, and about the precision of the instruments relied upon, and about other experiments that appear to conflict with the current one and with each other, etc., etc. Suppose that Bayesians actually would prefer, and are relieved to find, that, despite their exhortations, “everybody else” doesn’t report their posterior probabilities (whichever version of Bayesianism they are using) because then they can introduce their own background and figure out what is and is not warranted (in whatever sense seems appropriate).