See Rejected Posts.
Statistics
Rejected Post: Clinical Trial Statistics Doomed by Mayan Apocalypse?
PhilStat/Law/Stock: more on “bad statistics”: Schachtman
Nathan Schachtman has an update on the case of U.S. v. Harkonen discussed in my last 3 posts: here, here, and here.
United States of America v. W. Scott Harkonen, MD — Part III
Background
The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)), pushed me to revisit the brief filed by the Solicitor General’s office in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011). One of Dr. Harkonen’s post-trial motions contended that the government’s failure to disclose its Matrixx amicus brief deprived him of a powerful argument that would have resulted from citing the language of the brief, which disparaged the necessity of statistical significance for “demonstrating” causal inferences. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012). Continue reading
PhilStat/Law/Stock: multiplicity and duplicity
So what’s the allegation that the prosecutors are being duplicitous about statistical evidence in the case discussed in my two previous (‘Bad Statistics’) posts? As a non-lawyer, I will ponder only the evidential (and not the criminal) issues involved.
“After the conviction, Dr. Harkonen’s counsel moved for a new trial on grounds of newly discovered evidence. Dr. Harkonen’s counsel hoisted the prosecutors with their own petards, by quoting the government’s amicus brief to the United States Supreme Court in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011). In Matrixx, the securities fraud plaintiffs contended that they need not plead ‘statistically significant’ evidence for adverse drug effects.” (Schachtman’s part 2, ‘The Duplicity Problem – The Matrixx Motion’)
The Matrixx case is another philstat/law/stock example taken up in this blog here, here, and here. Why are the Harkonen prosecutors “hoisted with their own petards” (a great expression, by the way)? Continue reading
“Bad statistics”: crime or free speech?
Hunting for “nominally” significant differences, trying different subgroups and multiple endpoints, can result in a much higher probability of erroneously inferring evidence of a risk or benefit than the nominal p-value, even in randomized controlled trials. This was an issue that arose in looking at RCTs in development economics (an area introduced to me by Nancy Cartwright), as at our symposium at the Philosophy of Science Association last month[i][ii]. Reporting the results of hunting and dredging in just the same way as if the relevant claims were predesignated can lead to misleading reports of actual significance levels.[iii]
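To put a rough number on how far the actual error probability can drift from the nominal level, here is a minimal sketch. It assumes the searched subgroups or endpoints behave like independent tests, each run at the same nominal level, with every null hypothesis true. Real subgroup analyses are usually correlated, so this is only an idealized illustration. The function name `familywise_error_rate` is a hypothetical helper, not something from the posts discussed here.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one nominally significant result among k tests.

    Assumes (for illustration only) that the k tests are independent, each
    is run at nominal level alpha, and all null hypotheses are true.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # P(no false alarm in any test) = (1 - alpha)^k, so take the complement.
    return 1.0 - (1.0 - alpha) ** k


# Nominal level 0.05, searching over more and more subgroups/endpoints.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# prints:
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

Under these assumptions, reporting the best of 20 looks as if it were a single predesignated test at 0.05 understates the chance of a spurious finding by more than a factor of ten.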
Still, even if reporting spurious statistical results is considered “bad statistics,” is it criminal behavior? I noticed this issue in Nathan Schachtman’s blog over the past couple of days. The case concerns a biotech company, InterMune, and its previous CEO, Dr. Harkonen. Here’s an excerpt from Schachtman’s discussion (part 1). Continue reading
Don’t Birnbaumize that experiment my friend*–updated reblog
Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a video we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).
(A) “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization. Continue reading
Mayo Commentary on Gelman & Robert
The following is my commentary on a paper by Gelman and Robert, forthcoming (in early 2013) in The American Statistician* (submitted October 3, 2012).
_______________________
Discussion of Gelman and Robert, “‘Not only defended but also applied’: The perceived absurdity of Bayesian inference”
Deborah G. Mayo
1. Introduction
I am grateful for the chance to comment on the paper by Gelman and Robert. I welcome seeing statisticians raise philosophical issues about statistical methods, and I entirely agree that methods not only should be applicable but also capable of being defended at a foundational level. “It is doubtful that even the most rabid anti-Bayesian of 2010 would claim that Bayesian inference cannot apply” (Gelman and Robert 2012, p. 6). This is clearly correct; in fact, it is not far off the mark to say that the majority of statistical applications nowadays are placed under the Bayesian umbrella, even though the goals and interpretations found there are extremely varied. There are a plethora of international societies, journals, post-docs, and prizes with “Bayesian” in their name, and a wealth of impressive new Bayesian textbooks and software is available. Even before the latest technical advances and the rise of “objective” Bayesian methods, leading statisticians were calling for eclecticism (e.g., Cox 1978), and most will claim to use a smattering of Bayesian and non-Bayesian methods, as appropriate. George Casella (to whom their paper is dedicated) and Roger Berger in their superb textbook (2002) exemplify a balanced approach. Continue reading
Statistical Science meets Philosophy of Science
Many of the discussions on this blog have revolved around a cluster of issues under the general question: “Statistical Science and Philosophy of Science: Where Do (Should) They meet? (in the contemporary landscape)?” In tackling these issues, this blog regularly returns to a set of contributions growing out of a conference with the same title (June 2010, London School of Economics, Center for the Philosophy of Natural and Social Science, CPNSS), as well as to conversations initiated soon after. The conference site is here. My most recent reflections in this arena (Sept. 26, 2012) are here. Continue reading
Error Statistics (brief overview)
In view of some questions about “behavioristic” vs “evidential” construals of frequentist statistics (from the last post), and how the error statistical philosophy tries to improve on Birnbaum’s attempt at providing the latter, I’m reblogging a portion of a post from Nov. 5, 2011 when I also happened to be in London. (The beginning just records a goofy mishap with a skeletal key, and so I leave it out in this reblog.) Two papers with much more detail are linked at the end.
Error Statistics
(1) There is a “statistical philosophy” and a philosophy of science. (a) An error-statistical philosophy alludes to the methodological principles and foundations associated with frequentist error-statistical methods. (b) An error-statistical philosophy of science, on the other hand, involves using the error-statistical methods, formally or informally, to deal with problems of philosophy of science: to model scientific inference (actual or rational), to scrutinize principles of inference, and to address philosophical problems about evidence and inference (the problem of induction, underdetermination, warranting evidence, theory testing, etc.). Continue reading
Blogging Birnbaum: on Statistical Methods in Scientific Inference
I said I’d make some comments on Birnbaum’s letter to Nature (linked in my last post), which is relevant to today’s Seminar session (at the LSE*), as well as to Normal Deviate’s recent discussion of frequentist inference–in terms of constructing procedures with good long-run “coverage”. (Also to the current U-Phil).
NATURE VOL. 225 MARCH 14, 1970 (1033)
LETTERS TO THE EDITOR
Statistical methods in Scientific Inference
It is regrettable that Edwards’s interesting article[1], supporting the likelihood and prior likelihood concepts, did not point out the specific criticisms of likelihood (and Bayesian) concepts that seem to dissuade most theoretical and applied statisticians from adopting them. As one whom Edwards particularly credits with having ‘analysed in depth…some attractive properties’ of the likelihood concept, I must point out that I am not now among the ‘modern exponents’ of the likelihood concept. Further, after suggesting that the notion of prior likelihood was plausible as an extension or analogue of the usual likelihood concept (ref.2, p. 200)[2], I have pursued the matter through further consideration and rejection of both the likelihood concept and various proposed formalizations of prior information and opinion (including prior likelihood). I regret not having expressed my developing views in any formal publication between 1962 and late 1969 (just after ref. 1 appeared). My present views have now, however, been published in an expository but critical article (ref. 3, see also ref. 4)[3] [4], and so my comments here will be restricted to several specific points that Edwards raised. Continue reading
Announcement: 28 November: My Seminar at the LSE (Contemporary PhilStat)
Background reading: PAPER
See general announcement here.
Background to the Discussion: Question: How did I get involved in disproving Birnbaum’s result in 2006?
Answer: Appealing to something called the “weak conditionality principle (WCP)” arose in avoiding a classic problem (arising from mixture tests) described by David Cox (1958), as discussed in our joint paper:
Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo & A. Spanos eds.), CUP 276-304. Continue reading
Irony and Bad Faith: Deconstructing Bayesians-reblog
The recent post by Normal Deviate, and my comments on it, remind me of why/how I got back into the Bayesian-frequentist debates in 2006, as described in my first “deconstruction” (and “U-Phil”) on this blog (Dec 11, 2012):
Some time in 2006 (shortly after my ERROR06 conference), the trickle of irony and sometime flood of family feuds issuing from Bayesian forums drew me back into the Bayesian-frequentist debates.1 2 Suddenly sparks were flying, mostly kept shrouded within Bayesian walls, but nothing can long be kept secret even there. Spontaneous combustion is looming. The true-blue subjectivists were accusing the increasingly popular “objective” and “reference” Bayesians of practicing in bad faith; the new O-Bayesians (and frequentist-Bayesian unificationists) were taking pains to show they were not subjective; and some were calling the new Bayesian kids on the block “pseudo Bayesian.” Then there were the Bayesians somewhere in the middle (or perhaps out in left field) who, though they still use the Bayesian umbrella, were flatly denying the very idea that Bayesian updating fits anything they actually do in statistics.3 Obeisance to Bayesian reasoning remained, but on some kind of a priori philosophical grounds. Doesn’t the methodology used in practice really need a philosophy of its own? I say it does, and I want to provide this. Continue reading
Comments on Wasserman’s “what is Bayesian/frequentist inference?”
What I like best about Wasserman’s blogpost (Normal Deviate) is his clear denial that merely using conditional probability makes the method Bayesian (even if one chooses to call the conditional probability theorem Bayes’s theorem, and even if one is using ‘Bayes’s’ nets). Else any use of probability theory is Bayesian, which trivializes the whole issue. Thus, the fact that conditional probability is used in an application with possibly good results is not evidence of (yet another) Bayesian success story [i].
But I do have serious concerns that in his understandable desire (1) to be even-handed (hammers and screwdrivers are for different purposes, both perfectly kosher tools), as well as (2) to give a succinct sum-up of methods, Wasserman may encourage misrepresenting positions. Speaking only for “frequentist” sampling theorists [ii], I would urge moving away from the recommended quick sum-up of “the goal” of frequentist inference: “Construct procedures with frequency guarantees”. If by this Wasserman means that the direct aim is to have tools with “good long run properties”, that rarely err in some long run series of applications, then I think it is misleading. In the context of scientific inference or learning, such a long-run goal, while necessary, is not at all sufficient; moreover, I claim that satisfying this goal is actually just a byproduct of deeper inferential goals (controlling and evaluating how severely given methods are capable of revealing/avoiding erroneous statistical interpretations of data in the case at hand). (So I deny that it is even the main goal to which frequentist methods direct themselves.) Even arch behaviorist Neyman used power post-data to ascertain how well corroborated various hypotheses were, never mind long-run repeated applications (see one of my Neyman’s Nursery posts). Continue reading
Seminars at the London School of Economics: Contemporary Problems in Philosophy of Statistics
As a visitor of the Centre for Philosophy of Natural and Social Science (CPNSS) at the London School of Economics and Political Science, I am leading 3 seminars in the department of Philosophy, Logic, and Scientific Method on Wednesdays from Nov. 28-Dec 12 on Contemporary Philosophy of Statistics under the PH500 rubric, Room: Lak 2.06 (Lakatos building). Interested individuals who have not yet contacted me, write: error@vt.edu .*
- 28 November: (10 – 12 noon): Mayo: On Birnbaum’s argument for the Likelihood Principle: A 50-year old error and its influence on statistical foundations (See my blog and links within.)
5 December and 12 December: Statistical Science meets philosophy of science: Mayo and guests:
- 5 Dec: 12 (noon)- 2p.m.: Sir David Cox
- 12 Dec (10-12): Dr. Stephen Senn;
Dr. Christian Hennig: TBA
Topics, activities, readings: TBA (Two 2012 Summer Seminars may be found here).
Blurb: Debates over the philosophical foundations of statistical science have a long and fascinating history marked by deep and passionate controversies that intertwine with fundamental notions of the nature of statistical inference and the role of probabilistic concepts in inductive learning. Progress in resolving decades-old controversies which still shake the foundations of statistics, demands both philosophical and technical acumen, but gaining entry into the current state of play requires a roadmap that zeroes in on core themes and current standpoints. While the seminar will attempt to minimize technical details, it will be important to clarify key notions to fully contribute to the debates. Relevance for general philosophical problems will be emphasized. Because the contexts in which statistical methods are most needed are ones that compel us to be most aware of strategies scientists use to cope with threats to reliability, considering the nature of statistical method in the collection, modeling, and analysis of data is an effective way to articulate and warrant general principles of evidence and inference.
Room 2.06 Lakatos Building; Centre for Philosophy of Natural and Social Science
London School of Economics
Houghton Street
London WC2A 2AE
Administrator: T. R. Chivers@lse.ac.uk
For updates, details, and associated readings: please check the LSE Ph500 page on my blog or write to me.
*It is not necessary to have attended the 2 sessions held during the summer of 2012.
U-Phil: Blogging the Likelihood Principle: New Summary
U-Phil: I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader U-Phils, from December 6-19 (< 1000 words) for posting on this blog (please see # at bottom of post). Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid. If construed as formally valid, I argue, the premises contradict each other. Who is right? Gandenberger doesn’t wrestle with my critique of Birnbaum, but I invite you (and Greg!) to do so. I’m pasting a new summary of my argument below.
The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec 7, 2011.
Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant? And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?
Graduate students and others invited!
______________________________________________________
New Summary of Mayo Critique of Birnbaum’s Argument for the SLP
Deborah Mayo
See also a draft of the full PAPER corresponding to this summary; a later and more satisfactory draft is here. Yet other links on the Strong Likelihood Principle (SLP): Mayo 2010; Cox & Mayo 2011 (appendix).
Type 1 and 2 errors: Frankenstorm
I escaped (to Virginia) from New York just in the nick of time before the threat of Hurricane Sandy led Bloomberg to completely shut things down (a whole day in advance!) in expectation of the looming “Frankenstorm”. Searching for the latest update on the extent of Sandy’s impacts, I noticed an interesting post on statblogs by Dr. Nic: “Which type of error do you prefer?”. She begins:
Mayor Bloomberg is avoiding a Type 2 error
As I write this, Hurricane Sandy is bearing down on the east coast of the United States. Mayor Bloomberg has ordered evacuations from various parts of New York City. All over the region people are stocking up on food and other essentials and waiting for Sandy to arrive. And if Sandy doesn’t turn out to be the worst storm ever, will people be relieved or disappointed? Either way there is a lot of money involved. And more importantly, risk of human injury and death. Will the forecasters be blamed for over-predicting?
Given that my son’s ability to travel back here is on hold until planes fly again (not to mention that snow is beginning to swirl outside my window), I definitely hope Bloomberg was erring on the side of caution. However, I think that type 1 and 2 errors should generally be put in terms of the extent and/or direction of errors that are or are not indicated or ruled out by test data. Criticisms of tests very often harp on the dichotomous type 1 and 2 errors, as if a user of tests does not have latitude to infer the extent of discrepancies that are/are not likely. At times, attacks on the “culture of dichotomy” reach fever pitch, and lead some to call for the overthrow of tests altogether (often in favor of confidence intervals), as well as to the creation of task forces seeking to reform if not “ban” statistical tests (which I spoof here).
Continue reading
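One way to make "the extent of discrepancies that are or are not indicated" concrete is a post-data severity calculation. The sketch below assumes the simplest textbook setup, which is not spelled out in the excerpt above: a one-sided test of a Normal mean (H0: mu <= mu0 against mu > mu0) with known sigma. In that setup the severity for the claim mu > mu1, given an observed sample mean xbar, is taken to be P(Xbar <= xbar; mu = mu1). The function names and the numbers used are hypothetical, chosen only for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def severity_mu_greater(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity for the claim mu > mu1 after observing sample mean xbar.

    Assumed setup (illustrative): one-sided test of a Normal mean with
    known sigma and sample size n. Computes P(Xbar <= xbar; mu = mu1).
    """
    standard_error = sigma / sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)


# Hypothetical data: sigma = 10, n = 100 (standard error 1), observed mean 2.0,
# i.e. two standard errors above a null value of mu0 = 0.
for mu1 in (0.0, 1.0, 2.0, 3.0):
    print(mu1, round(severity_mu_greater(2.0, mu1, 10.0, 100), 3))
# prints:
# 0.0 0.977
# 1.0 0.841
# 2.0 0.5
# 3.0 0.159
```

Read this way, the same statistically significant result gives good grounds for mu > 0, weaker grounds for mu > 1, and poor grounds for mu > 3, which is more informative than a bare reject/do-not-reject verdict.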
Mayo: (section 7) “StatSci and PhilSci: part 2”
Here is the final section (7) of my paper: “Statistical Science Meets Philosophy of Science Part 2” SS & POS 2.* Section 6 is in my last post.
7. Can/Should Bayesian and Error Statistical Philosophies Be Reconciled?
Stephen Senn makes a rather startling but doubtlessly true remark:
The late and great George Barnard, through his promotion of the likelihood principle, probably did as much as any statistician in the second half of the last century to undermine the foundations of the then dominant Neyman-Pearson framework and hence prepare the way for the complete acceptance of Bayesian ideas that has been predicted will be achieved by the De Finetti-Lindley limit of 2020. (Senn 2008, 459)
Many do view Barnard as having that effect, even though he himself rejected the likelihood principle (LP). One can only imagine Savage’s shock at hearing that contemporary Bayesians (save true subjectivists) are lukewarm about the LP! The 2020 prediction could come to pass, only to find Bayesians practicing in bad faith. Kadane, one of the last of the true Savage Bayesians, is left to wonder at what can only be seen as a Pyrrhic victory for Bayesians.
Query
I was reviewing blog comments and various links people have sent me. I have noticed a kind of comment often arises about a type of (subjective?) Bayesian who does not assign probabilities to a general hypothesis H but only to observable events. In this way, it is claimed, one can avoid various criticisms but retain the Bayesian position, label it (A):
(A) the warrant accorded to an uncertain claim is in terms of probability assignments (to events).
But what happens when H’s predictions are repeatedly and impressively borne out in a variety of experiments? Either one can say nothing about the warrant for H (having assumed A), or else one seeks a warrant for H other than a probability assignment to H*.
Take the former. In that case what good is it to have passed many of H’s predictions? We cannot say we have grounds to accept H in some non-probabilistic sense (since that’s been ruled out by (A)). We also cannot say that the impressive successes in the past warrant predicting that future successes are probable because events do not warrant other events. It is only through some general claim or statistical hypothesis that we may deduce predicted probabilities of events. Continue reading
RMM-8: New Mayo paper: “StatSci and PhilSci: part 2 (Shallow vs Deep Explorations)”
A new article of mine, “Statistical Science and Philosophy of Science Part 2: Shallow versus Deep Explorations” has been published in the on-line journal, Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”).
The contributions to this special volume began with the conference we ran in June 2010. (See web poster.) My first article in this collection was essentially just my introduction to the volume, whereas this new one discusses my work. If you are a reader of this blog, you will recognize portions from early posts, as I’d been revising it then.
The sections are listed below. I will be posting portions in the next few days. We invite comments for this blog, and for possible publication in this special volume of RMM, if received before the end of this year.
This is the 8th RMM announcement. Many thanks to Sailor for digging up the previous 7, and listing them at the end*. (The paper’s title stemmed from the Deepwater Horizon oil spill of spring 2010**).
Abstract:
Inability to clearly defend against the criticisms of frequentist methods has turned many a frequentist away from venturing into foundational battlegrounds. Conceding the distorted perspectives drawn from overly literal and radical expositions of what Fisher, Neyman, and Pearson ‘really thought’, some deny they matter to current practice. The goal of this paper is not merely to call attention to the howlers that pass as legitimate criticisms of frequentist error statistics, but also to sketch the main lines of an alternative statistical philosophy within which to better articulate the roles and value of frequentist tools.
Statistical Science Meets Philosophy of Science Part 2:
Shallow versus Deep Explorations
1. Comedy Hour at the Bayesian Retreat
2. Popperians Are to Frequentists as Carnapians Are to Bayesians
2.1 Severe Tests
2.2 Another Egregious Violation of the Severity Requirement
2.3 The Rationale for Severity is to Find Things Out Reliably
2.4 What Can Be Learned from Popper; What Can Popperians Be Taught?
3. Frequentist Error-Statistical Tests
3.1 Probability in Statistical Models of Experiments
3.2 Statistical Test Ingredients
3.3 Hypotheses and Events
3.4 Hypotheses Inferred Need Not Be Predesignated Continue reading
Mayo Responds to U-Phils on Background Information
Thanks to Emrah Aktunc and Christian Hennig for their U-Phils on my September 12 post: “How should ‘prior information’ enter in statistical inference?” and my subsequent deconstruction of Gelman[i] (starting here, and ending with part 3). I’ll begin with some remarks on Emrah Aktunc’s contribution.
First, we need to avoid an ambiguity that clouds prior information and prior probability. In a given experiment, prior information may be stronger than the data: to take but one example, say that we’ve already falsified Newton’s theory of gravity in several domains, but in our experiment the data (e.g., one of the sets of eclipse data from 1919) accords with the Newtonian prediction (of half the amount of deflection as that predicted by Einstein’s general theory of relativity [GTR]). The pro-Newton data, in and of itself, would be rejected because of all that we already know. Continue reading