Monthly Archives: December 2012

Midnight With Birnbaum-reblog

 Reblogging Dec. 31, 2011:

You know how in that recent movie, “Midnight in Paris,” the main character (I forget who plays him, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time, where he gets to run his work by such famous authors as Hemingway and Virginia Woolf?  He is impressed when his work earns their approval, and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2012) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i]

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WCP.[ii]  Sorry… I know it’s famous… Continue reading

Categories: Birnbaum Brakes, strong likelihood principle

An established probability theory for hair comparison? “is not — and never was”


Hypothesis H: “person S is the source of this hair sample,” if indicated by a DNA match, has passed a more severe test than if it were indicated merely by a visual analysis under a microscope. There is a much smaller probability of an erroneous hair match using DNA testing than using the method of visual analysis employed for decades by the FBI.
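To make the severity comparison concrete, here is a minimal sketch in Python. The two error rates are hypothetical placeholders (neither the FBI nor the Post reports figures in these terms); the point is only the structure of the comparison: the smaller the probability of a “match” when H is false, the more severe the test H passes.

```python
# Illustrative numbers only (hypothetical; not from the FBI or the Post):
# severity turns on how improbable an erroneous "match" is under each method.
p_match_given_not_source_visual = 0.10   # hypothetical microscopy false-match rate
p_match_given_not_source_dna = 1e-6      # hypothetical DNA random-match probability

# If H ("S is the source") were false, a DNA match would still occur far less
# often than a visual match, so a DNA match gives H a more severe test.
print(f"Visual: P(match | not source) = {p_match_given_not_source_visual}")
print(f"DNA:    P(match | not source) = {p_match_given_not_source_dna:.0e}")
```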

The Washington Post reported on its latest investigation into flawed statistics behind hair match testimony. “Thousands of criminal cases at the state and local level may have relied on exaggerated testimony or false forensic evidence to convict defendants of murder, rape and other felonies”. Below is an excerpt of the Post article by Spencer S. Hsu.

I asked John Byrd, forensic anthropologist and follower of this blog, what he thought. It turns out that “hair comparisons do not have a well-supported weight of evidence calculation.” (Byrd).  I put Byrd’s note at the end of this post. Continue reading

Categories: Severity, Statistics

3 msc kvetches on the blog bagel circuit


In the past week, I’ve kvetched over at 3 of the blogs on my blog bagel:

  • I. On an error in Mark Chang’s treatment of my Birnbaum disproof, on Xi’an’s Og.
  • II. On Normal Deviant’s post offering “New Names For Statistical Methods”.
  • III. On a statistics chapter in Nate Silver’s book, discussed over at Gelman’s blog.

You may find some of them, with links, on Rejected Posts.

Categories: Metablog, Rejected Posts, Statistics

13 well-worn criticisms of significance tests (and how to avoid them)

2013 is right around the corner, and here are 13 well-known criticisms of statistical significance tests, and how they are addressed within the error statistical philosophy, as discussed in Mayo, D. G. and Spanos, A. (2011), “Error Statistics”.

  • (#1) Error statistical tools forbid using any background knowledge.
  • (#2) All statistically significant results are treated the same.
  • (#3) The p-value does not tell us how large a discrepancy is found.
  • (#4) With a large enough sample size, even a trivially small discrepancy from the null can be detected (see the sketch after this list).
  • (#5) Whether there is a statistically significant difference from the null depends on which hypothesis is the null and which is the alternative.
  • (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
  • (#7) Error probabilities are misinterpreted as posterior probabilities.
  • (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
  • (#9) Specifying statistical tests is too arbitrary.
  • (#10) We should be doing confidence interval estimation rather than significance tests.
  • (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
  • (#12) All models are false anyway.
  • (#13) Testing assumptions involves illicit data-mining.

You can read how we avoid them in the full paper here.
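To see why criticism #4 has teeth (and why, per #3, a p-value alone doesn’t convey the size of a discrepancy), here is a minimal sketch. It assumes a one-sided, one-sample z-test of H0: μ = 0 with known σ = 1; the effect size and sample sizes are illustrative, not from the paper.

```python
# Minimal illustration of criticism #4: with a large enough sample size,
# even a trivially small discrepancy from the null yields a tiny p-value.
# Assumes a one-sided, one-sample z-test of H0: mu = 0 with known sigma = 1.
from math import sqrt
from scipy.stats import norm

true_mu, sigma = 0.01, 1.0   # a substantively trivial discrepancy from the null

for n in [100, 10_000, 1_000_000, 100_000_000]:
    z = true_mu / (sigma / sqrt(n))   # expected z-statistic at the true mean
    p = norm.sf(z)                    # one-sided p-value (survival function)
    print(f"n = {n:>11,}: z = {z:7.2f}, p = {p:.3g}")
```

The same 0.01 discrepancy goes from utterly nonsignificant at n = 100 to overwhelmingly “significant” at n = 10^8, which is why the error statistical response ties test results to the discrepancies actually warranted, not to the p-value alone.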

Mayo, D. G. and Spanos, A. (2011). “Error Statistics,” in P. S. Bandyopadhyay and M. R. Forster (eds.), Philosophy of Statistics (Handbook of the Philosophy of Science, Volume 7; general editors: Dov M. Gabbay, Paul Thagard and John Woods). Elsevier: 1-46.

Categories: Error Statistics, significance tests, Statistics

Msc kvetch: unfair but lawful discrimination (vs the irresistibly attractive)

See Rejected Posts.

Categories: Rejected Posts

Rejected Post: Clinical Trial Statistics Doomed by Mayan Apocalypse?

See Rejected Posts.

Categories: Rejected Posts, Statistics

PhilStat/Law/Stock: more on “bad statistics”: Schachtman

Nathan Schachtman has an update on the case of U.S. v. Harkonen discussed in my last 3 posts: here, here, and here.

United States of America v. W. Scott Harkonen, MD — Part III

Background

The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)), pushed me to revisit the brief filed by the Solicitor General’s office in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  One of Dr. Harkonen’s post-trial motions contended that the government’s failure to disclose its Matrixx amicus brief deprived him of a powerful argument that would have resulted from citing the language of the brief, which disparaged the necessity of statistical significance for “demonstrating” causal inferences. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012). Continue reading

Categories: PhilStatLaw, PhilStock, Statistics

PhilStat/Law/Stock: multiplicity and duplicity

So what’s the allegation that the prosecutors are being duplicitous about statistical evidence in the case discussed in my two previous (‘Bad Statistics’) posts? As a non-lawyer, I will ponder only the evidential (and not the criminal) issues involved.

“After the conviction, Dr. Harkonen’s counsel moved for a new trial on grounds of newly discovered evidence. Dr. Harkonen’s counsel hoisted the prosecutors with their own petards, by quoting the government’s amicus brief to the United States Supreme Court in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  In Matrixx, the securities fraud plaintiffs contended that they need not plead ‘statistically significant’ evidence for adverse drug effects.” (Schachtman’s part 2, ‘The Duplicity Problem – The Matrixx Motion’) 

The Matrixx case is another philstat/law/stock example taken up in this blog here, here, and here.  Why are the Harkonen prosecutors “hoisted with their own petards” (a great expression, by the way)? Continue reading

Categories: PhilStatLaw, PhilStock, Statistics

PhilStat/Law (“Bad Statistics” Cont.)

As a philosopher of science and statistics, as well as a sometime trader in (those dangerous) biotech stocks, I realize that what is warranted inferentially need not follow what appears to be licensed/unlicensed by the straight and narrow path of officially sanctioned statistics. Understanding the background theories, history, detailed data, and assorted rulings is relevant for evidential grounds, which (despite what we might sometimes think) are rather different from legal grounds. (Grounds for stock trading decisions take one to yet a third and different world, but there are intersections.) I only heard of the particular (Actimmune) episode mentioned in my previous blog entry from reading Schachtman’s recent post[i], and have only a smattering of the background—some of which might shift the initial impressions of readers. As I’m about to leave London (not even time for a pic), I’ll just post the controversial press release itself, posted on (Dr. Barbara Martin’s) website PATHOPHILIA[ii]:

INTERMUNE ANNOUNCES PHASE III DATA DEMONSTRATING SURVIVAL BENEFIT OF ACTIMMUNE IN IPF


—Reduces Mortality by 70% in Patients with Mild to Moderate Disease— Continue reading

Categories: PhilStatLaw

“Bad statistics”: crime or free speech?

Hunting for “nominally” significant differences, trying different subgroups and multiple endpoints, can result in a much higher probability of erroneously inferring evidence of a risk or benefit than the nominal p-value suggests, even in randomized controlled trials. This was an issue that arose in looking at RCTs in development economics (an area introduced to me by Nancy Cartwright), as it did at our symposium at the Philosophy of Science Association last month[i][ii]. Reporting the results of hunting and dredging in just the same way as if the relevant claims were predesignated can lead to misleading reports of actual significance levels.[iii]
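To put a number on that inflation, here is a minimal sketch (mine, not from the case or from Schachtman). It assumes k independent endpoints, each tested at nominal level α = 0.05, with all nulls true, so that any “significant” finding is erroneous:

```python
# Why hunting across subgroups and endpoints inflates the actual error
# probability beyond the nominal p-value: assumes k independent tests,
# each at nominal level alpha, with every null hypothesis true.
alpha = 0.05
for k in [1, 5, 10, 20]:
    prob_at_least_one = 1 - (1 - alpha) ** k   # chance of >= 1 spurious "finding"
    print(f"{k:2d} endpoints: P(at least one nominal p < {alpha}) = {prob_at_least_one:.2f}")
```

With 20 independent looks, the chance of at least one nominally significant result by chance alone is about 0.64, which is why reporting a hunted result as if it were predesignated misstates the actual significance level.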

Still, even if reporting spurious statistical results is considered “bad statistics,” is it criminal behavior? I noticed this issue in Nathan Schachtman’s blog over the past couple of days. The case concerns a biotech company, InterMune, and its previous CEO, Dr. Harkonen. Here’s an excerpt from Schachtman’s discussion (part 1). Continue reading

Categories: PhilStatLaw, significance tests, spurious p values, Statistics

Mayo on S. Senn: “How Can We Cultivate Senn’s-Ability?”–reblogs

Since Stephen Senn will be leading our seminar at the LSE tomorrow morning (see PH500 page), I’m reblogging my deconstruction of his paper (“You May Believe You Are a Bayesian But You Probably Are Wrong”) from Jan. 15, 2012 (though not his main topic tomorrow). At the end I link to other “U-Phils” on Senn’s paper (by Andrew Gelman, Andrew Jaffe, and Christian Robert), Senn’s response, and my response to them. Queries: write me at error@vt.edu.

Mayo Philosophizes on Stephen Senn: “How Can We Cultivate Senn’s-Ability?”

Where’s Mayo?

Although, in one sense, Senn’s remarks echo the passage of Jim Berger’s that we deconstructed a few weeks ago, Senn at the same time seems to reach an opposite conclusion. He points out how, in practice, people who claim to have carried out a (subjective) Bayesian analysis have actually done something very different—but that then they heap credit on the Bayesian ideal. (See also “Who Is Doing the Work?”)

“A very standard form of argument I do object to is the one frequently encountered in many applied Bayesian papers where the first paragraphs laud the Bayesian approach on various grounds, in particular its ability to synthesize all sources of information, and in the rest of the paper the authors assume that because they have used the Bayesian machinery of prior distributions and Bayes theorem they have therefore done a good analysis. It is this sort of author who believes that he or she is Bayesian but in practice is wrong.” (Senn 58) Continue reading

Categories: Bayesian/frequentist, U-Phil

Announcement: Prof. Stephen Senn to lead LSE grad seminar: 12-12-12

Prof. Stephen Senn, Head of the Competence Center for Methodology and Statistics (CCMS), Luxembourg, will lead our graduate research seminar tomorrow, 12 December, at the London School of Economics, 10-12, T 2.06 (see the (LSE) PH500 page at the top of this blog, and the background paper for the seminar):

“A statistician is one who prefers true doubts to false certainties.” (Senn)

Professor Senn has been the recipient of national and international awards, including the 1st George C Challis Award for Biostatistics at the University of Florida and the Bradford Hill Medal of the Royal Statistical Society. He is the author of the monographs

Cross-over Trials in Clinical Research (1993, 2002),

Statistical Issues in Drug Development (1997, 2007)

Dicing with Death (2003)

Prof. Senn is a Fellow of the Royal Society of Edinburgh and an honorary life member of Statisticians in the Pharmaceutical Industry (PSI) and the International Society for Clinical Biostatistics (ISCB), and has an honorary chair in statistics at University College London.

Senn is also a monthly contributor to this blog on matters of philosophical foundations of statistics and methodology.  Here are some examples:

Stephen Senn: Fooling the Patient: an Unethical Use of Placebo? (Phil/Stat/Med)

Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics

Stephen Senn: A Paradox of Prior Probabilities

Guest Blogger. STEPHEN SENN: Fisher’s alternative to the alternative

___________________________

Categories: Announcement

Don’t Birnbaumize that experiment my friend*–updated reblog

Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a video we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

(A)  “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained.  This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization. Continue reading

Categories: Birnbaum Brakes, Likelihood Principle, Statistics

Rejected post: Nov. Palindrome Winner: Kepler

See Thomas Kepler’s statement and palindrome.

Categories: Announcement

Announcement: U-Phil Extension: Blogging the Likelihood Principle

U-Phil: I am extending to Dec. 19, 2012 the deadline for sending me responses to the “U-Phil” call (see the initial call), given some requests for more time. The details of the specific U-Phil may be found here, but you might also look at the post relating to my 28 Nov. seminar at the LSE, which is directly on the topic: the infamous (strong) likelihood principle (SLP). “U-Phil,” which is short for “you ‘philosophize’,” is really just an opportunity to write something 0.5-1 notch above an ordinary comment (focused on one or more specific posts/papers, as described in each call): it can be longer (~500-1000 words), and it appears in the regular blog area rather than as a comment. Your remarks can relate to the guest graduate student post by Gregory Gandenberger, and/or my discussion/argument. Graduate student posts (e.g., attendees of my 28 Nov. LSE seminar?) are especially welcome*. Earlier exemplars of U-Phils may be found here, and more by searching this blog.

Thanks to everyone who sent me names of vintage typewriter repair shops in London, after the airline damage: the “x” is fixed, but the “z” key is still misbehaving.

*Another post of possible relevance to graduate students comes up when searching this blog for  “sex”.

Categories: Announcement, Likelihood Principle, U-Phil

Mayo Commentary on Gelman & Robert

The following is my commentary on a paper by Gelman and Robert forthcoming (in early 2013) in The American Statistician* (submitted October 3, 2012).

_______________________

Discussion of Gelman and Robert, “‘Not only defended but also applied’: The perceived absurdity of Bayesian inference”
Deborah G. Mayo

1. Introduction

I am grateful for the chance to comment on the paper by Gelman and Robert. I welcome seeing statisticians raise philosophical issues about statistical methods, and I entirely agree that methods not only should be applicable but also capable of being defended at a foundational level. “It is doubtful that even the most rabid anti-Bayesian of 2010 would claim that Bayesian inference cannot apply” (Gelman and Robert 2012, p. 6). This is clearly correct; in fact, it is not far off the mark to say that the majority of statistical applications nowadays are placed under the Bayesian umbrella, even though the goals and interpretations found there are extremely varied. There is a plethora of international societies, journals, post-docs, and prizes with “Bayesian” in their name, and a wealth of impressive new Bayesian textbooks and software is available. Even before the latest technical advances and the rise of “objective” Bayesian methods, leading statisticians were calling for eclecticism (e.g., Cox 1978), and most will claim to use a smattering of Bayesian and non-Bayesian methods, as appropriate. George Casella (to whom their paper is dedicated) and Roger Berger in their superb textbook (2002) exemplify a balanced approach. Continue reading

Categories: frequentist/Bayesian, Statistics

Statistical Science meets Philosophy of Science

Many of the discussions on this blog have revolved around a cluster of issues under the general question: “Statistical Science and Philosophy of Science: Where Do (Should) They Meet (in the contemporary landscape)?” In tackling these issues, this blog regularly returns to a set of contributions growing out of a conference with the same title (June 2010, London School of Economics, Centre for Philosophy of Natural and Social Science, CPNSS), as well as to conversations initiated soon after. The conference site is here. My most recent reflections in this arena (Sept. 26, 2012) are here. Continue reading

Categories: Statistics, StatSci meets PhilSci

Normal Deviate’s blog on false discovery rates

There is an interesting guest post by Ryan Tibshirani on the Normal Deviate’s blog comparing the False Discovery Rates (FDR)* associated with different methods of screening for potentially interesting genes (based on p-value assessments). I want to come back to this at some point.

*FDR = E(number of null genes called significant/number of genes called significant)
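As a quick gloss on that definition, here is a hedged simulation sketch (mine, not from Tibshirani’s post). It assumes 9,000 null genes with z-scores from N(0,1) and 1,000 non-null genes from N(3,1), and a naive rule calling a gene significant whenever its one-sided p-value falls below 0.05:

```python
# A simulation gloss on FDR = E(# null genes called significant / # called
# significant). Assumes 9,000 null genes (z ~ N(0,1)), 1,000 non-null genes
# (z ~ N(3,1)), and a naive rule calling a gene significant when p < 0.05.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_null, n_alt, n_sims = 9_000, 1_000, 200
fdp = []                                                # false discovery proportions
for _ in range(n_sims):
    z = np.concatenate([rng.normal(0.0, 1.0, n_null),   # null genes
                        rng.normal(3.0, 1.0, n_alt)])   # truly non-null genes
    p = norm.sf(z)                                      # one-sided p-values
    called = p < 0.05
    nulls_called = called[:n_null].sum()
    fdp.append(nulls_called / max(called.sum(), 1))     # avoid 0/0

print(f"Estimated FDR at the naive p < 0.05 cutoff: {np.mean(fdp):.2f}")
```

Even with a respectable signal, roughly a third of the genes “called significant” at the naive 0.05 cutoff are nulls in this setup, which is what FDR-controlling procedures such as Benjamini-Hochberg are designed to rein in.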

Categories: Announcement
