Aris Spanos’ overview of error statistical responses to familiar criticisms of statistical tests. Related reading is Mayo and Spanos (2011).
Winner of April 2014 Palindrome Contest:
Pose ad: ‘Elba fallacy amid aged? Amygdala error or real?’ Ad: gym ad? Egad! I may call a fabled Aesop.
The requirement: A palindrome with Elba plus “fallacy” with an optional second word: “error”. A palindrome using both topped an acceptable palindrome using only “fallacy”. All April submissions used both. Other April finalists are here.
Lori Wike is principal bassoonist of the Utah Symphony and is on the faculty of the University of Utah and Westminster College. She holds a Bachelor of Music degree from the Eastman School of Music and a Master of Arts degree in Comparative Literature from UC-Irvine.
“What ever happened to Bayesian foundations?” was one of the final topics of our seminar (Mayo/SpanosPhil6334). In the past 15 years or so, not only have (some? most?) Bayesians come to accept violations of the Likelihood Principle, they have also tended to disown Dutch Book arguments, and the very idea of inductive inference as updating beliefs by Bayesian conditionalization has evanesced. In one of Thursday’s readings, by Bacchus, Kyburg, and Thalos (1990), it is argued that under certain conditions, it is never a rational course of action to change belief by Bayesian conditionalization. Here’s a short snippet for your Saturday night reading (the full paper is http://errorstatistics.files.wordpress.com/2014/05/bacchus_kyburg_thalos-against-conditionalization.pdf): Continue reading
Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle
On Friday, May 2, 2014, I will attempt to present my critical analysis of the Birnbaum argument for the (strong) Likelihood Principle, so as to be accessible to a general philosophy audience (flyer below). Can it be done? I don’t know yet; this is a first. It will consist of:
- Example 1: Trying and Trying Again: Optional stopping
- Example 2: Two instruments with different precisions
[you shouldn’t get credit (or blame) for something you didn’t do]
- The Breakthrough: Birnbaumization
- Imaginary dialogue with Allan Birnbaum
The full paper is here. My discussion takes several pieces a reader can explore further by searching this blog (e.g., under SLP, brakes e.g., here, Birnbaum, optional stopping). I will post slides afterwards.
It’s good to know that in this incredibly stressed month[i], as we deal with end-of-semester deadlines, exams, applications and whatnot, some people have found time for the errorstatistics palindrome contest–in fact, it’s the first time ever that I’ve received three (quite good) candidates (below)! (Help the Elba judges by voting for 1-3, firstname.lastname@example.org) Continue reading
Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day#13)
Here are Dr. Stanley Young’s slides from our April 25 seminar. They contain several tips for unearthing deception by fraudulent p-value reports. Since it’s Saturday night, you might wish to perform an experiment with three 10-sided dice*, recording the results of 100 rolls (3 at a time) on the form on slide 13. An entry, e.g., (0,1,3) becomes an imaginary p-value of .013 associated with the type of tumor, male-female, old-young. You report only hypotheses whose null is rejected at a “p-value” less than .05. Forward your results to me for publication in a peer-reviewed journal.
*Sets of 10-sided dice will be offered as a palindrome prize beginning in May.
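For anyone without 10-sided dice to hand, here is a minimal Python sketch of the same experiment. The function names, the seed, and the choice to read the three faces (0 to 9) as the three decimal digits of a "p-value" are assumptions made for illustration; slide 13's form is not reproduced here.

```python
import random


def roll_p_value(rng):
    """Roll three 10-sided dice (faces 0-9) and read them as decimal digits."""
    d1, d2, d3 = (rng.randrange(10) for _ in range(3))
    return (100 * d1 + 10 * d2 + d3) / 1000  # e.g. (0, 1, 3) -> 0.013


def run_experiment(n_rolls=100, alpha=0.05, seed=None):
    """Return all imaginary p-values and the subset that would be 'reported'."""
    rng = random.Random(seed)
    p_values = [roll_p_value(rng) for _ in range(n_rolls)]
    reported = [p for p in p_values if p < alpha]  # selective reporting step
    return p_values, reported


if __name__ == "__main__":
    p_values, reported = run_experiment(seed=2014)
    print(f"{len(reported)} of {len(p_values)} pure-noise results reach p < .05")
    print(sorted(reported))
```

Since 50 of the 1000 equally likely outcomes (000 through 049) fall below .05, about 5 of every 100 rolls should yield a reportable "finding" even though nothing but dice is involved.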
We are pleased to announce our guest speaker at Thursday’s seminar (April 24, 2014): “Statistics and Scientific Integrity”:
Author of Resampling-Based Multiple Testing, Westfall and Young (1993) Wiley.
The main readings for the discussion are:
- Young, S. & Karr, A. (2011). Deming, Data and Observational Studies. Signif. 8 (3), 116–120.
- Begley & Ellis (2012) Raise standards for preclinical cancer research. Nature 483: 531-533.
- Ioannidis (2005). Why most published research findings are false. PLoS Med 2(8): e124.
- Peng, R. D., Dominici, F. & Zeger, S. L. (2006). “Reproducible Epidemiologic Research” American Journal of Epidemiology 163 (9), 783-789.
We interspersed key issues from the reading for this session (from Howson and Urbach) with portions of my presentation at the Boston Colloquium (Feb, 2014): Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge. (Slides below)*.
Someone sent us a recording (mp3) of the panel discussion from that Colloquium (there’s a lot on “big data” and its politics) including: Mayo, Xiao-Li Meng (Harvard), Kent Staley (St. Louis), and Mark van der Laan (Berkeley).
*There’s a prelude here to our visitor on April 24: Professor Stanley Young from the National Institute of Statistical Sciences.
Four years ago, many of us were glued to the “spill cam” showing, in real time, the gushing oil from the April 20, 2010 explosion sinking the Deepwater Horizon oil rig in the Gulf of Mexico, killing 11, and spewing oil until July 15 (see video clip that was added below). Remember junk shots, top kill, blowout preventers? The EPA has lifted its gulf drilling ban on BP just a couple of weeks ago* (BP has paid around $27 billion in fines and compensation), and April 20, 2014, is the deadline to properly file forms for new compensations.
(*After which BP had another small spill in Lake Michigan.)
But what happened to the 200 million gallons of oil? Has it vanished or just sunk to the bottom of the sea by dispersants which may have caused hidden destruction of sea life? I don’t know, but given it’s Saturday night, let’s listen in to a reblog of a spill-related variation on the second of two original “overheard at the comedy hour” jokes.
A question came up in our seminar today about how to understand the duality between a simple one-sided test and a lower limit (LL) of a corresponding 1-sided confidence interval estimate. This is also a good route to SEV (i.e., severity). Here’s a quick answer: Continue reading
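The duality can be checked numerically in the simplest textbook case. The sketch below is not taken from the post: it assumes a Normal mean with known sigma, the test of H0: mu <= mu0 against H1: mu > mu0, and made-up numbers (xbar = 0.4, sigma = 1, n = 25, alpha = .025). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_test_rejects(xbar, mu0, sigma, n, alpha=0.025):
    """Reject H0: mu <= mu0 when the z-statistic exceeds the upper-alpha cutoff."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z > NormalDist().inv_cdf(1 - alpha)


def lower_limit(xbar, sigma, n, alpha=0.025):
    """Lower limit (LL) of the one-sided (1 - alpha) confidence interval for mu."""
    return xbar - NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)


if __name__ == "__main__":
    xbar, sigma, n = 0.4, 1.0, 25  # illustrative values only
    ll = lower_limit(xbar, sigma, n)
    print(f"LL = {ll:.4f}")
    for mu0 in (-0.5, -0.25, 0.0, 0.25, 0.5):
        rejects = one_sided_test_rejects(xbar, mu0, sigma, n)
        print(f"mu0 = {mu0:+.2f}: test rejects = {rejects}, mu0 < LL = {mu0 < ll}")
```

Under these assumptions the two columns agree for every mu0: the test rejects exactly those null values that lie below LL, which is the sense in which the lower confidence limit collects the values of mu the data would reject at level alpha.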
Jerzy Neyman (April 16, 1894 – August 5, 1981) was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)
One of Neyman’s most remarkable, but least recognized, achievements was his adapting of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Continue reading
A. Spanos Probability/Statistics Lecture Notes 7: An Introduction to Bayesian Inference (4/10/14)
“There was a vain and ambitious hospital director. A bad statistician. ..There were good medics and bad medics, good nurses and bad nurses, good cops and bad cops … Apparently, even some people in the Public Prosecution service found the witch hunt deeply disturbing.”
This is how Richard Gill, statistician at Leiden University, describes a feature film (Lucia de B.) just released about the case of Lucia de Berk, a nurse found guilty of several murders based largely on statistics. Gill is widely known (among other things) for showing the flawed statistical analysis used to convict her, which ultimately led (after Gill’s tireless efforts) to her conviction being revoked. (I hope they translate the film into English.) In a recent e-mail Gill writes:
“The Dutch are going into an orgy of feel-good tear-jerking sentimentality as a movie comes out (the premiere is tonight) about the case. It will be a good movie, actually, but it only tells one side of the story. …When a jumbo jet goes down we find out what went wrong and prevent it from happening again. The Lucia case was a similar disaster. But no one even *knows* what went wrong. It can happen again tomorrow.
I spoke about it a couple of days ago at a TEDx event (Flanders).
You can find some p-values in my slides ["Murder by Numbers", pasted below the video]. They were important – first in convicting Lucia, later in getting her a fair re-trial.”
Since it’s Saturday night, let’s watch Gill’s TEDx talk, “Statistical Error in court”.
Slides from the Talk: “Murder by Numbers”:
We were reading “Out, Damned Spot: Can the ‘Macbeth effect’ be replicated?” (Earp, B., Everett, J., Madva, E., and Hamlin, J. 2014, in Basic and Applied Social Psychology 36: 91-8) in an informal gathering of our 6334 seminar yesterday afternoon at Thebes. Some of the graduate students are interested in so-called “experimental” philosophy, and I asked for an example that used statistics for purposes of analysis. The example–and it’s a great one (thanks Rory M!)–revolves around priming research in social psychology. Yes, the field that has come in for so much criticism of late, especially after Diederik Stapel was found to have been fabricating data altogether (search this blog, e.g., here). Continue reading
April 3, 2014: We interspersed discussion with slides; these cover the main readings of the day (check syllabus four): Duhem’s Problem and the Bayesian Way, and “Highly probable vs Highly Probed”. Slides are below (followers of this blog will be familiar with most of this, e.g., here). We also did further work on misspecification testing.
Monday, April 7, is an optional outing, “a seminar class trip” you might say, here at Thebes, at which time we will analyze the statistical curves of the mountains, pie charts of pizza, and (seriously) study some experiments on the problem of replication in “the Hamlet Effect in social psychology”. If you’re around please bop in!
Mayo’s slides on Duhem’s Problem and more from April 3 (Day#9):
It was from my Virginia Tech colleague I.J. Good (in statistics), who died five years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules. (I had posted this last year here.)
“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]
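Good's point is easy to see in a small simulation. The following Python sketch is an illustration under stated assumptions, not anything from Good's text: data are drawn from N(0, 1) so the null (mu = 0) is true, a two-sided z-test with cutoff 1.96 is applied after every single observation, and sampling stops at the first nominally significant result or at `max_n`. All names and settings are hypothetical.

```python
import random
from math import sqrt


def rejects_with_optional_stopping(max_n, z_crit=1.96, rng=random):
    """Sample N(0, 1) data (null true), test after each draw, stop at first |z| > z_crit."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / sqrt(n)  # z-statistic for H0: mu = 0 with known sigma = 1
        if abs(z) > z_crit:
            return True  # the experimenter "has a plane to catch"
    return False


def rejection_rate(max_n, trials=2000, seed=0):
    """Fraction of simulated experiments that reject the true null."""
    rng = random.Random(seed)
    hits = sum(rejects_with_optional_stopping(max_n, rng=rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for max_n in (1, 10, 100, 1000):
        print(f"max_n = {max_n:4d}: rejection rate = {rejection_rate(max_n):.3f}")
```

With a fixed sample size the rate stays near the nominal .05; allowing the experimenter to keep trying pushes it steadily upward as `max_n` grows, in line with the law of the iterated logarithm that Good invokes.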
Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic
I had heard of medical designs that employ individuals who supply Bayesian subjective priors that are deemed either “enthusiastic” or “skeptical” as regards the probable value of medical treatments.[i] From what I gather, these priors are combined with data from trials in order to help decide whether to stop trials early or continue. But I’d never heard of these Bayesian designs in relation to decisions about building security or renovations! Listen to this…. Continue reading
[Other slides from Day 9 by guest, John Byrd, can be found here.]