I’m giving a Popper talk at the London School of Economics next Tuesday (10 May). If you’re in the neighborhood, I hope you’ll stop by.
A somewhat accurate blurb is here. I say “somewhat” because it doesn’t mention that I’ll talk a bit about the replication crisis in psychology, and the issues that crop up (or ought to) in connecting statistical results and the causal claim of interest.
Application deadline: May 8, 2016
PHILOSOPHY & PHYSICAL COMPUTING
JULY 11-24, 2016 at Virginia Tech
Who should apply:
- This workshop is open to graduate students in master’s or PhD programs in philosophy or the sciences, including computer science.
For additional information or to apply online, visit thinkandcode.vtlibraries.org, or contact Dr. Benjamin Jantzen at firstname.lastname@example.org
My “April 1” posts for the past 5 years have been so close to the truth, or possible truth, that they weren’t always spotted as April Fool’s pranks, which is what made them genuine April Fool’s pranks. (After a few days I labeled them as such, or revealed it in a comment.) So, since it’s Saturday night on the last night of April, I’m reblogging my 5 posts from the first days of April. (Which fooled you the most?) Continue reading
3 years ago…
MONTHLY MEMORY LANE: 3 years ago–March & April 2013. I missed the March memory lane, so both are combined here. I mark in red three posts most apt for general background on key issues in this blog. I’ve added some remarks in blue this month for some of the posts that are not marked in red.
- (3/1) capitalizing on chance–Worth a look (has a pic of Mayo gambling)!
- (3/4) Big Data or Pig Data?–Funny & clever (guest post)!
- (3/7) Stephen Senn: Casting Stones
- (3/10) Blog Contents 2013 (Jan & Feb)
- (3/11) S. Stanley Young: Scientific Integrity and Transparency
- (3/13) Risk-Based Security: Knives and Axes–Funny, strange!
- (3/15) Normal Deviate: Double Misunderstandings About p-values–worth keeping in mind.
- (3/17) Update on Higgs data analysis: statistical flukes (1)
- (3/21) Telling the public why the Higgs particle matters
- (3/23) Is NASA suspending public education and outreach?
- (3/27) Higgs analysis and statistical flukes (part 2)
- (3/31) possible progress on the comedy hour circuit?–One of my favorites, a bit of progress
Neyman April 16, 1894 – August 5, 1981
In honor of Jerzy Neyman’s birthday today, a local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”). The “miserable” citations are those everyone loves to cite, from Neyman and Pearson’s early joint 1933 paper:
We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.
But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. (Neyman and Pearson 1933, pp. 290-1).
I’m about to hear Jim Berger give a keynote talk this afternoon at a FUSION conference I’m attending. The conference goal is to link Bayesian, frequentist and fiducial approaches: BFF. (The program is here. See the blurb below.) April 12 update below*. Berger always has novel and intriguing approaches to testing, so I was especially curious about the new measure. It’s based on a 2016 paper by Bayarri, Benjamin, Berger, and Sellke (BBBS 2016): “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses.” They recommend:
“that researchers should report what we call the ‘pre-experimental rejection ratio’ when presenting their experimental design and researchers should report what we call the ‘post-experimental rejection ratio’ (or Bayes factor) when presenting their experimental results.” (BBBS 2016)….
“The (pre-experimental) ‘rejection ratio’ R_pre, the ratio of statistical power to significance threshold (i.e., the ratio of the probability of rejecting under H1 and H0 respectively), is shown to capture the strength of evidence in the experiment for H1 over H0.”
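To see what the pre-experimental rejection ratio amounts to in the simplest case, here is a minimal sketch assuming a one-sided z-test of H0: μ = 0 against a point alternative H1: μ = μ1 > 0 with known σ. The function name and the design numbers are illustrative, not taken from BBBS. It just computes power and divides by the significance level, so it shows the quantity being proposed, not whether it measures evidence.

```python
from statistics import NormalDist

def pre_rejection_ratio(alpha, mu1, sigma, n):
    """Return (power, R_pre) for a one-sided z-test of H0: mu = 0 vs H1: mu = mu1.

    R_pre = power / alpha, i.e. P(reject; H1) / P(reject; H0).
    """
    std_normal = NormalDist()
    z_cut = std_normal.inv_cdf(1 - alpha)      # reject when Z >= z_cut
    shift = mu1 * n ** 0.5 / sigma             # mean of Z when H1 is true
    power = 1 - std_normal.cdf(z_cut - shift)  # P(Z >= z_cut; H1)
    return power, power / alpha

# Illustrative design: alpha = 0.025, mu1 = 0.2, sigma = 1, n = 100
power, r_pre = pre_rejection_ratio(0.025, 0.2, 1.0, 100)
print(f"power = {power:.3f}, R_pre = {r_pre:.1f}")
```

With these made-up numbers the power is roughly 0.52 and R_pre roughly 21. Because nothing observed enters the calculation, R_pre depends only on the design (α, μ1, σ, n), which is what "pre-experimental" signals.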
If you’re seeking a comparative probabilist measure, the ratio of power/size can look like a likelihood ratio in favor of the alternative. To a practicing member of an error statistical tribe, however, whether along the lines of N, P, or F (Neyman, Pearson or Fisher), things can look topsy-turvy. Continue reading
Manan Shah channels Jack Nicholson in “The Shining” to win this month’s palindrome contest (and the book of his choice).*
Winner of March 2016 Contest: Manan Shah
Palindrome: I was able to. I did add well. Liking is, I say, as evil as dad’s aloof. Delivery reviled sign: “I red rum”. Examine men I’m axe murdering. Is delivery reviled? Fool! As dad’s alive, say as I sign: “I kill lewd dad.” Idiot Elba saw I.
The requirements: In addition to using Elba, a candidate for a winning palindrome must have used examine (or examined or examination).
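Checking an entry against these requirements is mechanical: keep only the letters, compare them with their reverse, and look for the required words. A small sketch (the function names are hypothetical, and treating "used examine" as an exact match on examine, examined or examination is an assumption about how the rule is applied):

```python
import re

def letters(text):
    """Lowercase letters only; spaces, punctuation and curly quotes are ignored."""
    return [c.lower() for c in text if c.isalpha()]

def is_palindrome(text):
    seq = letters(text)
    return seq == seq[::-1]

def meets_march_rules(text):
    """Palindrome, uses 'Elba', and uses examine/examined/examination."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    uses_examine = bool(words & {"examine", "examined", "examination"})
    return is_palindrome(text) and "elba" in words and uses_examine

WINNER = ("I was able to. I did add well. Liking is, I say, as evil as dad’s aloof. "
          "Delivery reviled sign: “I red rum”. Examine men I’m axe murdering. "
          "Is delivery reviled? Fool! As dad’s alive, say as I sign: “I kill lewd dad.” "
          "Idiot Elba saw I.")
print(meets_march_rules(WINNER))
```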
Bio: Manan Shah is a mathematician and owner of Think. Plan. Do. LLC (www.ThinkPlanDoLLC.com). He writes at www.mathmisery.com and is looking to publish his first book, hopefully by the end of this year. He holds a PhD in Mathematics from Florida State University.
I’ll be speaking at U of Minnesota tomorrow. I’m glad to see a group with an interest in philosophical foundations of statistics as well as the foundations of experiment and measurement in psychology. I will post my slides afterwards. Come by if you’re in the neighborhood.
University of Minnesota
“The ASA (2016) Statement on P-values and
How to Stop Refighting the Statistics Wars”
April 8, 2016 at 3:35 p.m.
Deborah G. Mayo
Department of Philosophy, Virginia Tech
The CLA Quantitative Methods
Minnesota Center for Philosophy of Science
275 Nicholson Hall
216 Pillsbury Drive SE
University of Minnesota
This will be a mixture of my current take on the “statistics wars” together with my reflections on the recent ASA document on P-values. I was invited over a year ago by Niels Waller, a co-author of Paul Meehl. I’ll never forget when I was there in 1997: Paul Meehl was in the audience, waving my book in the air–EGEK (1996)–and smiling!
I could have told them that the degree of accordance enabling the “6 principles” on p-values was unlikely to be replicated when it came to most of the “other approaches” with which some would supplement or replace significance tests–notably Bayesian updating, Bayes factors, or likelihood ratios (confidence intervals are dual to hypothesis tests). [My commentary is here.] So now they may be advising a “hold off” or “go slow” approach until some consilience is achieved. Is that it? I don’t know. I was tweeted an article about the background chatter taking place behind the scenes; I wasn’t one of the people interviewed for it. Here are some excerpts; I may add more later, after it has had time to sink in. (Check back later.)
“Reaching for Best Practices in Statistics: Proceed with Caution Until a Balanced Critique Is In”
“[A]ll of the other approaches*, as well as most statistical tools, may suffer from many of the same problems as the p-values do. What level of likelihood ratio in favor of the research hypothesis will be acceptable to the journal? Should scientific discoveries be based on whether posterior odds pass a specific threshold (P3)? Does either measure the size of an effect (P5)?…How can we decide about the sample size needed for a clinical trial–however analyzed–if we do not set a specific bright-line decision rule? 95% confidence intervals or credence intervals…offer no protection against selection when only those that do not cover 0, are selected into the abstract (P4).” (Benjamini, ASA commentary, pp. 3-4)
What’s sauce for the goose is sauce for the gander, right? Many statisticians seconded George Cobb, who urged “the board to set aside time at least once every year to consider the potential value of similar statements” to the recent ASA p-value report. Disappointingly, a preliminary survey of leaders in statistics, many from the original p-value group, aired striking disagreements on best and worst practices with respect to these other approaches. The Executive Board is contemplating a variety of recommendations, minimally, Continue reading
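Two claims in this post can be checked numerically in the simplest setting. The sketch below assumes a Normal mean with known standard error and illustrative numbers; the function names are hypothetical. First, the duality of confidence intervals and tests: a value mu0 lies inside the 95% interval exactly when the two-sided 0.05-level test does not reject H0: mu = mu0. Second, Benjamini's point about selection: assuming every study estimates the same true effect of 0.5 standard errors, and only intervals excluding 0 "make the abstract", overall coverage stays near 95% while coverage among the selected intervals falls to roughly 60%.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def z_interval(est, se):
    """Two-sided 95% interval for a Normal mean with known standard error."""
    return est - Z95 * se, est + Z95 * se

def z_test_rejects(est, mu0, se):
    """Two-sided 0.05-level z-test of H0: mu = mu0."""
    p = 2 * NormalDist().cdf(-abs(est - mu0) / se)
    return p <= 0.05

def coverage_with_selection(true_effect=0.5, se=1.0, n_studies=20_000, seed=2016):
    """Coverage of the true effect over all intervals, and over only
    those that exclude 0 (the ones 'selected into the abstract')."""
    rng = random.Random(seed)
    covered_all = selected = covered_selected = 0
    for _ in range(n_studies):
        lo, hi = z_interval(rng.gauss(true_effect, se), se)
        covers = lo <= true_effect <= hi
        covered_all += covers
        if lo > 0 or hi < 0:  # interval excludes 0, i.e. "significant"
            selected += 1
            covered_selected += covers
    return covered_all / n_studies, covered_selected / selected

# 1. Duality: mu0 is inside the interval exactly when the test does not reject it.
lo, hi = z_interval(1.0, 0.4)
assert all((lo <= m <= hi) == (not z_test_rejects(1.0, m, 0.4))
           for m in (i / 100 for i in range(-100, 300)))

# 2. Selection: compare coverage overall and among the selected intervals.
overall, among_selected = coverage_with_selection()
print(f"overall {overall:.3f}, among selected {among_selected:.3f}")
```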
Given all the recent attention to kvetching about significance tests, it’s an apt time to reblog Aris Spanos’ overview of the error statistician talking back to the critics. A related paper for your Saturday night reading is Mayo and Spanos (2011). It mixes the error statistical philosophy of science with its philosophy of statistics, introduces severity, and responds to 13 criticisms and howlers.
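For readers meeting severity for the first time, here is a minimal sketch of the simplest case in Mayo and Spanos's treatment: a one-sided Normal test T+ of H0: μ ≤ μ0 vs H1: μ > μ0 with known σ. Assuming T+ has rejected H0, the severity with which the claim μ > μ1 passes is the probability of a result less impressive than the one observed, were μ only μ1: SEV(μ > μ1) = P(X̄ ≤ x̄_obs; μ = μ1). The function name and the numbers are illustrative.

```python
from statistics import NormalDist

def severity_mu_greater(xbar, sigma, n, mu1):
    """SEV(mu > mu1) after test T+ (H0: mu <= mu0 vs H1: mu > mu0) has rejected,
    Normal model with known sigma: P(Xbar <= observed xbar; mu = mu1)."""
    se = sigma / n ** 0.5
    return NormalDist().cdf((xbar - mu1) / se)

# Illustrative numbers: observed mean 0.4, sigma = 1, n = 100 (standard error 0.1)
for mu1 in (0.2, 0.3, 0.4, 0.5):
    print(mu1, round(severity_mu_greater(0.4, 1.0, 100, mu1), 3))
```

Here the claim μ > 0.2 passes with severity of about 0.98, while μ > 0.4 gets only 0.5: the same rejection warrants some discrepancies from μ0 well and others poorly.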
I’m going to comment on some of the ASA discussion contributions I hadn’t discussed earlier. Please share your thoughts in relation to any of this.
It was first blogged here, as part of our seminar 2 years ago.
 For those seeking a bit more balance to the main menu offered in the ASA Statistical Significance Reference list.
See also on this blog:
A. Spanos, “Recurring controversies about p-values and confidence intervals revisited”
A. Spanos, “Lecture on frequentist hypothesis testing”
Comments get unwieldy after 100, so here’s a chance to continue the “due to chance” discussion in some roomier quarters. (There seem to be at least two distinct lanes being travelled.) Now, one of the main reasons I run this blog is to discover potential clues to solving or making progress on thorny philosophical problems I’ve been wrangling with for a long time. I think I extracted some illuminating gems from the discussion here, but I don’t have time to write them up, and won’t for a bit, so I’ve parked a list of comments wherein the golden extracts lie (I think) over at my Rejected Posts blog. (They’re all my comments, but as influenced by readers, so I thank you!) Over there, there’s no “return and resubmit”, but around a dozen posts have eventually made it over here, tidied up. Please continue the discussion on this blog (I don’t even recommend going over there). You can link to your earlier comments by clicking on the date.
 The Spiegelhalter (PVP) link is here.
There’s something about “Principle 2” in the ASA document on p-values that I couldn’t address in my brief commentary, but is worth examining more closely.
2. P-values do not measure (a) the probability that the studied hypothesis is true, or (b) the probability that the data were produced by random chance alone.
(a) is true, but what about (b)? That’s what I’m going to focus on, because I think it is often misunderstood. It was discussed earlier on this blog in relation to the Higgs experiments and deconstructing “the probability the results are ‘statistical flukes’”. So let’s examine: Continue reading
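Whatever one concludes about reading (b), the computation itself is not in dispute: a p-value is a probability computed from outcomes generated under the chance (null) hypothesis, P(Z ≥ z_obs; H0). The sketch assumes a one-sided standard Normal test statistic, the form in which the Higgs "5 sigma" criterion is usually stated (a one-sided p-value of about 2.9 × 10⁻⁷). The Monte Carlo version re-derives a tail area by generating outcomes under H0 alone. It illustrates the computation only, not which verbal gloss on it is correct.

```python
import random
from statistics import NormalDist

def p_value_one_sided(z_obs):
    """P(Z >= z_obs; H0), with Z standard Normal under the null (chance) hypothesis."""
    return NormalDist().cdf(-z_obs)

def p_value_by_simulation(z_obs, n_sims=200_000, seed=7):
    """Same quantity, estimated as the fraction of outcomes generated under H0
    alone that are at least as extreme as z_obs."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(0.0, 1.0) >= z_obs for _ in range(n_sims))
    return hits / n_sims

print(p_value_one_sided(5.0))                              # the "5 sigma" level
print(p_value_one_sided(2.0), p_value_by_simulation(2.0))  # exact vs simulated
```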
My invited comments on the ASA Document on P-values*
The American Statistical Association is to be credited with opening up a discussion into p-values; now an examination of the foundations of other key statistical concepts is needed.
Statistical significance tests are a small part of a rich set of “techniques for systematically appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data” (Birnbaum 1970, p. 1033). These may be called error statistical methods (or sampling theory). The error statistical methodology supplies what Birnbaum called the “one rock in a shifting scene” (ibid.) in statistical thinking and practice. Misinterpretations and abuses of tests, warned against by the very founders of the tools, shouldn’t be the basis for supplanting them with methods unable or less able to assess, control, and alert us to erroneous interpretations of data. Continue reading
unscrambling soap words clears me of this deed (aosp)
Remember “Repligate” [“Some Ironies in the Replication Crisis in Social Psychology”] and, more recently, the much-publicized attempt to replicate 100 published psychology articles by the Open Science Collaboration (OSC) [“The Paradox of Replication”]? Well, some of the critics involved in Repligate have just come out with a criticism of the OSC results, claiming they’re way, way off in their low estimate of replications in psychology. (The original OSC report is here.) I’ve only scanned the critical article quickly, but some bizarre statistical claims leap out at once. (Where do they get this notion about confidence intervals?) It’s published in Science! There’s also a response from the OSC researchers. Neither group adequately scrutinizes the validity of many of the artificial experiments and proxy variables–an issue I’ve been on about for a while. Without firming up the statistics-research link, no statistical fixes can help. I’m linking to the articles here for your weekend reading. I invite your comments! For some reason a whole bunch of items of interest, under the banner of “statistics and the replication crisis,” are all coming out at around the same time, and who can keep up? March 7 brings yet more! (Stay tuned.) Continue reading
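The excerpt doesn't say which notion about confidence intervals is meant, but one well-known fact bears on any reading of the form "a replication estimate should fall inside the original study's 95% CI 95% of the time". Even when both studies estimate the same true effect with the same precision, the chance is only about 83%, because both estimates vary. A sketch under exactly those assumptions (equal standard errors, no true difference); the function names and the true effect of 0.3 are hypothetical.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def capture_rate_exact():
    """P(replication estimate falls inside the original study's 95% CI), assuming
    both studies estimate the same true effect with the same standard error se.
    The difference of the two estimates is Normal with SD sqrt(2) * se, and the
    CI half-width is Z95 * se, so the answer does not depend on se."""
    return 2 * NormalDist().cdf(Z95 / 2 ** 0.5) - 1

def capture_rate_sim(n_pairs=50_000, true_effect=0.3, se=1.0, seed=2016):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_pairs):
        original = rng.gauss(true_effect, se)
        replication = rng.gauss(true_effect, se)
        inside += abs(replication - original) <= Z95 * se
    return inside / n_pairs

print(round(capture_rate_exact(), 3), round(capture_rate_sim(), 3))
```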
Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results
I generally find National Academy of Sciences (NAS) manifestos highly informative. I only gave a quick reading to around 3/4 of this one. I thank Hilda Bastian for twittering the link. Before giving my impressions, I’m interested to hear what readers think, whenever you get around to having a look. Here’s from the intro*:
Questions about the reproducibility of scientific research have been raised in numerous settings and have gained visibility through several high-profile journal and popular press articles. Quantitative issues contributing to reproducibility challenges have been considered (including improper data management and analysis, inadequate statistical expertise, and incomplete data, among others), but there is no clear consensus on how best to approach or to minimize these problems…
3 years ago…
MONTHLY MEMORY LANE: 3 years ago: February 2013. I mark in red three posts that seem most apt for general background on key issues in this blog. Posts that are part of a “unit” or a group of “U-Phils” (you [readers] philosophize) count as one. Feb. 2013 reminds me how much the issue of the Likelihood Principle figured in this blog. I group the 4 on the Likelihood Principle, in burgundy, as one. Those unaware of the issue, or updating a statistics text in the next few months, might want to see what all the hoopla is about. (For the latest, see the note on the Statistical Science publication below.) The three in green are on Fisher. New questions or comments on any posts can be placed on this post.
- (2/2) U-Phil: Ton o’ Bricks
- (2/4) January Palindrome Winner
- (2/6) Mark Chang (now) gets it right about circularity
- (2/8) From Gelman’s blog: philosophy and the practice of Bayesian statistics
- (2/9) New kvetch: Filly Fury
- (2/10) U-PHIL: Gandenberger & Hennig: Blogging Birnbaum’s Proof
- (2/11) U-Phil: Mayo’s response to Hennig and Gandenberger
- (2/13) Statistics as a Counter to Heavyweights…who wrote this?
- (2/16) Fisher and Neyman after anger management?
- (2/17) R. A. Fisher: how an outsider revolutionized statistics
- (2/20) Fisher: from ‘Two New Properties of Mathematical Likelihood’
- (2/23) Stephen Senn: Also Smith and Jones
- (2/26) PhilStock: DO < $70
- (2/26) Statistically speaking…
I exclude those reblogged fairly recently. Monthly memory lanes began at the blog’s 3-year anniversary in Sept. 2014.
The discussion culminated in this publication in Statistical Science. For a very informal final look, see this post.
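For readers new to the Likelihood Principle issue, the standard stopping-rule illustration (often credited to Lindley and Phillips, not taken from these posts) shows what the hoopla is about: 9 heads and 3 tails, obtained either by fixing n = 12 tosses (binomial) or by tossing until the 3rd tail (negative binomial). Both likelihoods are proportional to θ^9 (1 - θ)^3, so by the LP the two results carry the same evidence about θ, yet the one-sided p-values for H0: θ = 1/2 against θ > 1/2 differ. A sketch with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)

def binomial_p(heads, n):
    """Fixed n tosses: P(at least `heads` heads; theta = 1/2)."""
    return sum(comb(n, k) * HALF**n for k in range(heads, n + 1))

def negative_binomial_p(heads, tails):
    """Toss until the `tails`-th tail: P(at least heads+tails tosses needed; theta = 1/2),
    i.e. fewer than `tails` tails among the first heads+tails-1 tosses."""
    m = heads + tails - 1
    return sum(comb(m, k) * HALF**m for k in range(tails))

def binomial_lik(theta, heads, tails):
    return comb(heads + tails, heads) * theta**heads * (1 - theta)**tails

def negative_binomial_lik(theta, heads, tails):
    # The final toss must be the last tail, so only the first heads+tails-1 are free.
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta)**tails

print(binomial_p(9, 12), float(binomial_p(9, 12)))                  # 299/4096, about 0.073
print(negative_binomial_p(9, 3), float(negative_binomial_p(9, 3)))  # 67/2048, about 0.033
```

The two likelihood functions differ only by a constant (C(12, 9) vs C(11, 9)), so any likelihood ratio between two values of θ is identical under both designs; the p-values differ because they also depend on outcomes that did not occur, which is exactly what the LP rules out.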
This continues my previous post, “Can’t take the fiducial out of Fisher…”, in recognition of Fisher’s birthday, February 17. I supply a few more intriguing articles you may find enlightening to read and/or reread on a Saturday night.
Move up 20 years to the famous 1955/56 exchange between Fisher and Neyman. Fisher clearly connects Neyman’s adoption of a behavioristic-performance formulation to his denying the soundness of fiducial inference. When “Neyman denies the existence of inductive reasoning, he is merely expressing a verbal preference. For him ‘reasoning’ means what ‘deductive reasoning’ means to others.” (Fisher 1955, p. 74).
Fisher was right that Neyman’s calling the outputs of statistical inferences “actions” merely expressed Neyman’s preferred way of talking. Nothing earth-shaking turns on the choice to dub every inference “an act of making an inference”.[i] The “rationality” or “merit” goes into the rule. Neyman, much like Popper, had a good reason for drawing a bright red line between his use of probability (for corroboration or probativeness) and its use by ‘probabilists’ (who assign probability to hypotheses). Fisher’s fiducial probability was in danger of blurring this very distinction. Popper said, and Neyman would have agreed, that he had no problem with our using the word induction so long as it was kept clear it meant testing hypotheses severely. Continue reading
R.A. Fisher: February 17, 1890 – July 29, 1962
In recognition of R.A. Fisher’s birthday today, I’ve decided to share some thoughts on a topic that has so far been absent from this blog: Fisher’s fiducial probability. Happy Birthday, Fisher.
[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D. R. Cox, 2006, p. 195).
The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehmann (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.
So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit: Continue reading
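The excerpt breaks off before Cox's formulation, so here is only the computation that the fiducial and confidence readings share in the simplest case, assumed here to be a Normal mean with known σ: the lower limit x̄ - k_c σ/√n, where k_c is the upper c point of the standard Normal. On Neyman's reading, 1 - c is the long-run proportion of repetitions in which such a limit falls below the true μ, which the simulation checks; the fiducial reading, roughly, attaches that probability to the statement about μ given the observed x̄. Function names and numbers are illustrative.

```python
import random
from statistics import NormalDist

def lower_limit(xbar, sigma, n, c=0.025):
    """Lower limit for a Normal mean (sigma known): xbar - k_c * sigma / sqrt(n),
    where k_c is the upper-c point of the standard Normal."""
    k_c = NormalDist().inv_cdf(1 - c)
    return xbar - k_c * sigma / n ** 0.5

def long_run_coverage(mu=10.0, sigma=2.0, n=16, c=0.025, reps=20_000, seed=3):
    """Fraction of repetitions in which the computed limit is below the true mu."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(mu, se)  # draw from the sampling distribution of the mean
        hits += lower_limit(xbar, sigma, n, c) < mu
    return hits / reps

print(lower_limit(10.0, 2.0, 16), long_run_coverage())
```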
Polling estimation & rubbing off
Nate Silver describes “How we’re forecasting the primaries” using confidence intervals. Never mind that the estimates are a few weeks old, and put entirely to one side any predictions he makes or will make. I’m only interested in this one interpretive portion of the method, as Silver describes it:
In our interactive, you’ll see a bunch of funky-looking curves like the ones below for each candidate; they represent the model’s estimate of the possible distribution of his vote share. The red part of the curve represents a candidate’s 80 percent confidence interval. If the model is calibrated correctly, then he should finish within this range 80 percent of the time, above it 10 percent of the time, and below it 10 percent of the time. (My emphasis.)
OK. We look up the link to confidence interval. Continue reading
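Silver's calibration claim can be made concrete in a toy setting. A model's 80% band is the 10th-to-90th-percentile range of its forecast distribution, and "calibrated" means that across many forecasts, outcomes land below, inside, and above the band about 10, 80 and 10 percent of the time. The sketch assumes Normal forecast distributions with made-up means and spreads (hypothetical, not Silver's model). It checks a long-run property over many forecasts; whether that property licenses the reading about a single candidate's result is the interpretive question the post appears to raise.

```python
import random
from statistics import NormalDist

def band_80(mean, sd):
    """Central 80% band of a Normal forecast: its 10th and 90th percentiles."""
    forecast = NormalDist(mean, sd)
    return forecast.inv_cdf(0.10), forecast.inv_cdf(0.90)

def calibration_check(n_forecasts=20_000, seed=2016):
    """Share of outcomes below / inside / above the 80% band when outcomes really
    are drawn from the forecast distributions (a perfectly calibrated model)."""
    rng = random.Random(seed)
    below = inside = above = 0
    for _ in range(n_forecasts):
        mean = rng.uniform(5, 45)  # hypothetical forecast vote share, in percent
        sd = rng.uniform(2, 8)     # hypothetical forecast uncertainty
        lo, hi = band_80(mean, sd)
        outcome = rng.gauss(mean, sd)
        if outcome < lo:
            below += 1
        elif outcome > hi:
            above += 1
        else:
            inside += 1
    return below / n_forecasts, inside / n_forecasts, above / n_forecasts

print(calibration_check())
```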