3 years ago…
MONTHLY MEMORY LANE: 3 years ago: January 2013. I mark in red three posts that seem most apt for general background on key issues in this blog. Posts that are part of a “unit” or a group of “U-Phils” (you [readers] philosophize) count as one. It was tough to pick just 3 this month. I’m putting the 2 “U-Phils” in burgundy–nearly red. They involve reader contributions on the likelihood principle–a major topic in foundations of statistics. Please check out the others. New questions or comments can be placed on this post.
- (1/2) Severity as a ‘Metastatistical’ Assessment
- (1/4) Severity Calculator
- (1/6) Guest post: Bad Pharma? (S. Senn)
- (1/9) RCTs, skeptics, and evidence-based policy
- (1/10) James M. Buchanan
- (1/11) Aris Spanos: James M. Buchanan: a scholar, teacher and friend
- (1/12) Error Statistics Blog: Table of Contents
- (1/15) Ontology & Methodology: Second call for Abstracts, Papers
- (1/18) New Kvetch/PhilStock
- (1/19) Saturday Night Brainstorming and Task Forces: (2013) TFSI on NHST (2015 update).
- (1/22) New PhilStock
- (1/23) P-values as posterior odds?
- (1/26) Coming up: December U-Phil Contributions….
- (1/27) U-Phil: S. Fletcher & N.Jinn
- (1/30) U-Phil: J. A. Miller: Blogging the SLP
I exclude those reblogged fairly recently. Monthly memory lanes began at the blog’s 3-year anniversary in September 2014.
When they sought to subject Uri Geller to the scrutiny of scientists, magicians had to be brought in, because only they were sufficiently trained to spot the subtle sleight-of-hand shifts by which the magician tricks through misdirection. We, too, have to be magicians to discern the subtle misdirections and shifts of meaning in discussions of statistical significance tests (and other methods)–even within the same statistical guide. We needn’t suppose anything deliberately devious is going on at all! Often, a statistical guidebook reflects shifts of meaning that grow out of one or another critical argument. These days, such arguments trickle down quickly to statistical guidebooks, thanks to popular articles on the “statistics crisis in science”. The danger is that the guidebooks themselves wind up containing inconsistencies. To adopt the magician’s stance is to be on the lookout for the standard sleights of hand. There aren’t that many.
I don’t know Jim Frost, but he gives statistical guidance at the Minitab blog. The purpose of my previous post was to point out that Frost uses the probability of a Type I error in two incompatible ways in his posts on significance tests. I assumed he’d want to clear this up, but so far he has not. His response to a comment I made on his blog is this:
waiting for the other shoe to drop…
Do you ever find yourself holding your breath when reading an exposition of significance tests that’s going swimmingly so far? If you’re a frequentist in exile, you know what I mean. I’m sure others feel this way too. When I came across Jim Frost’s posts on The Minitab Blog, I thought I might actually have located a success story. He does a good job explaining P-values (with charts), the duality between P-values and confidence levels, and even rebuts the latest “test ban” (the “Don’t Ask, Don’t Tell” policy). The merely descriptive reports of observed differences that the editors recommend, Frost shows, are uninterpretable without a corresponding P-value or the equivalent. So far, so good. I have only small quibbles, such as the use of “likelihood” when probability is meant, and various other nitpicky things. But watch how in some places significance levels are defined as the usual error probabilities and error rates–indeed in the glossary for the site–while in other places it is denied that they provide error rates. In those places, error probabilities and error rates shift their meaning to posterior probabilities, based on priors representing the “prevalence” of true null hypotheses.
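To see why these two uses of “error rate” are incompatible, here is a minimal numerical sketch of the shift described above. The numbers (prevalence of true nulls, power) are illustrative assumptions of mine, not figures from Frost’s posts: the Type I error rate is a property of the test procedure, fixed at α, whereas the “urn of nulls” quantity P(H0 true | rejection) depends on the assumed prevalence and power, and can be very different.

```python
# Illustrative sketch (my own numbers, not from the Minitab posts) of the
# two incompatible meanings of "error rate".
# Assumed urn model: a fraction `prevalence` of tested null hypotheses are true;
# each test runs at significance level alpha, with the stated power otherwise.

alpha = 0.05        # Type I error rate: P(reject | H0 true) -- fixed by the test
power = 0.80        # assumed P(reject | H0 false)
prevalence = 0.9    # assumed prior fraction of true nulls in the "urn"

# The shifted, posterior-style "error rate": P(H0 true | rejection),
# via Bayes' theorem on the urn model.
p_reject = prevalence * alpha + (1 - prevalence) * power
posterior_null = prevalence * alpha / p_reject

print(f"Type I error rate (alpha): {alpha:.3f}")    # 0.050
print(f"P(H0 true | rejection):    {posterior_null:.3f}")  # 0.360
```

With these assumptions the two quantities differ by a factor of seven; calling both “the probability of a Type I error” is the sleight of hand at issue.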
The allegation that P-values overstate the evidence against the null hypothesis continues to be taken as gospel in discussions of significance tests. All such discussions, however, assume a notion of “evidence” that’s at odds with significance tests–generally likelihood ratios, or Bayesian posterior probabilities (conventional or of the “I’m selecting hypotheses from an urn of nulls” variety). I’m reblogging the bulk of an earlier post as background for a new post to appear tomorrow. It’s not that a single small P-value provides good evidence of a discrepancy (even assuming the model, and no biasing selection effects); Fisher and others warned against over-interpreting an “isolated” small P-value long ago. The problem is that the current formulation of the “P-values overstate the evidence” meme is attached to a sleight of hand (on meanings) that is introducing brand new misinterpretations into an already confused literature!
1. What you should ask…
When you hear the familiar refrain, “We all know that P-values overstate the evidence against the null hypothesis”, denying that the P-value aptly measures evidence, what you should ask is:
“What do you mean by overstating the evidence against a hypothesis?”
One honest answer is:
“What I mean is that when I put a lump of prior probability π0 > 1/2 on a point null H0 (or a very small interval around it), the P-value is smaller than my Bayesian posterior probability on H0.”
Your reply might then be: (a) P-values are not intended as posteriors in H0, and (b) P-values can be used to determine whether there is evidence of inconsistency with a null hypothesis at various levels, and to distinguish how well or poorly tested claims are–depending on the type of question asked. Reporting the discrepancies that are “poorly” warranted is what controls any overstatement of the discrepancies indicated.
You might toss in the query: Why do you assume that “the” correct measure of evidence (for scrutinizing the P-value) is via the Bayesian posterior?
If you wanted to go even further you might rightly ask: And by the way, what warrants your lump of prior probability on the null? (See Section 3. A Dialogue.)
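The “honest answer” above can be made concrete with a standard worked example (the numbers are mine, chosen for illustration; this is the familiar spike-and-slab setup, not a calculation from the post): with half the prior probability lumped on a point null, a result that is just significant at the 0.05 level can yield a posterior probability on H0 of around 0.6.

```python
# Illustrative spike-and-slab calculation (my own assumed numbers).
# Model: X_i ~ N(theta, 1), n = 100. H0: theta = 0 gets prior mass pi0 = 0.5;
# under the alternative, theta ~ N(0, tau^2) with tau = 1 (the "slab").
# Observed: z = 1.96, i.e. two-sided P-value ~ 0.05.
import math

def norm_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

n, sigma, tau, pi0 = 100, 1.0, 1.0, 0.5
z = 1.96
xbar = z * sigma / math.sqrt(n)                        # observed sample mean

m0 = norm_pdf(xbar, sigma / math.sqrt(n))              # marginal of xbar under H0
m1 = norm_pdf(xbar, math.sqrt(tau**2 + sigma**2 / n))  # marginal under the slab
post_null = pi0 * m0 / (pi0 * m0 + (1 - pi0) * m1)     # P(H0 | data)

p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided P-value

print(f"two-sided P-value: {p_value:.3f}")   # ~0.05
print(f"P(H0 | data):      {post_null:.3f}")  # ~0.60
```

So the P-value (0.05) is indeed far smaller than this Bayesian’s posterior on H0 (about 0.6)–which is exactly what the refrain trades on, and exactly why replies (a) and (b) above matter: the two numbers answer different questions.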
The record number of hits on this blog goes to “When Bayesian Inference shatters,” where Houman Owhadi presents a “Plain Jane” explanation of results now published in “On the Brittleness of Bayesian Inference”. A follow-up appeared one year ago. Here’s how their paper begins:
Professor of Applied and Computational Mathematics and Control and Dynamical Systems, Computing + Mathematical Sciences, California Institute of Technology, USA
“On the Brittleness of Bayesian Inference”
ABSTRACT: With the advent of high-performance computing, Bayesian methods are becoming increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods can impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is a pressing question to which there currently exist positive and negative answers. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they could be generically brittle when applied to continuous systems (and their discretizations) with finite information on the data-generating distribution. If closeness is defined in terms of the total variation (TV) metric or the matching of a finite system of generalized moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusion. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements, which raises the possibility of a missing stability condition when using Bayesian inference in a continuous world under finite information.
© 2015, Society for Industrial and Applied Mathematics
Permalink: http://dx.doi.org/10.1137/130938633
Winner of the December 2015 Palindrome contest
Mike Jacovides: Associate Professor of Philosophy at Purdue University
Palindrome: Emo, notable Stacy began a memory by Rome. Manage by cats, Elba to Nome.
The requirement: A palindrome using “memory” or “memories” (and Elba, of course).
Book choice (out of 12 or more): Error and the Growth of Experimental Knowledge (D. Mayo 1996, Chicago)
Bio: Mike Jacovides is an Associate Professor of Philosophy at Purdue University. He’s just finishing a book whose title is constantly changing, but which may end up being called Locke’s Image of the World and the Scientific Revolution.
Statement: My interest in palindromes was sparked by my desire to learn more about the philosophy of statistics. The fact that you can learn about the philosophy of statistics by writing a palindrome seems like evidence that anything can cause anything, but maybe once I read the book, I’ll learn that it isn’t. I am glad that ‘emo, notable Stacy’ worked out, I have to say.
Congratulations Mike! I hope you’ll continue to pursue philosophy of statistics! We need much more of that. Good choice of book prize too. D. Mayo
David Mellor, from the Center for Open Science, emailed me asking if I’d announce his Preregistration Challenge on my blog, and I’m glad to do so. You win $1,000 if your properly preregistered paper is published. The recent replication effort in psychology showed, despite the common refrain – “it’s too easy to get low P-values” – that in preregistered replication attempts it’s actually very difficult to get small P-values. (I call this the “paradox of replication”.) Here’s our e-mail exchange from this morning:
Dear Deborah Mayo,
I’m reaching out to individuals who I think may be interested in our recently launched competition, the Preregistration Challenge (https://cos.io/prereg). Based on your blogging, I thought it could be of interest to you and to your readers.
In case you are unfamiliar with it, preregistration specifies in advance the precise study protocols and analytical decisions before data collection, in order to separate the hypothesis-generating exploratory work from the hypothesis testing confirmatory work.
Though required by law in clinical trials, preregistration is virtually unknown within the basic sciences. We are trying to encourage this new behavior by offering $1,000 prizes to 1,000 researchers for publishing the results of their preregistered work.
Please let me know if this is something you would consider blogging about or sharing in other ways. I am happy to discuss further.
David Mellor, PhD
Project Manager, Preregistration Challenge, Center for Open Science
Deborah Mayo to David, 10:33 AM:
David: Yes I’m familiar with it, and I hope that it encourages people to avoid data-dependent determinations that bias results. It shows the importance of statistical accounts that can pick up on such biasing selection effects. On the other hand, coupling prereg with some of the flexible inference accounts now in use won’t really help. Moreover, there may, in some fields, be a tendency to research a non-novel, fairly trivial result.
And if they’re going to preregister, why not go blind as well? Will they?
Mayo
This headliner appeared two years ago, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance…
You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike….
It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.
“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?
‘What’s unusual about that?’ you ask?
What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”
[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]
I say it’s funny, so to see why I’ll strive to give it a generous interpretation.