At least as apt today as 3 years ago…HAPPY HALLOWEEN! Memory Lane with new comments in blue.
In an earlier post I alleged that frequentist hypothesis tests often serve as whipping boys, by which I meant “scapegoats”, for the well-known misuses, abuses, and flagrant misinterpretations of tests (both simple Fisherian significance tests and Neyman-Pearson tests, although in different ways)—as well as for what really boils down to a field’s weaknesses in modeling, theorizing, experimentation, and data collection. On checking the history of this term, however, I find a certain disanalogy with at least the original meaning of a “whipping boy,” namely, an innocent boy who was punished when a medieval prince misbehaved and was in need of discipline. It was thought that seeing an innocent companion, often a friend, beaten for the prince’s own transgressions would supply an effective way to ensure the prince would not repeat the same mistake. But the flogging of significance tests, rather than serving as a tool for humbled self-improvement and a commitment to avoiding flagrant rule violations, has tended instead to yield declarations that it is the rules that are invalid! The violators are excused as not being able to help it! The situation is more akin to witch hunting, which in some places became an occupation in its own right.
Now some early literature, e.g., Morrison and Henkel’s Significance Test Controversy (1962), performed an important service over fifty years ago. They alerted social scientists to the fallacies of significance tests: mistaking a statistically significant difference for one of substantive importance, interpreting insignificant results as evidence for the null hypothesis (especially problematic with insensitive tests), and the like. Chastising social scientists for applying significance tests in slavish and unthinking ways, the contributors call attention to a cluster of pitfalls and fallacies of testing.
MONTHLY MEMORY LANE: 3 years ago: October 2012. I mark in red three posts that seem most apt for general background on key issues in this blog. Posts that are part of a “unit” or a group of “U-Phils” count as one, and there are two such groupings this month. The 10/18 “Query” gave rise to a large and useful discussion on de Finetti-style probability.
- (10/02) PhilStatLaw: Infections in the court
- (10/05) Metablog: Rejected posts (blog within a blog)
- (10/05) Deconstructing Gelman, Part 1: “A Bayesian wants everybody else to be a non-Bayesian.”
- (10/07) Deconstructing Gelman, Part 2: Using prior information
- (10/09) Last part (3) of the deconstruction: beauty and background knowledge
- (10/12) U-Phils: Hennig and Aktunc on Gelman 2012
- (10/13) Mayo Responds to U-Phils on Background Information
- (10/15) New Kvetch: race-based academics in Fla
- (10/17) RMM-8: New Mayo paper: “StatSci and PhilSci: part 2 (Shallow vs Deep Explorations)”
- (10/18) Query (Understanding de Finetti style probability)–large and useful discussion
 excluding those reblogged fairly recently. Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.
Is it possible, today, to have a fair-minded engagement with debates over statistical foundations? I’m not sure, but I know it is becoming of pressing importance to try. Increasingly, people are getting serious about methodological reforms—some are quite welcome, others are quite radical. Too rarely do the reformers bring out the philosophical presuppositions of the criticisms and proposed improvements. Today’s (radical?) reform movements are typically launched from criticisms of statistical significance tests and P-values, so I focus on them. Regular readers know how often the P-value (that most unpopular girl in the class) has made her appearance on this blog. Here, I tried to quickly jot down some queries. (Look for later installments and links.) What are some key questions we need to ask to tell what’s true about today’s criticisms of P-values?
I. To get at philosophical underpinnings, the single most important question is this:
(1) Do the debaters distinguish different views of the nature of statistical inference and the roles of probability in learning from data?
The Royal Statistical Society sent me a letter announcing their latest Journal webinar next Wednesday 21 October:
…RSS Journal webinar on 21st October featuring Bradley Efron, Andrew Gelman and Peter Diggle. They will be in discussion about Bradley Efron’s recently published paper titled ‘Frequentist accuracy of Bayesian estimates’. The paper was published in June in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol 77 (3), 617-646. It is free to access from October 7th to November 4th.
Webinar start time: 8 am in California (PDT); 11 am in New York (EDT); 4 pm (UK time).
During the webinar, Bradley Efron will present his paper for about 30 minutes followed by a Q&A session with the audience. Andrew Gelman is joining us as discussant and the event will be chaired by our President, Peter Diggle. Participation in the Q&A session by anyone who dials in is warmly welcomed and actively encouraged. Participants can ask the author a question over the phone or simply issue a message using the web-based teleconference system. Questions can be emailed in advance and further information can be requested from firstname.lastname@example.org.
More details about this journal webinar and how to join can be found in StatsLife and on the RSS website. RSS Journal webinars are sponsored by Quintiles.
We’d be delighted if you were able to join us on the 21st and very grateful if you could let your colleagues and students know about the event.
I will definitely be tuning in!
Given the excited whispers about the upcoming meeting of the American Statistical Association Committee on P-Values and Statistical Significance, it’s an apt time to reblog my post on the “Don’t Ask Don’t Tell” policy that began the latest brouhaha!
A large number of people have sent me articles on the “test ban” of statistical hypothesis tests and confidence intervals at a journal called Basic and Applied Social Psychology (BASP)[i]. Enough. One person suggested that, since it came so close to my recent satirical Task force post, I either had advance knowledge or some kind of ESP. Oh please, no ESP required. None of this is the slightest bit surprising, and I’ve seen it before; I simply didn’t find it worth blogging about (but Saturday night is a perfect time to read/reread the (satirical) Task force post [ia]). Statistical tests are being banned, say the editors, because they purport to give probabilities of null hypotheses (really?) and do not, hence they are “invalid”.[ii] (Confidence intervals are thrown in the waste bin as well—also claimed “invalid”). “The state of the art remains uncertain” regarding inferential statistical procedures, say the editors. I don’t know, maybe some good will come of all this.
Yet there’s a part of their proposal that brings up some interesting logical puzzles, and logical puzzles are my thing. In fact, I think there is a mistake the editors should remedy, lest authors be led into disingenuous stances, and strange tangles ensue. I refer to their rule that authors be allowed to submit papers whose conclusions are based on allegedly invalid methods so long as, once accepted, they remove any vestiges of them!
Evolutionary ecologist Stephen Heard (Scientist Sees Squirrel) linked to my blog yesterday. Heard’s post asks: “Why do we make statistics so hard for our students?” I recently blogged on Barnard, who declared “We need more complexity” in statistical education. I agree with both: after all, Barnard also called for stressing the overarching reasoning for given methods, and that’s in sync with Heard. Here are some excerpts from Heard’s (Oct 6, 2015) post. I follow with some remarks.
This bothers me, because we can’t do inference in science without statistics*. Why are students so unreceptive to something so important? In unguarded moments, I’ve blamed it on the students themselves for having decided, a priori and in a self-fulfilling prophecy, that statistics is math, and they can’t do math. I’ve blamed it on high-school math teachers for making math dull. I’ve blamed it on high-school guidance counselors for telling students that if they don’t like math, they should become biology majors. I’ve blamed it on parents for allowing their kids to dislike math. I’ve even blamed it on the boogie**.
Junk Science (as first coined).* Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. (Yes, this was the first popular use of “junk science”, to my knowledge.) For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.
Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!
A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?