Dear Reader: I will be traveling a lot in the next few weeks and may not get to post much; we’ll see. If I do not reply to comments, I’m not ignoring them. They’re a lot more fun than some of the things I must do now to complete my book, but I need to resist them, especially while traveling and giving seminars.* The rule we’ve followed is for comments to close after 10 days, though we still allow them to appear. The blogpeople on Elba forward comments for those 10 days; beyond that, it’s haphazard whether I notice them. It’s impossible otherwise to keep this blog up at all, and I would like to. Feel free to call any to my attention (use the “can we talk” page or firstname.lastname@example.org).

If there’s a burning issue, interested readers might wish to poke around (or scour) the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered, drawn from many years of working in this area. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. I list them below.
If there are others out there with error statistical leanings wishing to come partially out of exile, feel free to jump in with responses if relevant comments arise (as often happens spontaneously). Anonymity is OK. If the published work reveals a gap in the general error statistical program (there are many), then try to invent a way to fill it constructively! Also, if you come across articles that you think would be of relevance to readers of this blog, please send them to me or Jean Miller: email@example.com (with or without comments of your own).
We will soon have at least two new papers for the special RMM volume, and comments can always be submitted for publication.
*Palindromes, of course, will continue—as will the ongoing contest. (See “palindrome page”).
(#1) Error statistical tools forbid using any background knowledge.
(#2) All statistically significant results are treated the same.
(#3) The p-value does not tell us how large a discrepancy is found.
(#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
(#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
(#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
(#7) Error probabilities are invariably misinterpreted as posterior probabilities.
(#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
(#9) Specifying statistical tests is too arbitrary.
(#10) We should be doing confidence interval estimation rather than significance tests.
(#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
(#12) All models are false anyway.
(#13) Testing assumptions involves illicit data-mining.
HAVE WE LEFT ANY OUT?
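Criticism #4 is easy to demonstrate numerically. Here is a minimal sketch (my own illustration, not from the Mayo & Spanos paper) using a one-sided z-test: holding a trivially small observed discrepancy fixed, the p-value shrinks toward zero as the sample size grows.

```python
import math

def one_sided_z_pvalue(effect, sigma, n):
    """P-value for a one-sided z-test of H0: mu = 0 vs mu > 0,
    given a fixed observed sample-mean discrepancy `effect`."""
    z = effect * math.sqrt(n) / sigma
    # Standard-normal upper-tail probability via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# A discrepancy of 0.01 standard deviations becomes "statistically
# significant" once the sample is large enough.
for n in (100, 10_000, 100_000, 1_000_000):
    print(n, one_sided_z_pvalue(0.01, 1.0, n))
```

The point the criticism trades on, of course, is that statistical significance alone does not tell you whether the detected discrepancy is substantively important (compare #3).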
Mayo & Spanos 2011, “Error Statistics”
“Error statistical methods depend on assumptions about the kind of data that might have occurred but didn’t.” (Includes #11 as a sub-case.)
This is typically illustrated with the example of repeated Bernoulli trials in which the stopping criterion changes the sampling distribution (but not the likelihood function). Since the stopping criterion can depend on the intentions of the scientist generating the data, or even on circumstances beyond her control and of which she has no knowledge, this is deemed kooky. It’s a common trope in Bayesian criticisms of p-values, so you’ve probably seen it often enough.
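For concreteness, the textbook numerical version of this example (my sketch of the standard case, not taken from the post) observes 3 successes in 12 Bernoulli trials and tests H0: p = 0.5 against p < 0.5. If the experimenter fixed n = 12 in advance, the sampling distribution is binomial; if she stopped at the 3rd success, it is negative binomial. The likelihood functions are proportional either way, yet the p-values differ:

```python
from math import comb, isclose

successes, trials, p0 = 3, 12, 0.5

# Binomial sampling plan: n = 12 fixed in advance.
# p-value = P(X <= 3) under H0: p = 0.5.
p_binom = sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
              for k in range(successes + 1))

# Negative binomial plan: sample until the 3rd success.
# p-value = P(N >= 12) = 1 - P(N <= 11) under H0.
p_negbin = 1 - sum(comb(n - 1, successes - 1) * p0**successes
                   * (1 - p0)**(n - successes)
                   for n in range(successes, trials))

# The two likelihoods differ only by a constant factor
# (C(12,3) vs C(11,2)), so likelihood ratios between any
# two values of p are identical under both plans.
def lik(p, const):
    return const * p**successes * (1 - p)**(trials - successes)

ratio_binom = lik(0.3, comb(trials, successes)) / lik(0.5, comb(trials, successes))
ratio_negbin = lik(0.3, comb(trials - 1, successes - 1)) / lik(0.5, comb(trials - 1, successes - 1))

print(round(p_binom, 4))    # ~0.073: not significant at the 0.05 level
print(round(p_negbin, 4))   # ~0.0327: significant at the 0.05 level
print(isclose(ratio_binom, ratio_negbin))
```

So a likelihoodist or Bayesian who conditions only on the likelihood must treat the two experiments identically, while the error statistician's p-value depends on the stopping rule; which verdict counts as the reductio is precisely what is in dispute.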
Yes, and I’ve written a lot, A LOT, about it (perhaps more than about anything else). Please see chapters 9 and 10 of EGEK (1996), posted here, and/or Mayo and Kruse (“Principles of Inference and Their Consequences”); less force-feeding, please. (It’s bad enough that the state forces (pretend*) dieting on us, not that I don’t diet of my own free will.) But since you’ve asked: it’s not a matter of “intentions” to us, any more than are a number of other aspects of data and hypothesis generation/selection that alter error probabilities!
*as with NYC’s ban on large cups when used for sweetened drinks.