Dear Reader: I will be traveling a lot in the next few weeks and may not get to post much; we’ll see. If I do not reply to comments, I’m not ignoring them: they’re a lot more fun than some of the things I must do now to complete my book, but I need to resist them, especially while traveling and giving seminars.* The rule we’ve followed is for comments to close after 10 days, though we still allow them to appear. The blogpeople on Elba forward comments for those 10 days; beyond that, it’s haphazard whether I notice them. It’s impossible otherwise to keep this blog up at all, and I would like to. Feel free to call any to my attention (use the “can we talk” page or error@vt.edu). If there’s a burning issue, interested readers might wish to poke around (or scour) the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered, drawn from many years of working in this area. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. I list them below.

If there are others out there with error statistical leanings wishing to come partially out of exile, feel free to jump in with responses when relevant comments arise (as is often done spontaneously). Anonymity is OK. If the published work reveals a gap in the general error statistical program (there are many), then try to invent a way to fill it constructively! Also, if you come across articles that you think would be of relevance to the readers of this blog, please send them to me or Jean Miller: Phildgs2@gmail.com (with or without comments of your own).

We will soon have at least two new papers for the special RMM volume, and comments can always be submitted for publication.

*Palindromes, of course, will continue—as will the ongoing contest. (See “palindrome page”).

——————–

The 13:

(#1) Error statistical tools forbid using any background knowledge.

(#2) All statistically significant results are treated the same.

(#3) The p-value does not tell us how large a discrepancy is found.

(#4) With large enough sample size even a trivially small discrepancy from the null can be detected.

(#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.

(#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.

(#7) Error probabilities are invariably misinterpreted as posterior probabilities.

(#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.

(#9) Specifying statistical tests is too arbitrary.

(#10) We should be doing confidence interval estimation rather than significance tests.

(#11) Error statistical methods take into account the intentions of the scientists analyzing the data.

(#12) All models are false anyway.

(#13) Testing assumptions involves illicit data-mining.
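To make #3 and #4 concrete, here is a minimal sketch (the numbers are made up purely for illustration): the same trivially small observed discrepancy from a null mean is statistically insignificant at a modest sample size, yet overwhelmingly significant at a huge one, even though the p-value alone never tells you the discrepancy is tiny.

```python
# Toy illustration (hypothetical numbers) of criticisms #3 and #4:
# the same tiny observed discrepancy from H0: mu = 0 (sigma = 1 known)
# is insignificant at n = 100 but highly significant at n = 100000.
from math import sqrt, erfc

def one_sided_p(xbar, n, sigma=1.0, mu0=0.0):
    """p-value of the one-sided z-test of H0: mu = mu0 vs H1: mu > mu0."""
    z = (xbar - mu0) * sqrt(n) / sigma
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for standard normal Z

xbar = 0.02  # a substantively trivial discrepancy from the null
for n in (100, 100_000):
    print(n, one_sided_p(xbar, n))  # ~0.42 at n=100, ~1e-10 at n=100000
```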

HAVE WE LEFT ANY OUT?

http://www.phil.vt.edu/dmayo/personal_website/Error%20%20Statistics%20Mayo%20&%20Spanos.pdf

“Error statistical methods depend on assumptions about the kind of data that might have occurred but didn’t.” (Includes #11 as a sub-case.)

This is typically illustrated with the example of repeated Bernoulli trials in which the stopping criterion changes the sampling distribution (but not the likelihood function). Since the stopping criterion can depend on the intentions of the scientist generating the data, or even on circumstances beyond her control and of which she has no knowledge, this is deemed kooky. It’s a common trope in Bayesian criticisms of p-values, so you’ve probably seen it often enough.
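The stopping-rule point above can be made concrete with the standard binomial vs. negative-binomial comparison (the numbers here are made up for illustration, not taken from the post): 3 heads in 12 tosses yields different p-values under the two designs, while the two likelihood functions differ only by a constant factor.

```python
# Stopping-rule illustration (hypothetical numbers): 3 heads in 12 tosses,
# testing H0: p = 1/2 against p < 1/2 under two different designs.
from math import comb

n, k, p0 = 12, 3, 0.5

# Design 1: n = 12 tosses fixed in advance (binomial sampling).
# p-value = P(at most k heads in n tosses | p0)
p_binom = sum(comb(n, j) * p0**j * (1 - p0)**(n - j) for j in range(k + 1))

# Design 2: toss until the k-th head (negative binomial sampling),
# which happened to take n tosses.
# p-value = P(needing at least n tosses) = P(< k heads in first n-1 tosses)
p_negbin = sum(comb(n - 1, j) * p0**j * (1 - p0)**(n - 1 - j) for j in range(k))

print(round(p_binom, 4), round(p_negbin, 4))  # 0.073 vs 0.0327

# Same data, same likelihood kernel p^3 (1-p)^9 under both designs:
# the two likelihoods differ only by a constant (independent of p).
def lik_ratio(p):
    binom_lik = comb(n, k) * p**k * (1 - p)**(n - k)
    negbin_lik = comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)
    return binom_lik / negbin_lik

print(lik_ratio(0.3), lik_ratio(0.7))  # 4.0 4.0 -- constant in p
```

So the sampling distribution (and hence the p-value) depends on the stopping rule, while the likelihood function does not; that is exactly the asymmetry the criticism trades on.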

Yes, and I’ve written a lot, A LOT, about it (perhaps more than about anything else). Please see chapters 9 and 10 of EGEK (1996), posted here, and/or Mayo and Kruse (“Principles of Inference and Their Consequences”); less force-feeding, please. (It’s bad enough that the state forces (pretend*) dieting on us; not that I don’t diet of my own free will.) But since you’ve asked: it’s not a matter of “intentions” to us, any more than a number of other aspects of data and hypothesis generation/selection that alter error probabilities!

*as with the NYC ban on large cups used for sweetened drinks.

Er, I didn’t ask — you did: “HAVE WE LEFT ANY OUT?”

I’m just into chapter 3 of EGEK — I slogged through ch. 2 for a while before I decided I wanted to skip your take on Kuhn’s take on Popper and get to the good stuff about how Bayesians go wrong. I look forward to your discussion of this example when I get to ch. 9.

But surely it couldn’t be considered a gap in the program*, when I’ve discussed it all over the place. Anyway, I’m on a plane and am about to take off! Elbians are in charge, for now.

*oh, you think it should be its own category in the paper…it’s #11

I think it should replace #11, since the “circumstances beyond experimenter’s control or knowledge” thing makes it more general.

Corey: You don’t need to read EGEK consecutively, and if your central interest is philosophy of statistics, as opposed to general philosophy of experiment/induction/knowledge, then you might want to flip to just those chapters that deal with them. There’s one other thing: EGEK’s discussion of Bayesianism is directed mostly to subjective Bayesianism in philosophy, and strives to be as non-technical as possible (directing itself to a philosophical audience). So you might find more from the stat perspective in published papers rather than EGEK.

Another response to “have we left any out” is: “Frequentist probabilities do not exist” (and therefore testing model assumptions, whether it involves data-dredging or not, cannot assure us that the model assumptions hold, even approximately).

Hi,

In your writings with Prof. Spanos (or in Prof. Spanos’s own work), a single model and a unique test are assumed.

Could you comment, with respect to the error statistics philosophy, on:

1) adjusted p-values for multiple tests [see recent award to …], and how does your SEV test work in this case?

2) the recent trend of Frequentist Model Averaging [www.jstor.org/stable/30045339]

3) I don’t remember reading an answer to Prof. Senn’s question about your SEV (post-data) test and nuisance parameters.

4) Another statistician who, to my mind, deserves to be better known is David A. Freedman of Berkeley (who passed away in 2008). He was also a true thinker who always questioned the validity of statistical modeling in social science (see http://www.stat.berkeley.edu/~freedman/ and his books on statistics).

D. A. Freedman, “Some Issues in the Foundation of Statistics,” Foundations of Science, vol. 1 (1995), pp. 19–83. Reprinted in Some Issues in the Foundation of Statistics, Bas C. van Fraassen, ed., Kluwer, Dordrecht (1997).

BTW, I take advantage of this comment to congratulate you and Prof. Spanos for demystifying the picture of frequentist statistics portrayed in so many (the vast majority of) statistics books! I also wish to thank Prof. Spanos for his book on statistics (1999), the only one I know in which the reader faces someone who thinks, rather than someone trying to sell you his arguments (the worst in this vein are most Bayesian stat books).

Regards,

STJ
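For context on the SEV question in item 1): in the simplest textbook case discussed in Mayo and Spanos’s published examples (a one-sided test of a normal mean with σ known and no nuisance parameters), the post-data severity assessment can be sketched as below. The numbers are hypothetical, chosen only to illustrate the computation, and this is my rough reading of the formula, not a definitive statement of the method.

```python
# Sketch of a post-data severity (SEV) calculation for the simple case:
# X ~ N(mu, sigma^2), sigma known, test T+ of H0: mu <= mu0 vs H1: mu > mu0.
# Hypothetical numbers; no nuisance parameters involved.
from math import sqrt, erf

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def sev_greater(xbar, mu1, n, sigma):
    """Severity for the post-rejection claim mu > mu1, given observed mean xbar:
    SEV(mu > mu1) = P(Xbar <= xbar; mu = mu1)."""
    return Phi((xbar - mu1) * sqrt(n) / sigma)

# e.g. mu0 = 0, sigma = 2, n = 100, observed xbar = 0.4 (so z = 2.0):
for mu1 in (0.0, 0.2, 0.4):
    print(mu1, round(sev_greater(0.4, mu1, 100, 2.0), 3))  # 0.977, 0.841, 0.5
```

The point of the table it prints: the claim “mu > 0” passes with high severity, while “mu > 0.4” does not, so the same rejection licenses some discrepancy claims and not others.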

Anon: See the same refs I gave to Corey above, and Cox and Mayo (2005, 2010) on “selection effects” and nuisance parameters. I knew Freedman of course, and did a session with him and Cox (on philstat) at a Lehmann Conference in 2004.

I forgot to add: the recipient of the award on multiple testing is Yoav Benjamini.
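For readers who haven’t met it, Benjamini’s best-known contribution, the Benjamini–Hochberg step-up procedure for controlling the false discovery rate, can be sketched as follows (the toy p-values are made up for illustration):

```python
# Benjamini-Hochberg step-up procedure for FDR control.
# Toy p-values are hypothetical, chosen only to show the mechanics.

def benjamini_hochberg(pvals, q=0.05):
    """Return the (sorted) indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    # find the largest rank k (1-based) with p_(k) <= (k/m) * q
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_star = rank
    return sorted(order[:k_star])  # reject every hypothesis up to that rank

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))  # -> [0, 1]
```

Note the step-up logic: a p-value above its own threshold can still be rejected if some later (larger) p-value clears its threshold, which is what distinguishes this from a simple per-test cutoff.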