Prionvac: Our experiments yield a statistically significant increase in survival among scrapie-infected mice who are given our new vaccine (p = .01) compared to infected mice who are treated with a placebo. The data indicate H: an increased survival time of 9 months, compared to untreated mice.*
Monthly Archives: September 2011
In an earlier post I alleged that frequentist hypothesis tests often serve as whipping boys, by which I meant “scapegoats”, for the well-known misuses, abuses, and flagrant misinterpretations of tests (both simple Fisherian significance tests and Neyman-Pearson tests, although in different ways). On checking the history of this term, however, I find a certain disanalogy with at least the original meaning of a “whipping boy,” namely, an innocent boy who was punished when a medieval prince misbehaved and was in need of discipline. It was thought that seeing an innocent companion, often a friend, beaten for the prince’s own transgressions would supply an effective way to ensure the prince would not repeat the same mistake. But significance test floggings, rather than serving as a tool for humbled self-improvement and commitment to avoiding flagrant rule violations, have tended instead to yield declarations that it is the rules that are invalid! The violators are excused as not being able to help it! The situation is more akin to that of witch hunting, which in some places became an occupation in its own right.
Given some slight recuperation delays, interested readers might wish to poke around the multiple layers of goodies on the left hand side of this web page, wherein all manner of foundational/statistical controversies are considered. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. Here they are:
- (#1) Error statistical tools forbid using any background knowledge.
- (#2) All statistically significant results are treated the same.
- (#3) The p-value does not tell us how large a discrepancy is found.
- (#4) With large enough sample size even a trivially small discrepancy from the null can be detected.
- (#5) Whether there is a statistically significant difference from the null depends on which is the null and which is the alternative.
- (#6) Statistically insignificant results are taken as evidence that the null hypothesis is true.
- (#7) Error probabilities are invariably misinterpreted as posterior probabilities.
- (#8) Error statistical tests are justified only in cases where there is a very long (if not infinite) series of repetitions of the same experiment.
- (#9) Specifying statistical tests is too arbitrary.
- (#10) We should be doing confidence interval estimation rather than significance tests.
- (#11) Error statistical methods take into account the intentions of the scientists analyzing the data.
- (#12) All models are false anyway.
- (#13) Testing assumptions involves illicit data-mining.
HAVE WE LEFT ANY OUT?
Mayo & Spanos “Error Statistics” 2011
(for problems accessing links, please write to: firstname.lastname@example.org)
The journey to San Francisco was smooth sailing with no plane delays; within two hours of landing I found myself in the E.R. of St. Francis Hospital (with the philosopher of science Ronald Giere), unable to walk. I have just described an unexpected, “anomalous”, highly unusual event, but no one would suppose it was anomalous FOR, i.e., evidence against, some theory, say, in molecular biology. Yet I am getting e-mails (from readers) saying, in effect, that since the improbable coin toss result is very unexpected/anomalous in its own right, it therefore is anomalous for any and all theories, which is patently absurd. What had happened, in case you want to know, is that just as I lunged forward to grab my (bulging) suitcase off the airline baggage thingy, out of the corner of my eye I saw my computer bag being pulled away by someone on my left, and as I simultaneously yanked it back, I tumbled over, very gently it seemed, twisting my knee in a funny way. To my surprise/alarm, much as I tried, I could put no weight on my right leg without succumbing to a Geppetto-puppet-like collapse. The event, of course, could rightly be regarded as anomalous for hypotheses about my invulnerability to such mishaps, because it runs counter to them. I will assume this issue is now settled for our discussions, yes?
Sitting in the airport . . . a temporary escape from Elba, which I’m becoming more and more loath to leave. I fear that some might agree, rightly, that Kadane’s “trivial test” is no indictment of significance tests and yet for the WRONG reason. I don’t want to beat a dead horse, but perhaps a certain confusion is going to obstruct understanding later on. Let us abbreviate “tails” on a coin that lands tails 5% of the time as “a rare coin toss outcome”. Some seem to reason: since a rare coin toss outcome is an event with probability .05 REGARDLESS of the truth or falsity of a hypothesis H, the test is still a legitimate significance test with significance level .05; it is just a lousy one, with no discriminating ability. I claim it is no significance test at all, and that there is an important equivocation going on (in some letters I’ve received), one which I had hoped would be skirted by the analogy with ordinary hypothesis testing in science. Heading off this confusion was the key rationale for my discussion in the Kuru post. Finding no nucleic acid in prions is inconsistent, or virtually so, with the hypothesis H: all pathogens are transmitted with nucleic acid. The observed results are anomalous for the central dogma H BECAUSE they are counter to what H says we would expect. If you maintain that the “rare coin toss outcome” is anomalous for a statistical null hypothesis H, then you would also have to say it is anomalous for H: all pathogens have nucleic acid. But it is obvious this is false in the case of the scientific hypothesis. It must also be rejected in the case of the statistical hypothesis (Rule #1).
A legitimate statistical test hypothesis must tell us (i.e., let us compute) how improbably far different experimental outcomes are from what would be expected under H. It is correct to regard experimental results as anomalous for a hypothesis H only if, and only because, they run counter to what H tells us would occur in a universe where H is correct. A hypothesis on pathogen transmission, say, does not tell us the improbability of the rare coin toss outcome. Thus it is no significance test at all. As I wrote in the Kuru post: It is not that infectious protein events are “very improbable” in their own right (however one construes this); it is rather that these events are counter to, and forbidden under, the assumption of the hypothesis H.
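To make the contrast concrete, here is a minimal simulation sketch in Python. Everything specific in it is an assumption chosen for illustration and is not taken from Kadane or from the posts: the statistical hypothesis is H: mu = 0 for normal data with known standard deviation 1, the sample size is 25, and the names `z_test_rejects`, `coin_rejects`, and `rejection_rates` are hypothetical. It compares a one-sided z-test, whose rejection rule is computed from how far the data fall from what H predicts, with the "rare coin toss outcome", which ignores the data entirely.

```python
import random
from statistics import NormalDist

ALPHA = 0.05   # probability of the "rare coin toss outcome" and the z-test level
N = 25         # hypothetical sample size (assumption for illustration)
SIGMA = 1.0    # hypothetical known standard deviation (assumption)
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)  # one-sided cutoff, about 1.645


def z_test_rejects(sample, mu0=0.0):
    """Reject H: mu = mu0 when the sample mean is improbably far above mu0."""
    mean = sum(sample) / len(sample)
    z = (mean - mu0) / (SIGMA / len(sample) ** 0.5)
    return z > Z_CRIT


def coin_rejects(sample, rng):
    """The 'trivial test': the data are ignored; 'reject' when a 5%-tails coin lands tails."""
    return rng.random() < ALPHA


def rejection_rates(true_mu, trials=5000, seed=0):
    """Estimate how often each rule rejects H: mu = 0 when the true mean is true_mu."""
    rng = random.Random(seed)
    z_hits = coin_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        z_hits += z_test_rejects(sample)
        coin_hits += coin_rejects(sample, rng)
    return z_hits / trials, coin_hits / trials


if __name__ == "__main__":
    for mu in (0.0, 0.5):
        z_rate, coin_rate = rejection_rates(mu)
        print(f"true mu = {mu}: z-test rejects {z_rate:.3f}, coin 'rejects' {coin_rate:.3f}")
```

Under these assumptions the z-test should reject about 5% of the time when H is true and far more often when the true mean is 0.5, while the coin rule stays near 5% in both cases. The sketch is only meant to illustrate that the coin's .05 is not a probability that H assigns to any outcome of the experiment, so the coin outcome does not run counter to anything H says.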
I’m jumping off the Island for a bit. Destination: San Francisco, a conference on “The Experimental Side of Modeling” http://www.isabellepeschard.org/ . Kuru makes a walk-on appearance in my presentation, “How Experiment Gets a Life of its Own”. It does not directly discuss statistics, but I will post my slides.
The last time I was in SF was in 2003 with my econometrician colleague, Aris Spanos. We were on our way to Santa Barbara to engage in an unusual powwow on statistical foundations at NCEAS*, and stopped off in SF to meet with Erich Lehmann and his wife, Juliet Shaffer. We discussed, among other things, this zany idea of mine to put together a session for the Second Lehmann conference in 2004 that would focus on philosophical foundations of statistics. (Our session turned out to include David Freedman and D.R. Cox.)
To take up the first criticism, we can consider J. Kadane’s new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12, called “Exploration of Old Ideas”. So now I am not only in foundational exile, I am clinging to ideas that are in need of Juvederm!
I have been reading about a disorder that intrigues me, Kuru (which means “shaking”), widespread among the Fore people of New Guinea in the 1960s. In around 3-6 months, Kuru victims go from having difficulty walking, to outbursts of laughter, to inability to swallow and death. Kuru and (what we now know to be) related diseases (e.g., Mad Cow, Creutzfeldt-Jakob, scrapie) are “spongiform” diseases, causing brains to appear spongy. (They are also called TSEs: transmissible spongiform encephalopathies.) Kuru clustered in families, in particular among Fore women and their children, or elderly parents.
A simple rule before getting started: In presenting their arguments, philosophers sometimes appear to go off into far distant islands entirely, and then act as if they have shown something about the case at hand. The mystery evaporates if one keeps in mind the following rule of argument:
- If one argument is precisely analogous to another, in all relevant respects, and the second argument is pretty clearly fishy, then so is the first. Likewise, if one argument is precisely analogous to another, in all relevant respects, and the second argument passes swimmingly, then so must the first.
If the argument at hand is murky, while the one in the distant land crystal clear, then appealing to the latter is a powerful way to make a point. Because the relevance for the case at hand seems obvious, details may be left unstated. Of course you may avoid these conclusions by showing just where the analogies break down.
*Full disclosure: I own a fair amount of Diamond Offshore (DO), but do not plan to purchase more in the next 72 hours.
“Did you hear the one about the frequentist . . .
- “who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”
- “who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”
Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of “straw-men” fallacies, they form the basis of why some reject frequentist methods, then they are not such a laughing matter. But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it? I invite the curious reader to stay and find out.
Confronted with the position that “arguments for this personalistic theory were so persuasive that anything to any extent inconsistent with that theory should be discarded” (Cox 2006, 196), frequentists might have seen themselves in a kind of exile when it came to foundations, even those who had been active in the dialogues of an earlier period. Sometime around the late 1990s there were signs that this was changing. Regardless of the explanation, the fact that it did occur and is occurring is of central importance to statistical philosophy.
Now that Bayesians have stepped off their a priori pedestal, it may be hoped that a genuinely deep scrutiny of the frequentist and Bayesian accounts will occur. In some corners of practice it appears that frequentist error statistical foundations are being discovered anew. Perhaps frequentist foundations, never made fully explicit, but at most lying deep below the ocean floor, are finally being disinterred. But let’s learn from some of the mistakes in the earlier attempts to understand them. With this goal I invite you to join me in some deep water drilling, here as I cast about on my Isle of Elba.
Cox, D. R. (2006), Principles of Statistical Inference, CUP.