These days, so many theater productions are updated reviews of older standards. Same with the comedy hours at the Bayesian retreat, and task force meetings of significance test reformers. So (on the 1-year anniversary of this blog) let’s listen in to one of the earliest routines (with highest blog hits), but with some new reflections (first considered here and here).
“Did you hear the one about the frequentist . . .
“who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”
The joke came from J. Kadane’s Principles of Uncertainty (2011, CRC Press*).
“Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05. If the coin comes up tails reject the null hypothesis. Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test. It is also very robust against data errors; indeed it does not depend on the data at all. It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)
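To see concretely what the rule in this passage does, here is a minimal simulation sketch. It is an illustration only: the normal-mean setup, the sample size of 25, the shift of 0.5, and the function names are all assumptions made for the example, not anything taken from Kadane's book. It compares the coin-flip rule with an ordinary one-sided z-test of a zero mean (known standard deviation 1), and estimates how often each rejects when the null is true and when it is false.

```python
import math
import random

ALPHA = 0.05
Z_CRIT = 1.645  # approximate upper 5% point of the standard normal


def coin_flip_rule(data, rng):
    """Kadane's 'trivial test': ignore the data, reject when the biased coin lands tails."""
    return rng.random() < ALPHA


def z_test_rule(data, rng):
    """One-sided z-test of H0: mean = 0 with known sigma = 1 (assumed setup)."""
    n = len(data)
    z = math.sqrt(n) * (sum(data) / n)
    return z > Z_CRIT


def rejection_rate(rule, true_mean, n_trials=10000, n_obs=25, seed=0):
    """Estimate how often `rule` rejects when data are N(true_mean, 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        data = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        if rule(data, rng):
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    for name, rule in [("coin flip", coin_flip_rule), ("z-test", z_test_rule)]:
        under_null = rejection_rate(rule, true_mean=0.0)
        under_alt = rejection_rate(rule, true_mean=0.5)
        print(f"{name}: rejects {under_null:.3f} under H0, {under_alt:.3f} when the mean is 0.5")
```

Under these assumptions the coin-flip rule should reject about 5% of the time whatever the true mean is, while the z-test rejects about 5% of the time under the null and far more often when the mean is shifted. The simulation only displays that the coin outcome is unconnected to the hypothesis; whether such a rule deserves to be called a significance test is the question taken up next.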
But is it allowed? I say no. The null hypothesis in the joke can be in any field; perhaps it concerns mean transmission of Scrapie in mice (as in my early Kuru post). I know some people view significance tests as merely rules that rarely reject erroneously, but I claim this is mistaken. Both in significance tests and in scientific hypothesis testing more generally, data indicate inconsistency with H only by being counter to what would be expected under the assumption that H is correct (as regards a given aspect observed). Were someone to tell Prusiner that the testing methods he follows actually allow any old “improbable” event (a stock split in Apple?) to reject a hypothesis about prion transmission rates, Prusiner would say that person didn’t understand the requirements of hypothesis testing in science. Since the criticism would hold no water in the analogous case of Prusiner’s test, it must equally miss its mark in the case of significance tests**. That, recall, was Rule #1.
By: Nathan Schachtman, Esq., PC*
When the Supreme Court decided this case, I knew that some people would try to claim that it was a decision about the irrelevance or unimportance of statistical significance in assessing epidemiologic data. Indeed, the defense lawyers invited this interpretation by trying to connect materiality with causation. Having rejected that connection, the Supreme Court’s holding could address only materiality because causation was never at issue. It is a fundamental mistake to include undecided, immaterial facts as part of a court’s holding or the ratio decidendi of its opinion.
Interstitial Doubts About the Matrixx
Statistics professors are excited that the United States Supreme Court issued an opinion that ostensibly addressed statistical significance. One example of the excitement is an article, in press, by Joseph B. Kadane, Professor in the Department of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania. See Joseph B. Kadane, “Matrixx v. Siracusano: what do courts mean by ‘statistical significance’?” 11[x] Law, Probability and Risk 1 (2011).
Professor Kadane makes the sensible point that the allegations of adverse events did not admit of an analysis that would imply statistical significance or its absence. Id. at 5. See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety and Liability Reporter 1007 (Sept. 12, 2011). Unfortunately, the excitement has obscured Professor Kadane’s interpretation of the Court’s holding, and has led him astray in assessing the importance of the case.
Sitting in the airport . . . a temporary escape from Elba, which I’m becoming more and more loath to leave. I fear that some might agree, rightly, that Kadane’s “trivial test” is no indictment of significance tests and yet for the WRONG reason. I don’t want to beat a dead horse, but perhaps a certain confusion is going to obstruct understanding later on. Let us abbreviate “tails” on a coin toss that lands tails 5% of the time, as “a rare coin toss outcome”. Some seem to reason: since a rare coin toss outcome is an event with probability .05 REGARDLESS of the truth or falsity of a hypothesis H, then the test is still a legitimate significance test with significance level .05; it is just a lousy one, with no discriminating ability. I claim it is no significance test at all, and that there is an important equivocation going on (in some letters I’ve received)—one which I hoped would be skirted by the analogy with ordinary hypothesis testing in science. Heading off this confusion was the key rationale for my discussion in the Kuru post. Finding no nucleic acid in prions is inconsistent, or virtually so, under the hypothesis H: all pathogens are transmitted with nucleic acid. The observed results are anomalous for the central dogma H BECAUSE they are counter to what H says we would expect. If you maintain that the “rare coin toss outcome” is anomalous for a statistical null hypothesis H, then you would also have to say it is anomalous for H: all pathogens have nucleic acid. But it is obvious this is false in the case of the scientific hypothesis. It must also be rejected in the case of the statistical hypothesis (Rule #1).
A legitimate statistical test hypothesis must tell us (i.e., let us compute) how improbably far different experimental outcomes are from what would be expected under H. It is correct to regard experimental results as anomalous for a hypothesis H only if, and only because, they run counter to what H tells us would occur in a universe where H is correct. A hypothesis on pathogen transmission, say, does not tell us the improbability of the rare coin toss outcome. Thus it is no significance test at all. As I wrote in the Kuru post: It is not that infectious protein events are “very improbable” in their own right (however one construes this); it is rather that these events are counter to, and forbidden under, the assumption of the hypothesis H.
To take up the first criticism, we can consider J. Kadane’s new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12, called “Exploration of Old Ideas”. So now I am not only in foundational exile, I am clinging to ideas that are in need of Juvederm!