To take up the first criticism, we can consider J. Kadane’s new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12, called “Exploration of Old Ideas”. So now I am not only in foundational exile, I am clinging to ideas that are in need of Juvederm!
Here is his criticism: “Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05. If the coin comes up tails reject the null hypothesis. Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test. It is also very robust against data errors; indeed it does not depend on the data at all. It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)
But is it allowed? I say no. The null hypothesis in Kadane’s argument can be in any field; perhaps it concerns mean transmission of Scrapie in mice. But as noted in the Kuru post, data are always anomalous in relation to a hypothesis H. Both in significance tests and in scientific hypothesis testing more generally, data indicate inconsistency with H only by being counter to what would be expected under the assumption that H is correct. Were someone to tell Prusiner that the testing methods he follows actually allow any old “improbable” event (a stock split in Apple?) to reject a hypothesis about prion transmission rates, Prusiner would say that person didn’t understand the requirements of hypothesis testing in science. Since the criticism would hold no water in the analogous case of Prusiner’s test, it must equally miss its mark in the case of significance tests**. That, recall, was Rule #1.
Now the reader might just say that Kadane is simply making a little joke, but then why include it within a chapter purporting to give serious criticisms of significance testing (and other frequentist methods)? Don’t the familiar fallacies of significance testing already make it enough of a whipping boy? Following the philosopher’s rule of “generous interpretation”, I will assume the criticisms are to be taken seriously and to heart.
P.S. Mulina Palace is quite nice.
P.P.S. I understand the old comments can be excavated.
*For non-commercial purposes, the book can be downloaded from http://www.stat.cmu.edu/~kadane/principles.pdf
** Statistical tests are even more explicit in setting out a “test statistic.”
To put it another way, the test is valid but has useless power.
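A small simulation can make the point about power concrete. The sketch below is only an illustration under assumptions of my own, none of which come from Kadane's text: the data are normal with known standard deviation 1, the null is that the mean is 0, the sample size is 25, and the comparison test is an ordinary one-sided z-test. The function names are likewise hypothetical. It estimates how often each procedure rejects when the null is true and when the true mean is 0.5.

```python
import random
from statistics import NormalDist


def coin_flip_test(data, rng):
    """Kadane's example: ignore the data and reject with probability 0.05."""
    return rng.random() < 0.05


def z_test(data, rng, mu0=0.0, sigma=1.0, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against mu > mu0 (sigma assumed known)."""
    n = len(data)
    sample_mean = sum(data) / n
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return z > NormalDist().inv_cdf(1 - alpha)


def rejection_rate(test, true_mu, n=25, trials=20000, seed=0):
    """Estimate by simulation how often `test` rejects when the mean is `true_mu`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        data = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        if test(data, rng):
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for name, test in [("coin flip", coin_flip_test), ("z-test", z_test)]:
        size = rejection_rate(test, true_mu=0.0)   # H0 true
        power = rejection_rate(test, true_mu=0.5)  # H0 false
        print(f"{name}: rejects {size:.3f} under H0, {power:.3f} under mu = 0.5")
```

Under these assumptions, both procedures should reject about 5% of the time when the null is true, but only the z-test rejects more often when the null is false. The coin flip's rejection rate stays near 0.05 whatever the truth is, because its "test statistic" has no connection to the hypothesis under test.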