To take up the first criticism, we can consider J. Kadane's new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12, "Exploration of Old Ideas". So now I am not only in foundational exile; I am also clinging to ideas that are in need of Juvederm!

Here is his criticism: “Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05. If the coin comes up tails reject the null hypothesis. Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test. It is also very robust against data errors; indeed it does not depend on the data at all. It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)

But is it allowed? I say no. The null hypothesis in Kadane’s argument can be in any field; perhaps it concerns mean transmission of Scrapie in mice. But as noted in the Kuru post, data are anomalous only in relation to a hypothesis H. Both in significance tests and in scientific hypothesis testing more generally, data indicate inconsistency with H only by being *counter to what would be expected under the assumption that H is correct.* Were someone to tell Prusiner that the testing methods he follows actually allow any old “improbable” event (a stock split in Apple?) to reject a hypothesis about prion transmission rates, Prusiner would say that person didn’t understand the requirements of hypothesis testing in science. Since the criticism would hold no water in the analogous case of Prusiner’s test, it must equally miss its mark in the case of significance tests**. That, recall, was Rule #1.

Now the reader might say that Kadane is simply making a little joke, but then why include it within a chapter purporting to give serious criticisms of significance testing (and other frequentist methods)? Don’t the familiar fallacies of significance testing already make it enough of a whipping boy? Following the philosopher’s rule of “generous interpretation”, I will assume the criticisms are to be taken seriously and to heart.

P.S. Mulina Palace is quite nice.

P.P.S. I understand the old comments can be excavated.

*For non-commercial purposes, the book can be downloaded from http://uncertainty.stat.cmu.edu/.

** Statistical tests are even more explicit in setting out a “test statistic.”

To put it another way, the test is valid but has useless power.

Larry

Larry: Kadane admits the “useless power”, but I claim the test is invalid. For an error statistician’s P(x;H), the probability of x has to be calculable under the assumption that x was due to (or generated by) a process as described in H. It is not like a conditional probability, where the x need have nothing to do with H. That, at any rate, is my position, and I mentioned this silly example because I hoped to bring out that rarely noted point. A number of criticisms/misunderstandings revolve around this (including the business of searching). If this is not standard, then I need to emphasize that interpretation of P(x;H) in my book. Freedman used P(x||H).
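The “valid size but useless power” point can be checked with a quick simulation (a sketch only; the normal model, effect size 0.8, and sample size 25 below are my own arbitrary choices, not anything from Kadane). Kadane’s coin “test” rejects with probability 0.05 no matter what the data say, so its rejection rate stays at 0.05 whether the null is true or false, whereas an ordinary one-sided z-test keeps the 5% size under the null but rejects nearly always under a real effect.

```python
import math
import random
import statistics

random.seed(0)

def kadane_test(data):
    """Ignore the data entirely; reject iff a probability-0.05 'coin' lands tails."""
    return random.random() < 0.05

def z_test(data, mu0=0.0, sigma=1.0):
    """One-sided z-test of H0: mean = mu0 with known sigma; 5% critical value 1.645."""
    n = len(data)
    z = (statistics.fmean(data) - mu0) / (sigma / math.sqrt(n))
    return z > 1.645

def rejection_rate(test, true_mean, n=25, reps=4000):
    """Monte Carlo estimate of how often `test` rejects when data have `true_mean`."""
    count = 0
    for _ in range(reps):
        data = [random.gauss(true_mean, 1.0) for _ in range(n)]
        count += test(data)
    return count / reps

# Under H0 (mean 0), both reject about 5% of the time: same size.
print(rejection_rate(kadane_test, 0.0), rejection_rate(z_test, 0.0))
# Under a genuine effect (mean 0.8), only the z-test's rejection rate climbs:
# the coin test stays stuck near 5% -- useless power.
print(rejection_rate(kadane_test, 0.8), rejection_rate(z_test, 0.8))
```

Both tests are “5% level” in the formal sense, but only the z-test’s rejections track anything counter to what H0 leads us to expect.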

I agree the sample size issue makes this example overly contrived. Have you seen the critique by John Kruschke (author of the popular book “Doing Bayesian Data Analysis”)? It’s a bit more compelling:

http://www.indiana.edu/~kruschke/articles/Kruschke2010WIRES.pdf

John Myles White has also been doing a series of blog entries critiquing NHST, but in my opinion those critiques are more a description of well-known properties that impede practical usefulness (e.g. loss of information, conflation of effect magnitude with sample size) than of foundational problems.

rv: I had never heard of him, but I sent him some links regarding the fallacious “argument from intentions” that he appears to be building his industry upon. Thanks for the reference.