Some recent criticisms of statistical tests of significance have breathed brand new life into some very old howlers, many of which have been discussed on this blog. One variant, which returns to the scene every decade or so, I think (for 50+ years now?), takes a "disagreement on numbers" to show a problem with significance tests even from a "frequentist" perspective. Since it's Saturday night, let's listen in to one of the comedy hours from 3 years ago (0) (new notes in red):
Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?
JB [Jim Berger]: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!(1)
Frequentist Significance Tester (scratches head): But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!
Raucous laughter ensues!
(Hah, hah…. I feel I'm back in high school: "So funny, I forgot to laugh!")
The frequentist tester should retort:
Frequentist Significance Tester: But you assumed 50% of the null hypotheses are true and computed P(H0|x) (imagining P(H0) = .5), and then assumed my p-value should agree with the number you get if it is not to be misleading!
Yet, our significance tester is not heard from as they move on to the next joke….
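For anyone who wants to see where both sets of numbers come from, here is a minimal simulation sketch (my own, not Berger's actual code) under one set of assumptions: half the nulls are true, each test is a two-sided z-test of H0: μ = 0 based on 10 observations from N(μ, 1), and the false nulls have μ drawn from a standard normal. Those choices, and the exact fractions they produce, are assumptions for illustration only.

```python
# A rough sketch (mine, not Berger's actual simulation) of the kind of
# exercise JB describes. Assumptions for illustration only: P(H0) = 0.5,
# each test is a two-sided z-test of H0: mu = 0 with n = 10 observations
# from N(mu, 1), and false nulls have mu drawn from N(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2015)
n_tests, n_obs = 1_000_000, 10

# Half the null hypotheses are true (this is JB's 50% "prior")
null_true = rng.random(n_tests) < 0.5
mu = np.where(null_true, 0.0, rng.normal(0.0, 1.0, n_tests))

# Sample mean of n_obs observations, then the usual two-sided p-value
xbar = rng.normal(mu, 1.0 / np.sqrt(n_obs))
z = np.sqrt(n_obs) * xbar
p = 2 * stats.norm.sf(np.abs(z))

# JB's number: among tests with p-values near .05, how many nulls are true?
near_05 = (p > 0.04) & (p <= 0.05)
print("P(H0 true | p near .05):", round(null_true[near_05].mean(), 3))

# The significance tester's number: among TRUE nulls, how often is p <= .05?
print("P(p <= .05 | H0 true):  ", round((p[null_true] <= 0.05).mean(), 3))
```

With these (assumed) settings the first fraction comes out well above 5%, roughly a third with these particular choices, and it moves around as you change the prior and the distribution of effects under the false nulls. The second fraction stays at about 5% no matter what, and that is the only thing the significance tester ever claimed.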