Have you noticed that some of the harshest criticisms of frequentist error-statistical methods these days rest on methods and grounds that the critics themselves purport to reject? Is there a whiff of inconsistency in proclaiming an “anti-hypothesis-testing stance” while in the same breath extolling the uses of statistical significance tests and p-values in mounting criticisms of significance tests and p-values? I was reminded of this in the last two posts (comments) on this blog (here and here) and one from Gelman from a few weeks ago (“Interrogating p-values”).
Gelman quotes from a note he is publishing:
“…there has been a growing sense that psychology, biomedicine, and other fields are being overwhelmed with errors…. In two recent series of papers, Gregory Francis and Uri Simonsohn and collaborators have demonstrated too-good-to-be-true patterns of p-values in published papers, indicating that these results should not be taken at face value.”
But this fraudbusting is based on finding statistically significant differences from null hypotheses (e.g., nulls asserting random assignments of treatments)! If we are to hold small p-values untrustworthy, we would be hard pressed to take them as legitimating these criticisms, especially those of a career-ending sort.
“…in addition to the well-known difficulties of interpretation of p-values… and to the problem that, even when all comparisons have been openly reported and thus p-values are mathematically correct, the ‘statistical significance filter’ ensures that estimated effects will be in general larger than true effects, with this discrepancy being well over an order of magnitude in settings where the true effects are small…” (Gelman 2013)
But surely anyone who believed this would be up in arms about using small p-values as evidence of statistical impropriety. Am I the only one wondering about this?*
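For readers who want to see what the quoted “statistical significance filter” amounts to numerically, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `significance_filter`, the true effect of 0.1 with a standard error of 1, the normal sampling model, and the two-sided 1.96 cutoff. None of it comes from Gelman's note. It shows only the arithmetic of the quoted claim (estimates that clear the cutoff are, on average, far larger than a small true effect); it does not bear on whether small p-values are legitimate evidence.

```python
import random
import statistics

def significance_filter(true_effect, std_error, n_studies=100_000,
                        z_crit=1.96, seed=0):
    """Simulate many studies of one small true effect (hypothetical setup).

    Returns the share of studies reaching two-sided significance and the
    ratio of the mean absolute significant estimate to the true effect.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Each study's estimate is drawn from Normal(true_effect, std_error).
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Keep only estimates whose z-statistic exceeds the cutoff in magnitude.
    significant = [e for e in estimates if abs(e) / std_error > z_crit]
    share = len(significant) / n_studies
    mean_abs = statistics.fmean(abs(e) for e in significant)
    return share, mean_abs / true_effect

if __name__ == "__main__":
    share, ratio = significance_filter(true_effect=0.1, std_error=1.0)
    print(f"share significant: {share:.3f}")
    print(f"mean |significant estimate| / true effect: {ratio:.1f}")
```

With a true effect this small relative to its standard error, any estimate that passes the cutoff must exceed 1.96 in magnitude, so the exaggeration ratio is necessarily large; with a larger assumed true effect the ratio shrinks toward 1.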
CLARIFICATION (6/15/13): Corey’s comment today leads me to a clarification, lest anyone misunderstand my point. I am sure that Francis, Simonsohn and others would never be using p-values and associated methods in the service of criticism if they did not regard the tests as legitimate scientific tools. I wasn’t talking about them. I was alluding to critics of tests who point to their work as evidence that the statistical tools are not legitimate. Now maybe Gelman only intends to say what we know and agree with: that tests can be misused and misinterpreted. But in these comments, our exchanges, and elsewhere, it is clear he is saying something much stronger. In my view, the use of significance tests by debunkers should have been taken as strong support for the value of the tools, correctly used. In short, I thought it was a success story, and I was rather perplexed to see somewhat the reverse.
*This just in: If one wants to see a genuine quack extremist** who was outed long ago***, see Ziliak’s article declaring the Higgs physicists are pseudoscientists for relying on significance levels! (in the Financial Post, 6/12/13).
**I am not placing the critics referred to above under this umbrella in the least.
***For some reviews of Ziliak and McCloskey, see widgets on left. For their flawed testimony on the Matrixx case, please search this blog.