This morning I received a paper I have been asked to review (anonymously, as is typical). It is to head up a forthcoming issue of a new journal called Philosophy of Statistics: Retraction Watch. This is the first I’ve heard of the journal, and I plan to recommend they publish the piece, conditional on revisions. I thought I would post the abstract here. It’s that interesting.
“Some Slightly More Realistic Self-Criticism in Recent Work in Philosophy of Statistics,” Philosophy of Statistics: Retraction Watch, Vol. 1, No. 1 (2012), pp. 1-19.
In this paper we delineate some serious blunders that we and others have made in published work on frequentist statistical methods. First, we have repeatedly claimed that a core thesis of the frequentist testing approach is that a hypothesis may be rejected with increasing confidence as the power of the test increases. We now see that this is completely backwards, and we regret that we have never addressed, or even fully read, the corrections found in Deborah Mayo’s work since at least 1983 (Mayo 1983), and likely even earlier.
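Why the power thesis is backwards can be sketched numerically. The sketch below is not taken from the paper; it assumes a one-sided Normal test of H0: μ ≤ 0 against μ > 0 with known σ = 1, a cutoff of 1.96 (α ≈ 0.025), and an observed mean that lands exactly at the cutoff, so the result is just statistically significant. It uses Mayo's severity assessment, SEV(μ > δ) = P(X̄ ≤ x̄_obs; μ = δ). The discrepancy δ = 0.2, the sample sizes, and the function names are hypothetical choices for illustration.

```python
import math

Z_ALPHA = 1.96  # one-sided cutoff, alpha of about 0.025 (assumed)


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power(delta: float, n: int, sigma: float = 1.0) -> float:
    """P(the z-test rejects H0: mu <= 0 ; mu = delta)."""
    return 1.0 - normal_cdf(Z_ALPHA - delta * math.sqrt(n) / sigma)


def severity_just_significant(delta: float, n: int, sigma: float = 1.0) -> float:
    """SEV(mu > delta) = P(Xbar <= xbar_obs ; mu = delta), where the
    observed mean sits exactly at the cutoff: xbar_obs = Z_ALPHA * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    xbar_obs = Z_ALPHA * se
    return normal_cdf((xbar_obs - delta) / se)


if __name__ == "__main__":
    delta = 0.2  # hypothetical discrepancy of interest
    for n in (25, 100, 400, 1600):
        print(f"n={n:5d}  power={power(delta, n):.3f}  "
              f"SEV(mu > {delta})={severity_just_significant(delta, n):.3f}")
```

As n grows, the test's power against δ = 0.2 rises while the severity with which a just-significant result warrants μ > 0.2 falls; in this just-significant case the two sum to one. A rejection from a more powerful test therefore indicates a smaller discrepancy, not a larger one.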
Second, we have been wrong to claim that Neyman-Pearson (N-P) confidence intervals are inconsistent on the grounds that, in special cases, a specific 95% confidence interval can be known to be correct. Not only are the examples needed to show this absurdly artificial, but the frequentist could simply interpret such a “vacuous interval” “as a statement that all parameter values are consistent with the data at a particular level,” which, as Cox and Hinkley note, is an informative statement about the limitations in the data (Cox and Hinkley 1974, 226).
Third, we have been mistaken in maintaining that optional stopping cannot lead to confidence intervals that are assured to be misleading, even after Berger and Wolpert (1988, 80-81) themselves conceded that it can. We have been too ready to suppose that Savage had assured us this could not happen, when in fact Savage performs a shifty sleight of hand: his assurance holds only for a point-against-point hypothesis test, not for the two-sided test or the corresponding confidence-interval procedure. Once again, Mayo has made the point forcefully in Mayo (1996), Mayo and Kruse (2001), and probably elsewhere as well.
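The two-sided, confidence-interval case can be illustrated with a small simulation. This is a minimal sketch, not from the paper, assuming i.i.d. N(μ, 1) observations with true μ = 0, a nominal 95% interval x̄ ± 1.96/√n recomputed after every observation, and a hypothetical experimenter who stops as soon as that interval excludes the true value. The function names, seed, trial count, and sample-size caps are illustrative choices.

```python
import math
import random

Z = 1.96  # two-sided cutoff for a nominal 95% interval with known sigma = 1


def first_miss(max_n: int, rng: random.Random, mu: float = 0.0):
    """Draw N(mu, 1) observations one at a time, recomputing the nominal
    interval xbar +/- Z / sqrt(n) after each draw. Return the first n at
    which the interval excludes the true mu, or None if it never does by max_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(mu, 1.0)
        if abs(total / n - mu) > Z / math.sqrt(n):
            return n  # optional stopping: quit here with a misleading interval
    return None


def miss_rates(horizons, trials: int = 2000, seed: int = 1):
    """Fraction of simulated experimenters who can stop with an interval
    excluding the true mu, for each cap on the sample size in `horizons`.
    All caps are evaluated on the same simulated paths."""
    rng = random.Random(seed)
    longest = max(horizons)
    firsts = [first_miss(longest, rng) for _ in range(trials)]
    return {
        h: sum(1 for f in firsts if f is not None and f <= h) / trials
        for h in horizons
    }


if __name__ == "__main__":
    for cap, rate in miss_rates([1, 10, 100, 1000]).items():
        print(f"max n = {cap:4d}: proportion of misleading intervals = {rate:.3f}")
```

With a cap of a single observation the rate should sit near the nominal 5%; it climbs as the cap grows, and by the law of the iterated logarithm it tends to 1 if no cap is imposed. The sketch covers only the two-sided, confidence-interval case, not the point-against-point test.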
Fourth, we regret that we have continually presented frequentist methods without regard for the associated model or test statistic, which has let us portray them as a hodgepodge that permits incredibly silly results. This does not excuse our error, but the fact remains that our papers were received positively, and we were only following criticisms of frequentist methods in texts we thought were reliable. Had we actually studied the methods of Neyman and Pearson, as well as Fisher, not only would we have recognized the model’s centrality in this approach, but we would never have reversed the chronology, placing Fisherian tests after N-P tests.
Finally, we have recently conceded the flaw in Birnbaum’s argument (Birnbaum 1962) alleging that the frequentist is forced to accept the strong likelihood principle (and thereby renounce error probabilities) so long as she adheres to sufficiency and weak conditionality. The blunder is a fairly glaring logical one. In our partial defence, quibbles aside, no one other than Mayo (2010) has fully exposed it, and the fallacious “proof” has been widely published. If we had not been so caught up in Savage’s excitement about a breakthrough that would serve as a catalyst for adopting Bayesianism, we surely would have spotted it sooner. We have made other blunders, and we will undertake to expose them in subsequent issues of this journal.
Berger, J. O. and R. L. Wolpert (1988). The Likelihood Principle. 2nd ed. Hayward, Calif.: Institute of Mathematical Statistics.
Birnbaum, A. (1962). “On the Foundations of Statistical Inference” (with discussion). Journal of the American Statistical Association 57: 269–326.
Birnbaum, A. (1972). “More on Concepts of Statistical Evidence.” Journal of the American Statistical Association 67: 858–61.
Cox, D. R. and D. V. Hinkley (1974). Theoretical Statistics. London: Chapman and Hall.
Mayo, D. (1983). “An Objective Theory of Statistical Testing.” Synthese 57(2): 297-340.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.
Mayo, D. (2010). “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle.” In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.). Cambridge: Cambridge University Press: 305-14.
Mayo, D. and M. Kruse (2001). “Principles of Inference and Their Consequences.” In D. Corfield and J. Williamson (eds.), Foundations of Bayesianism. Dordrecht: Kluwer Academic Publishers: 381-403.
Savage, L. (ed.) (1962). The Foundations of Statistical Inference: A Discussion. London: Methuen.