Aris Spanos
Wilson E. Schmidt Professor of Economics
Department of Economics, Virginia Tech
Recurring controversies about P values and confidence intervals revisited*
Ecological Society of America (ESA) ECOLOGY
Forum—P Values and Model Selection (pp. 609-654)
Volume 95, Issue 3 (March 2014): pp. 645-651
The use, abuse, interpretation, and reinterpretation of the P value have been a hot topic of controversy since the 1950s in statistics and several applied fields, including psychology, sociology, ecology, medicine, and economics.
The initial controversy between Fisher's significance testing and the Neyman and Pearson (N-P; 1933) hypothesis testing concerned the extent to which the pre-data Type I error probability α can address the arbitrariness and potential abuse of Fisher's post-data threshold for the P value. Fisher adopted a falsificationist stance and viewed the P value as an indicator of disagreement (inconsistency, contradiction) between data x0 and the null hypothesis (H0). Indeed, Fisher (1925: 80) went as far as to claim that "The actual value of p…indicates the strength of evidence against the hypothesis." Neyman's behavioristic interpretation of the pre-data Type I and II error probabilities precluded any evidential interpretation of the accept/reject rules for the null (H0), insisting that accepting (rejecting) H0 does not connote the truth (falsity) of H0. The last exchange between these protagonists (Fisher 1955, Pearson 1955, Neyman 1956) did nothing to shed light on these issues. By the early 1960s, it was clear that neither account of frequentist testing provided an adequate answer to the question (Mayo 1996): When do data x0 provide evidence for or against a hypothesis H?
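The contrast between the two perspectives can be made concrete with a toy illustration (mine, not the paper's): for a one-sided, one-sample z-test of H0: mu = 0 against H1: mu > 0 with known sigma = 1, Fisher's post-data P value is a continuous measure of discordance with H0, while the N-P account fixes a pre-data alpha and reports only accept/reject. The sample and the alpha = 0.05 threshold below are hypothetical.

```python
# Toy illustration (not from the paper): Fisher's post-data P value
# versus the Neyman-Pearson pre-data Type I error threshold, for a
# one-sided one-sample z-test of H0: mu = 0 vs H1: mu > 0, sigma = 1 known.
import math

def z_test_p_value(x, mu0=0.0, sigma=1.0):
    """Return the test statistic and the one-sided P value P(Z >= z; H0)."""
    n = len(x)
    xbar = sum(x) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # One-sided P value via the standard normal survival function
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p

x0 = [0.3, 0.5, 0.1, 0.7, 0.4, 0.2, 0.6, 0.3, 0.5, 0.4]  # hypothetical data
z, p = z_test_p_value(x0)
alpha = 0.05  # pre-data N-P Type I error probability
print(f"z = {z:.3f}, P value = {p:.4f}, reject at alpha={alpha}: {p < alpha}")
```

Fisher would read the P value itself as the "strength of evidence against H0"; the N-P rule discards that magnitude and reports only whether it falls below the pre-specified alpha.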
The primary aim of this paper is to revisit several charges, interpretations, and comparisons of the P value with other procedures as they relate to their primary aims and objectives, the nature of the questions posed to the data, and the nature of their underlying reasoning and the ensuing inferences. The idea is to shed light on some of these issues using the error-statistical perspective; see Mayo and Spanos (2011).
SUMMARY AND CONCLUSIONS
The paper focused primarily on certain charges, claims, and interpretations of the P value as they relate to CIs and the AIC. It was argued that some of these comparisons and claims are misleading because they ignore key differences in the procedures being compared, such as (1) their primary aims and objectives, (2) the nature of the questions posed to the data, as well as (3) the nature of their underlying reasoning and the ensuing inferences.
In the case of the P value, the crucial issue is whether Fisher's evidential interpretation of the P value as "indicating the strength of evidence against H0" is appropriate. It is argued that, despite Fisher's maligning of the Type II error, a principled way to provide an adequate evidential account, in the form of post-data severity evaluation, calls for taking into account the power of the test.
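A rough sketch of the idea (my own illustration under simplifying assumptions, not code from the paper): for the one-sided z-test of H0: mu = mu0 against mu > mu0 with known sigma, after a rejection the severity of the inference "mu > mu1" is SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1). Unlike the P value alone, this quantity depends on the sampling distribution of the statistic under discrepancies from the null, which is exactly what the test's power describes.

```python
# Hedged sketch (assumptions mine) of post-data severity for the
# one-sided z-test of H0: mu = mu0 vs H1: mu > mu0, sigma known.
import math

def severity_mu_gt(mu1, xbar, n, sigma=1.0, mu0=0.0):
    """Severity of the post-rejection inference "mu > mu1":
    SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1)."""
    se = sigma / math.sqrt(n)
    d_x0 = (xbar - mu0) / se      # observed test statistic
    delta = (mu1 - mu0) / se      # mean shift of d(X) under mu = mu1
    # Under mu = mu1, d(X) ~ N(delta, 1), so SEV = Phi(d_x0 - delta)
    return 0.5 * (1.0 + math.erf((d_x0 - delta) / math.sqrt(2.0)))

# With xbar = 0.8 and n = 25 (a clear rejection of H0: mu = 0), the same
# result licenses "mu > 0.2" with high severity but "mu > 0.9" only weakly:
print(severity_mu_gt(0.2, xbar=0.8, n=25))   # high (about 0.999)
print(severity_mu_gt(0.9, xbar=0.8, n=25))   # low (about 0.31)
```

The point of the sketch is that a single "reject H0" outcome does not warrant every discrepancy from the null equally: the severity evaluation grades which inferences the data actually license.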
The error-statistical perspective brings out a key weakness of the P value and addresses several foundational issues raised in frequentist testing, including the fallacies of acceptance and rejection as well as misinterpretations of observed CIs; see Mayo and Spanos (2011). The paper also uncovers the connection between model selection procedures and hypothesis testing, revealing the inherent unreliability of the former. Hence, the choice between different procedures should not be "stylistic" (Murtaugh 2013), but should depend on the questions of interest, the answers sought, and the reliability of the procedures.
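One way to see this connection (a hedged illustration, assumptions mine): for two nested normal models differing by a single parameter, preferring the larger model by AIC amounts to a likelihood-ratio test that rejects when the statistic 2*(loglik1 - loglik0) exceeds 2, which corresponds to an implicit Type I error probability of P(chi-square_1 > 2), far looser than the conventional alpha = 0.05.

```python
# Hedged sketch (assumptions mine): the significance level implicit in
# AIC selection between nested models differing by one parameter.
import math

def chi2_1_sf(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

# AIC = -2*loglik + 2k, so the larger (k+1 parameter) model wins exactly
# when the likelihood-ratio statistic 2*(loglik1 - loglik0) exceeds 2.
implicit_alpha = chi2_1_sf(2.0)
print(f"implicit alpha of AIC selection (1 df): {implicit_alpha:.4f}")
```

The implicit threshold works out to roughly 0.157, which illustrates the paper's point that AIC-style selection embodies an error probability of its own, and one the user typically never scrutinizes.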
*Spanos, A. (2014) Recurring controversies about P values and confidence intervals revisited. Ecology 95(3): 645-651.