This post picks up, and continues, an exchange that began with comments on my June 14 blogpost (between Sander Greenland, Nicole Jinn, and I). My new response is at the end. The concern is how to expose and ideally avoid some of the well known flaws and foibles in statistical inference, thanks to gaps between data and statistical inference, and between statistical inference and substantive claims. I am not rejecting the use of multiple methods in the least (they are highly valuable when one method is capable of detecting or reducing flaws in one or more others). Nor am I speaking of classical dualism in metaphysics (which I also do not espouse). I begin with Greenland’s introduction of this idea in his comment… (For various earlier comments, see the post.)
…. I sense some confusion of criticism of the value of tests as popular tools vs. criticism of their logical foundation. I am a critic in the first, practical category, who regards the adoption of testing outside of narrow experimental programs as an unmitigated disaster, resulting in publication bias, prosecutor-type fallacies, and affirming the consequent fallacies throughout the health and social science literature. Even though testing can in theory be used soundly, it just hasn’t done well in practice in these fields. This could be ascribed to human failings rather than failings of received testing theories, but I would require any theory of applied statistics to deal with human limitations, just as safety engineering must do for physical products. I regard statistics as having been woefully negligent of cognitive psychology in this regard. In particular, widespread adoption and vigorous defense of a statistical method or philosophy is no more evidence of its scientific value than widespread adoption and vigorous defense of a religion is evidence of its scientific value. That should bring us to alternatives. I am aware of no compelling data showing that other approaches would have done better, but I do find compelling the arguments that at least some of the problems would have been mitigated by teaching a dualist approach to statistics, in which every procedure must be supplied with both an accurate frequentist and an accurate Bayesian interpretation, if only to reduce prevalent idiocies like interpreting a two-sided P-value as “the” posterior probability of a point null hypothesis.
Nicole Jinn (to Sander Greenland)
What exactly is this ‘dualist’ approach to teaching statistics and why does it mitigate the problems, as you claim? (I am increasingly interested in finding more effective ways to teach/instruct others in various age groups about statistics.) I have a difficult time seeing how effective this ‘dualist’ way of teaching could be for the following reason: the Bayesian and frequentist approaches are vastly different in their aims and the way they see statistics being used in (natural or social) science, especially when one looks more carefully at the foundations of each methodology (e.g., disagreements about where exactly probability enters into inference, or about what counts as relevant information). Hence, it does not make sense (to me) to supply both types of interpretation to the same data and the same research question! Instead, it makes more sense (from a teaching perspective) to demonstrate a Bayesian interpretation for one experiment, and a frequentist interpretation for another experiment, in the hopes of getting at the (major) differences between the two methodologies.
Sander. Thanks for your comment. Interestingly, I think the conglomeration of error statistical tools are the ones most apt at dealing with human limitations and foibles: they give piecemeal methods to ask one question at a time (e.g., would we be mistaken to suppose there is evidence of any effect at all? mistaken about how large? about iid assumptions? about possible causes? about implications for distinguishing any theories?). The standard Bayesian apparatus requires setting out a complete set of hypotheses that might arise, plus prior probabilities in each of them (or in “catchall” hypotheses), as well as priors in the model…and after this herculean task is complete, there is a purely deductive update: being deductive it never goes beyond the givens. Perhaps the data will require a change in your prior—this is what you must have believed before, since otherwise you find your posterior unacceptable—thereby encouraging the very self-sealing inferences we all claim to deplore. Continue reading








