In February, in London, criminologist Katrin H. and I went to see Jackie Mason do his shtick, a one-man show billed as his swan song to England. It was like a repertoire of his “Greatest Hits” without a new or updated joke in the mix. Still, hearing his rants for the nth time was often quite hilarious.
A sample: If you want to eat nothing, eat nouvelle cuisine. Do you know what it means? No food. The smaller the portion the more impressed people are, so long as the food’s got a fancy French name, haute cuisine. An empty plate with sauce!
As one critic wrote, Mason’s jokes “offer a window to a different era,” one whose caricatures and biases one can only hope we’ve moved beyond: http://www.guardian.co.uk/stage/2012/feb/21/jackie-mason-live-review
But it’s one thing for Jackie Mason to scowl at a seat in the front row and yell to the shocked audience member in his imagination, “These are jokes! They are just jokes!” and another to reprise statistical howlers, which are not jokes, to me. This blog found its reason for being partly as a place to expose, understand, and avoid them. Recall the September 26, 2011 post “Whipping Boys and Witch Hunters”: https://errorstatistics.com/2011/09/26/whipping-boys-and-witch-hunters-comments-are-now-open/: [i]
Fortunately, philosophers of statistics would surely not reprise decades-old howlers and fallacies. After all, it is the philosopher’s job to clarify and expose the conceptual and logical foibles of others; and even if we do not agree, we would never merely disregard and fail to address the criticisms in published work by other philosophers. Oh wait... one of the leading texts repeats the fallacy in its third edition:
“The classical thesis that a null hypothesis may be rejected with greater confidence, the greater the power of the test, is not borne out; indeed the reverse trend is signaled” (Howson and Urbach 2006, 154).
But this is mistaken. The frequentist appraisal of tests is, and has always been, the reverse, whether of Fisherian significance tests or those of the Neyman-Pearson variety. This was pointed out directly in relation to their Bayesian text in EGEK 1996, pp. 402-3.
But alas, they repeat it verbatim, with no reference to these corrections. Given the popularity of their text, the consequences are not surprising (at least in some quarters): another generation committing the same fallacy and/or repeating the same howlers against significance tests. It is essentially the fallacy behind the imaginary case of the “prionvac” reformer who is (inadvertently we suppose) more impressed the smaller the discrepancy indicated—analogous to Mason’s haute cuisiner (Oct. 4, 2011). (See also Note [ii])
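To make the point concrete, here is a minimal sketch in Python. Everything in it is my own illustrative assumption rather than anything taken from Howson and Urbach: a one-sided z-test of H0: mu <= 0 with known sigma = 1, level .05, and an alternative mu1 = 0.2 chosen only for illustration. The function names are hypothetical. The `severity` function is a simplified rendering of the severity idea in Mayo and Spanos (2006), evaluated at a result that just reaches significance. For such a result it works out to 1 minus the power against mu1, so the more powerful the test, the less a just-significant result indicates a discrepancy as large as mu1.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def just_significant_mean(n, alpha=0.05, sigma=1.0):
    """Smallest sample mean that rejects H0: mu <= 0 at level alpha."""
    return Z.inv_cdf(1 - alpha) * sigma / sqrt(n)


def power(n, mu1, alpha=0.05, sigma=1.0):
    """Probability that the test rejects H0 when the true mean is mu1."""
    cutoff = just_significant_mean(n, alpha, sigma)
    return 1 - Z.cdf((cutoff - mu1) * sqrt(n) / sigma)


def severity(n, mu1, alpha=0.05, sigma=1.0):
    """Probability of a sample mean no larger than the just-significant
    one, were the true mean only mu1. A low value means the result is
    poor grounds for inferring mu > mu1."""
    xbar = just_significant_mean(n, alpha, sigma)
    return Z.cdf((xbar - mu1) * sqrt(n) / sigma)


if __name__ == "__main__":
    mu1 = 0.2  # illustrative discrepancy from the null (an assumption)
    print("n     cutoff   power   severity for mu > 0.2")
    for n in (25, 100, 400):
        print(f"{n:<5} {just_significant_mean(n):.3f}    "
              f"{power(n, mu1):.3f}   {severity(n, mu1):.3f}")
```

Running it shows the cutoff shrinking as n grows while the power against mu1 rises, so the same .05-level rejection indicates a smaller discrepancy from the null in the larger sample, not a larger one.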
I am currently researching and writing a new book on contemporary philosophy of statistics. The review of the literature is itself a window on the movement of positions through philosophical scrutiny. With philosophy of frequentist statistics, however, I (often) find myself at the window of an older era, one I had hoped we’d left behind.
References
Howson, C. and P. Urbach (2006). Scientific Reasoning: The Bayesian Approach, 3rd ed. La Salle, IL: Open Court.
Mayo, D. G. (1983). “An Objective Theory of Statistical Testing.” Synthese 57(2): 297-340.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge [EGEK]. Chicago: University of Chicago Press.
Mayo, D. G. and A. Spanos (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction.” British Journal for the Philosophy of Science 57: 323-357.
Mayo, D. G. and A. Spanos (2011). “Error Statistics.” In Philosophy of Statistics, Handbook of the Philosophy of Science, Volume 7 (Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster. General editors: Dov M. Gabbay, Paul Thagard and John Woods). Elsevier: 1-46.
Morrison, D. and R. Henkel (eds.) (1970). The Significance Test Controversy. Chicago: Aldine.
Rosenthal, R. and J. Gaito (1963). “The Interpretation of Levels of Significance by Psychological Researchers.” Journal of Psychology 55: 33-38.
[ii] “Now some early literature, e.g., Morrison and Henkel’s Significance Test Controversy (1962), performed an important service over fifty years ago. They alerted social scientists to the fallacies of significance tests: misidentifying a statistically significant difference with one of substantive importance, interpreting insignificant results as evidence for the null hypothesis—especially problematic with insensitive tests, and the like. Chastising social scientists for applying significance tests in slavish and unthinking ways, contributors call attention to a cluster of pitfalls and fallacies of testing.

“The volume describes research studies conducted for the sole purpose of revealing these flaws. Rosenthal and Gaito (1963) document how it is not rare for scientists to mistakenly regard a statistically significant difference, say at level .05, as indicating a greater discrepancy from the null when arising from a large sample size rather than a smaller sample size—even though a correct interpretation of tests indicates the reverse.”
The blogpeople noticed a few paragraphs were left out of the post I initially sent them, so they’ve restored them.