# Oxford Gaol: Statistical Bogeymen

Memory Lane: Oxford Jail (also called Oxford Castle) is an entirely fitting place to be on (and around) Halloween! Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as is exile on Elba! (I’m serious, it is now a boutique hotel.)  My goal (while in this gaol—as the English sometimes spell it) is to try and free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should I think be shelved, not that names should matter.

In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory. Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended. But for (most) Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort). Criticisms then follow readily, in one or both of these forms:

• Error probabilities do not supply posterior probabilities in hypotheses; interpreted as if they do (and some say we just can’t help it), they lead to inconsistencies.
• Methods with good long-run error rates can give rise to counterintuitive inferences in particular cases.

I have proposed an alternative philosophy that replaces these tenets with different ones:

• The role of probability in inference is to quantify how reliably or severely claims (or discrepancies from claims) have been tested.
• The severity goal directs us to the relevant error probabilities, avoiding the oft-repeated statistical fallacies due to tests that are overly sensitive, as well as those insufficiently sensitive to particular errors.
• Control of long-run error probabilities, while necessary, is not sufficient for good tests or warranted inferences.

What is key on the statistics side of this alternative philosophy is that the probabilities refer to the distribution of a statistic d(x)—the so-called sampling distribution.  Hence such accounts are often called sampling theory accounts. Since the sampling distribution is the basis for error probabilities, another term might be error statistical.
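To make the sampling distribution concrete, here is a minimal sketch in Python. Everything specific in it is my assumption for illustration: a Normal sample with known σ, the statistic d(x) taken to be the standardized distance of the sample mean from a null value mu0, and an observed value of 2.0. The error probability of interest is the probability, computed under the null, that d(X) would be at least as large as the value observed.

```python
import random
from statistics import NormalDist, fmean


def d(sample, mu0, sigma):
    """Test statistic d(x): standardized distance of the sample mean from mu0."""
    n = len(sample)
    return (fmean(sample) - mu0) * n ** 0.5 / sigma


def simulated_tail_prob(d_obs, n, mu0, sigma, reps=20000, seed=0):
    """Approximate P(d(X) >= d_obs; mu = mu0) by simulating the sampling distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw a hypothetical sample that could have occurred if mu = mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if d(sample, mu0, sigma) >= d_obs:
            hits += 1
    return hits / reps


d_obs = 2.0  # illustrative observed value of d(x), not real data
sim = simulated_tail_prob(d_obs, n=10, mu0=0.0, sigma=1.0)
exact = 1 - NormalDist().cdf(d_obs)  # analytic tail area for a standard Normal
print(f"simulated: {sim:.4f}  analytic: {exact:.4f}")
```

The simulated and analytic values should agree closely. The point of the sketch is only that the probability attaches to the method's behavior over outcomes that could have occurred, not to the hypothesis.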

The very use of the sampling distribution to make inferences from data is at odds with Bayesian methods, where consideration of outcomes other than the one observed is disallowed (the likelihood principle).

“Neyman-Pearson hypothesis testing violates the likelihood principle, because the event either happens or does not; and hence has probability one or zero.” (Kadane 2011, Principles of Uncertainty, CRC Press; available for download for non-commercial purposes from http://uncertainty.stat.cmu.edu/)

The idea of considering, hypothetically, what other outcomes could have occurred in reasoning from the one that did occur seems so obvious in ordinary reasoning that it will strike many as bizarre that an account of statistical inference would wish to banish such considerations.  And yet, banish them the Bayesian must[i]—at least if she is being coherent.  It may be surprising to discover that the Bayesian mask, if you wear it consistently, only has eyeholes for likelihoods (once the data are in front of you). (See earlier posts on the likelihood principle, and my contribution to the special RMM volume.)
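A standard textbook illustration (not an example from this post) shows what is at stake. Suppose 9 heads and 3 tails are observed, and the null hypothesis is that the probability of heads is 0.5. The sketch below assumes two hypothetical sampling plans: toss exactly 12 times, or toss until the 3rd tail. The likelihoods are proportional, so an account with eyeholes only for likelihoods treats the two cases alike, while the error probabilities, which depend on what other outcomes could have occurred, differ.

```python
from math import comb


def binom_lik(theta, heads=9, tails=3):
    """Likelihood of theta when the number of tosses (heads + tails) was fixed in advance."""
    return comb(heads + tails, heads) * theta**heads * (1 - theta) ** tails


def negbinom_lik(theta, heads=9, tails=3):
    """Likelihood of theta when tossing stopped at the `tails`-th tail."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta) ** tails


def binom_p_value(heads=9, n=12, theta0=0.5):
    """P(at least `heads` heads in n tosses; theta0)."""
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k) for k in range(heads, n + 1)
    )


def negbinom_p_value(heads=9, tails=3, theta0=0.5):
    """P(at least `heads` heads before the `tails`-th tail; theta0)."""
    fewer = sum(
        comb(k + tails - 1, k) * theta0**k * (1 - theta0) ** tails
        for k in range(heads)
    )
    return 1 - fewer


# The likelihood ratio is the same constant (220/55 = 4) at every theta.
for theta in (0.3, 0.5, 0.75):
    print(theta, binom_lik(theta) / negbinom_lik(theta))

print(binom_p_value())     # 299/4096, about 0.073
print(negbinom_p_value())  # 67/2048, about 0.033
```

The same data yield different tail areas under the two plans because the sampling distributions differ. Whether that dependence is a defect or a virtue is exactly the point in dispute.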

What is key on the philosophical side is that error probabilities may be used to quantify probativeness or severity of tests (for a given inference).
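A minimal sketch of how such a severity assessment might be computed, under assumptions that are mine for illustration: a one-sided test of a Normal mean with known σ, an observed sample mean xbar, and severity for the claim "μ > mu1" taken to be the probability of a result no larger than the one observed, were μ equal to mu1. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def severity_mu_greater(xbar, mu1, sigma, n):
    """Severity for the claim mu > mu1: P(sample mean <= xbar; mu = mu1)."""
    standard_error = sigma / n ** 0.5
    return NormalDist().cdf((xbar - mu1) / standard_error)


# Illustrative numbers only: sigma = 1, n = 100, observed mean 0.2,
# which is two standard errors above a null value of 0.
xbar, sigma, n = 0.2, 1.0, 100
for mu1 in (0.0, 0.1, 0.2, 0.3):
    sev = severity_mu_greater(xbar, mu1, sigma, n)
    print(f"claim: mu > {mu1:.1f}   severity = {sev:.3f}")
```

With these numbers the same statistically significant result warrants "μ > 0" with high severity but "μ > 0.3" with low severity. That is how the assessment blocks the fallacy of reading a large discrepancy into a result from a highly sensitive test.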

The twin goals of probative tests and informative inferences constrain the selection of tests. I am prepared to grant that an overarching philosophy of science and statistics is needed to guide the use and construal of tests (whether of the N-P or Fisherian varieties), and to allow that formal methodology does not automatically give us methods that are adequate for controlling and assessing well-testedness. (Otherwise, it would be very hard to explain how so many clever people raise those same criticisms and misinterpretations of tests!)

In this philosophy of science, inquirers find things out piecemeal. Perhaps if scientists had to bet on a theory they could, but that is precisely the difference between such conjecturing (e.g., “I’ll bet GTR will break down somewhere!”) and what must be done to learn from evidence scientifically. Rather than try to list all possible rivals to a hypothesis of interest, and assign degrees of probability to each (however one likes to interpret these), progress is made by splitting off questions and developing the means to probe them through a series of distinct, pointed inquiries. (See Oct. 30 post.) An account of inference, as I see it, should also illuminate how new hypotheses are constructed and discovered based on scrutinizing previous results, and by unearthing shortcomings and limitations that are communicated systematically by other researchers. Any account that requires an exhaustive list in advance fails to capture this work.

To allude to the example of prion transmission with which I began early posts to this blog, researchers start out with only vague questions in hand: what is causing the epidemic of kuru among the women and children of the Fore tribe? Is it witchcraft, as many thought? Determining that it was due to cannibalism was just a very first step: understanding the mechanism of disease transmission was a step-by-step process. One can exhaust answers to questions at each step precisely along the lines of the local hypothesis tests and estimation methods offered in standard frequentist error statistics. (There is no difference, at least from the current perspective, if one formulates the inference in terms of estimation.)

I have put, and will continue to put, flesh on the bones of these skeletal claims!

[i] This is true for likelihoodists as well. Since writing this 2 years ago, I’ve learned of Gelman Bayes which is not in terms of inductively obtaining posteriors and which uses a sampling distribution. There are other non-standard accounts. Whether they capture reasoning from severe error probes I cannot say. If they do, I regard them as error statistical.
