Why attend presentations of interesting papers or go to smashing London sites when you can spend better than an hour racing from here to there because the skeleton key to your rented flat won’t turn the lock (after working fine for days)? [Three other neighbors tried, by the way; it wasn’t just me.] And what are the chances of two keys failing, including the porter’s key, and then a third key succeeding: a spare I’d never used but had placed in a hollowed-out volume of Error and Inference, kept in an office at the London School of Economics? (Yes, that is what the photo is! An anonymous e-mailer guessed it right, so they must have spies!) As I ran back and forth one step ahead of the locksmith, trying to ignore my still-bum knee (I left the knee brace in the flat) and trying not to get run over (not easy, in London, for me), I mulled over the perplexing query from one of my Ghost Guests (who asked for my positive account).
First, my apologies for how comments are submitted here. I don’t much care for writing in an itty-bitty, increasingly shrinking space. Until my able crew back on Elba—the whaler and the shrimp finagler, who at this moment are probably having a time at the Elbar Room—completely redo this blog, I’m stuck with it.
To your left are links to most of my published work, inserted so that readers (of this blog) looking for detailed arguments can readily find them. There are various slide presentations within those pages as well; I realize many have gotten hooked on bullets. (If you see a presentation listed but no slides, please write to email@example.com.) It is a bit silly, then, to list references, but since I’ve been asked, see below for the usual and not-so-usual suspects.
The thing is, I believe I’ve put my cards on the table—and stuck my neck out. To revisit the entire account in a blog post—as some would like—is an invitation to waste our time clearing up the inevitable rounds of misunderstandings that will arise from an absurdly condensed formulation. But having said that, here are a few appetizers I thought I’d already served: hungry readers can order the entrées from my blog menus—much better than my force-feeding.
(1) There is a “statistical philosophy” and a philosophy of science. (a) An error-statistical philosophy alludes to the methodological principles and foundations associated with frequentist error-statistical methods. (b) An error-statistical philosophy of science, on the other hand, involves using the error-statistical methods, formally or informally, to deal with problems of philosophy of science: to model scientific inference (actual or rational), to scrutinize principles of inference, and to address philosophical problems about evidence and inference (the problem of induction, underdetermination, warranting evidence, theory testing, etc.).
I assume the interest here is in the former, (a). I have stated it in numerous ways, but the basic position is that inductive inference, i.e., data-transcending inference, calls for methods of controlling and evaluating error probabilities (even if only approximate). An inductive inference, in this conception, takes the form of inferring hypotheses or claims to the extent that they have been well tested. It also requires reporting claims that have not passed severely, or have passed with only low severity. In the “severe testing” philosophy of induction, the quantitative assessment offered by error probabilities tells us not “how probable” but, rather, “how well probed” hypotheses are. The local canonical hypotheses of formal tests and estimation methods need not be the ones we entertain post data; but they give us a place to start without having to go the “designer-clothes” route (see Oct. 30 post).
(2) There are cases where low long-run errors of a procedure are just what is wanted. I call these “behavioristic” contexts. In contexts of “scientific inference,” as I will call them, by contrast, we want to evaluate the evidence or warrant for this hypothesis about this phenomenon (in this world).
Question: How can error probabilities (or error-probing capacities) of a procedure be used to make a specific inference H about the process giving rise to this data? Answer: by enabling the assessment of how well probed or how severely tested H is with data x (along with a background or a “repertoire of errors”). By asking a question of interest in terms of a “data generating process” that we can actually trigger and check (or what Neyman might call a “real statistical experiment”), we can and do build knowledge about the world using statistical reasoning.
While the degree of severity with which a hypothesis H has passed a test T lets us determine whether it is warranted to infer H, the degree of severity is not assigned to H itself: it is an attribute of the test procedure as a whole (including the inference under consideration). (The “testing” logic can be applied equally to cases of “estimation.”)
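To make the severity idea concrete, here is a minimal numerical sketch in the simplest canonical setting: a one-sided Normal test T+ (H0: μ ≤ μ0 vs. H1: μ > μ0) with known σ, along the lines of the computations in Mayo and Spanos (2006). The particular numbers (n = 100, σ = 1, observed mean 0.2) are illustrative assumptions of mine, not taken from the text; the function name `severity` is likewise mine.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity(x_bar, mu1, sigma, n):
    """SEV(mu > mu1) for test T+ (H0: mu <= mu0 vs H1: mu > mu0),
    known sigma: the probability of a result less extreme than the
    observed x_bar, were mu only mu1, i.e. P(Xbar <= x_bar; mu = mu1)."""
    se = sigma / sqrt(n)          # standard error of the sample mean
    return normal_cdf((x_bar - mu1) / se)

# Observed mean 0.2 from n = 100, sigma = 1 (standard error 0.1):
# the inference "mu > 0" passes with high severity, whereas
# "mu > 0.2" passes with severity only 0.5.
print(round(severity(0.2, 0.0, 1.0, 100), 3))  # 0.977
print(round(severity(0.2, 0.2, 1.0, 100), 3))  # 0.5
```

Note how the assessment attaches to the pair (test, inference): the same data give high severity to the weaker claim μ > 0 and poor severity to the stronger claim μ > 0.2, which is exactly the sense in which severity is an attribute of the procedure as a whole rather than a probability assigned to H.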
(3) Although the overarching goal of inquiry is to find out what is (truly) the case about aspects of phenomena, the hypotheses erected in actually finding things out are generally approximations and may even be deliberately false. In scientific contexts, the sampling distribution may be seen to describe what it would be like, statistically, were H incorrect about some specific aspect of the process generating data x (as modeled). Data x do not supply good evidence for the correctness of H when the data attained are scarcely different from what it would be like were H false. Falsifying H requires more (i.e., severely warranting H’s denial). [Model assumptions are separately checked.]
I argue that the logic of probability fails as a logic for well testedness, and then I replace it with a probabilistic concept that succeeds. The goal of attaining such well-probed hypotheses differs crucially from seeking highly probable ones (however probability is interpreted).
I am happy to use Popper’s “degree of corroboration” notion so long as its meaning is understood. Clearly, the Popperian idea that claims should be accepted only after passing “severe attempts to falsify them” is in the error-statistical spirit; but Popper never had an account of statistics that could do justice to this insight. He never made “the error probability turn.” (He also admitted to me that he regretted not having learned statistics.) A better historical figure, if one wants one, is C. S. Peirce.
(4) An objective account of statistical inference (I published a paper with that name in 1983! Scary!) requires being able to control and approximately evaluate error-probing capabilities (formally or informally). For detailed computations, see that very paper, Mayo 1983,[1] or Mayo and Spanos (2006, 2011), etc.
When a claim is inferred (and this requires detaching it from the assumptions), it must be qualified by an assessment of how well probed it is, in relation to a specific test and data set x. If you want to infer a posterior probability for H, it too must be checked for well testedness, along the same lines.
How do we evaluate a philosophy of statistics? (I discuss this in my RMM contribution; see earlier posts.) We evaluate how well the account captures, and helps solve problems about, statistical learning—understood as the cluster of methods for generating, modeling, and learning from data, and using what is learned to feed into other questions of interest. It should also illuminate scientific inference and empirical progress more generally.
If you accept the basic error-statistical philosophy, or even its minimal requirements, then whatever account you wish to recommend should be able to satisfy its goals and meet its requirements. Simple as that.
Mayo, D. and Spanos, A. (eds.) (2010), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science. Cambridge: Cambridge University Press.

Mayo, D. (2004), “Methodology in Practice: Statistical Misspecification Testing,” Philosophy of Science 71, 1007–1025 (Symposia Proceedings).

Mayo, D. and Spanos, A. (2006), “Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction,” British Journal for the Philosophy of Science 57, 323–357.

Mayo, D. and Spanos, A. (2011), “Error Statistics,” in P. S. Bandyopadhyay and M. R. Forster (eds.), Philosophy of Statistics, Handbook of the Philosophy of Science. Oxford: Elsevier.
[1] I hadn’t introduced the term “severity” yet, but the reasoning is the same, and there are some nifty pictures and graphs.