In view of some questions about “behavioristic” vs “evidential” construals of frequentist statistics (from the last post), and how the error statistical philosophy tries to improve on Birnbaum’s attempt at providing the latter, I’m reblogging a portion of a post from Nov. 5, 2011 when I also happened to be in London. (The beginning just records a goofy mishap with a skeletal key, and so I leave it out in this reblog.) Two papers with much more detail are linked at the end.
(1) There is a “statistical philosophy” and a philosophy of science. (a) An error-statistical philosophy alludes to the methodological principles and foundations associated with frequentist error-statistical methods. (b) An error-statistical philosophy of science, on the other hand, involves using the error-statistical methods, formally or informally, to deal with problems of philosophy of science: to model scientific inference (actual or rational), to scrutinize principles of inference, and to address philosophical problems about evidence and inference (the problem of induction, underdetermination, warranting evidence, theory testing, etc.).
I assume the interest here* is in the former, (a). I have stated it in numerous ways, but the basic position is that inductive inference (i.e., data-transcending inference) calls for methods of controlling and evaluating error probabilities (even if only approximate). An inductive inference, in this conception, takes the form of inferring hypotheses or claims to the extent that they have been well tested. It also requires reporting claims that have not passed severely, or have passed with low severity. In the “severe testing” philosophy of induction, the quantitative assessment offered by error probabilities tells us not “how probable” but, rather, “how well probed” hypotheses are. The local canonical hypotheses of formal tests and estimation methods need not be the ones we entertain post data; but they give us a place to start without having to go “the designer-clothes” route (see Oct. 30 post).
(2) There are cases where low long-run errors of a procedure are just what is wanted. I call these “behavioristic” contexts. In contexts of “scientific inference,” as I will call them, by contrast, we want to evaluate the evidence or warrant for this hypothesis about this phenomenon (in this world).
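As a rough illustration of the behavioristic reading, the sketch below simulates repeated use of a one-sided Normal test and counts how often it rejects a null hypothesis that is in fact true. The test, the sample size, the significance level, and the function name `long_run_rejection_rate` are illustrative assumptions, not anything taken from the post.

```python
import random
from statistics import NormalDist

def long_run_rejection_rate(mu_true, mu0=0.0, sigma=1.0, n=25,
                            alpha=0.05, trials=20_000, seed=0):
    """Fraction of simulated samples in which the one-sided test of
    H0: mu <= mu0 rejects, when the data really come from N(mu_true, sigma)."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    cutoff = NormalDist().inv_cdf(1 - alpha)  # critical value for the z statistic
    se = sigma / n ** 0.5                    # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if (xbar - mu0) / se > cutoff:
            rejections += 1
    return rejections / trials

# With mu_true equal to mu0 the null is true, so every rejection is an error.
type_one_rate = long_run_rejection_rate(mu_true=0.0)
print(f"long-run rejection rate under a true null: {type_one_rate:.3f}")
```

The printed rate should sit close to the nominal 0.05. It is a property of the procedure over many applications; by itself it says nothing about what any one data set warrants, which is the contrast the paragraph above draws.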
Question: How can error probabilities (or error-probing capacities) of a procedure be used to make a specific inference H about the process giving rise to this data? Answer: by enabling the assessment of how well probed or how severely tested H is with data x (along with a background or a “repertoire of errors”). By asking a question of interest in terms of a “data generating process” that we can actually trigger and check (or what Neyman might call a “real statistical experiment”), we can and do build knowledge about the world using statistical reasoning.
While the degree of severity with which a hypothesis H has passed a test T lets us determine whether it is warranted to infer H, the degree of severity is not assigned to H itself: it is an attribute of the test procedure as a whole (including the inference under consideration). (The “testing” logic can be applied equally to cases of “estimations.”)
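To make the severity assessment concrete, here is a minimal sketch for one simple case: a one-sided Normal test of mu <= mu0 against mu > mu0 with known sigma, in the style of the examples worked in Mayo and Spanos (2006). After a rejection, the severity for the claim mu > mu1 is taken to be the probability of a sample mean no larger than the one observed, computed under mu = mu1. The numbers and the function name `severity_mu_greater` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def severity_mu_greater(xbar, mu1, sigma, n):
    """Severity for inferring mu > mu1 from an observed sample mean xbar:
    P(sample mean <= xbar; mu = mu1), assuming a Normal model with known sigma."""
    se = sigma / sqrt(n)                      # standard error of the sample mean
    return NormalDist().cdf((xbar - mu1) / se)

# Hypothetical data: n = 100, sigma = 2 (so se = 0.2), observed mean 0.4.
xbar, sigma, n = 0.4, 2.0, 100
for mu1 in (0.0, 0.2, 0.4, 0.6):
    sev = severity_mu_greater(xbar, mu1, sigma, n)
    print(f"SEV(mu > {mu1:.1f}) = {sev:.3f}")
```

With these numbers the claim mu > 0 passes with severity of about 0.98, while mu > 0.6 passes with only about 0.16: the same data and the same test warrant some inferences well and others poorly. The number attaches to the test, the data, and the particular inference taken together, not to the hypothesis as a probability.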
(3) Although the overarching goal of inquiry is to find out what is (truly) the case about aspects of phenomena, the hypotheses erected in actually finding things out are generally approximations and may even be deliberately false. In scientific contexts, the sampling distribution may be seen to describe what it would be like, statistically, if H were incorrect about some specific aspect of the process generating data x (as modeled). Data x do not supply good evidence for the correctness of H when the data attained are scarcely different from what would be expected were H false. Falsifying H requires more (i.e., severely warranting H’s denial). [Model assumptions are separately checked.]
I argue that the logic of probability is inadequate as a logic for well testedness, and then I replace it with a probabilistic concept that succeeds. The goal of attaining such well-probed hypotheses differs crucially from seeking highly probable ones (however probability is interpreted).
I am happy to use Popper’s “degree of corroboration” notion so long as its meaning is understood. Clearly, the Popperian idea that claims should be accepted only after passing “severe attempts to falsify them” is in the error statistical spirit; but Popper never had an account of statistics that could do justice to this insight. He never made “the error probability turn.” (He also admitted to me that he regretted not having learned statistics.) A better historical figure, if one wants one, is C. S. Peirce.
(4) An objective account of statistical inference (I published a paper with that name in 1983! Scary!) requires being able to control and approximately evaluate error-probing capabilities (formally or informally). For detailed computations, see that very paper, Mayo (1983) [1], or Mayo and Spanos (2006, 2011). [The 2006 paper is also referenced and linked below.]
When a claim is inferred (and this requires detaching it from the assumptions), it must be qualified by an assessment of how well probed it is, in relation to a specific test and data set x. If you want to infer a posterior probability for H, it too must be checked for well testedness, along the same lines.
How do we evaluate a philosophy of statistics? (I discuss this in my RMM contribution; see earlier posts.) We evaluate how well the account captures, and helps solve problems about, statistical learning—understood as the cluster of methods for generating, modeling, and learning from data, and using what is learned to feed into other questions of interest. It should also illuminate scientific inference and empirical progress more generally.
If you accept the basic error-statistical philosophy, or even its minimal requirements, then whatever account you wish to recommend should be able to satisfy its goals and meet its requirements. Simple as that.
Mayo, D. and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science, 57, 323-357.
Mayo, D. and Cox, D. (2006/2010). “Frequentist Statistics as a Theory of Inductive Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.
*In the particular post being reblogged, I was responding to statisticians, but philosophers of science are likely to be interested in the latter (b). Thus, I will separately post some things that link statistical science and philosophy of science over the weekend.
[1] I hadn’t introduced the term “severity” yet, but the reasoning is the same, and there are some nifty pictures and graphs. On the other hand, I seem to recall a typo (having to do with a decimal someplace).