BOOK REVIEW
Metascience (2012) 21:709–713 DOI 10.1007/s11016-011-9618-1

(Almost) All about error

Deborah G. Mayo and Aris Spanos (eds): Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, Objectivity, and Rationality. New York: Cambridge University Press, 2010, xvii + 419 pp.

K. W. Staley, Department of Philosophy, Saint Louis University
The ERROR'06 (experimental reasoning, reliability, objectivity, and rationality) conference held at Virginia Tech aimed to advance the discussion of some central themes in philosophy of science debated by Deborah Mayo and her more-or-less friendly critics over the years. The volume here reviewed brings together the contributions of these critics along with responses from Mayo and her collaborator Aris Spanos. (I helped with the organization of the conference and, with Mayo and Jean Miller, edited a separate collection of workshop papers that were presented there, published as a special issue of Synthese.) My review will focus on a couple of themes that I hope will be of interest to a broad philosophical audience, then turn more briefly to an overview of the entire collection. The discussions in Error and Inference (E&I) are indispensable for understanding several current issues regarding the methodology of science.
The remarkably useful introductory chapter lays out the broad themes of the volume and discusses "The Error-Statistical Philosophy". Here, Mayo and Spanos provide the most succinct and non-technical account of the error-statistical approach that has yet been published, a feature that alone should commend this text to anyone who has found it difficult to locate a reading on error statistics suitable for use in teaching.
Mayo holds that the central question for a theory of evidence is not the degree to which some observation E confirms some hypothesis H but how well a hypothesis H has been probed for error by a testing procedure T that results in data x0. This reorientation has far-reaching consequences for Mayo's approach to philosophy of science. On this approach, addressing the question of when data "provide good evidence for or a good test of" a hypothesis requires attention to characteristics of the process by means of which the data are brought to bear on the hypothesis. Mayo identifies the starting point from which her account is developed as the "Weak Severity Principle" (WSP):
Data x0 do not provide good evidence for hypothesis H if x0 results from a test procedure with a very low probability or capacity of having uncovered the falsity of H (even if H is incorrect). (21)
The weak severity principle is then developed into the full severity principle (SP), according to which "data x0 provide a good indication of or evidence for hypothesis H (just) to the extent that test T has severely passed H with x0", where H passes a severe test T with x0 if x0 "agrees with" H and if, "with very high probability, test T would have produced a result that accords less well with H than does x0, if H were false or incorrect" (22). This principle constitutes the heart of the error-statistical account of evidence, and E&I, by including some of the most important critiques of the principle, provides a forum in which Mayo and Spanos attempt to correct misunderstandings of the principle and to clarify its meaning and application.
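For readers who want to see the severity idea in operation, the textbook one-sided Normal testing case can be sketched numerically: with known standard deviation, the severity with which an observed sample mean warrants the claim "mu > mu1" is the probability, computed under mu = mu1, of a result according less well with that claim than the one observed. This is a minimal illustrative sketch, not drawn from the book; the function names and the numbers are my own.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity(x_bar, mu1, sigma, n):
    """SEV(mu > mu1): the probability that the test would have yielded a
    sample mean according *less* well with 'mu > mu1' than the observed
    x_bar, evaluated under mu = mu1 (one-sided Normal example, known sigma)."""
    return normal_cdf((x_bar - mu1) / (sigma / sqrt(n)))

# Observed mean 0.2 from n = 100 observations with sigma = 1:
# the modest claim "mu > 0" passes with high severity...
print(round(severity(0.2, 0.0, 1.0, 100), 3))   # 0.977
# ...while the stronger claim "mu > 0.15" passes with much lower severity.
print(round(severity(0.2, 0.15, 1.0, 100), 3))  # 0.691
```

The point the sketch makes is the one Mayo stresses: the same data warrant different claims to different degrees, depending on the test's capacity to have detected their falsity.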
The appearance in the WSP of the disjunctive phrase "a very low probability or capacity" (my emphasis) indicates a point central to much of this clarificatory work. The error-statistical account is resolutely frequentist in its construal of probability. It is commonly held (including by some frequentists) that the rationale for frequentist statistical methods lies exclusively in the fact that they can sometimes be shown to have low error rates in the long run. Throughout E&I, Mayo insists that this "behaviorist rationale" is not applicable when it comes to evaluating a particular body of data in order to determine what inferences may be warranted. That evaluation rests upon thinking about the particular data and the inference at hand in light of the capacity of the test to reveal potential errors in the inference drawn. Frequentist probabilities are part of how one models the error-detecting capacities of the process. As Mayo explains in a later chapter co-authored with David Cox, tests of hypotheses function analogously to measuring instruments: "Just as with the use of measuring instruments, applied to a specific case, we employ the performance features to make inferences about aspects of the particular thing that is measured, aspects that the measuring tool is appropriately capable of revealing" (257).
One of the most fascinating exchanges in E&I concerns the role of severe testing in the appraisal of "large-scale" theories. According to Mayo, theory appraisal proceeds by a "piecemeal" process of severe probing for specific ways in which a theory might be in error. She illustrates this with the history of experimental tests of theories of gravity, emphasizing Clifford Will's parametrized post-Newtonian (PPN) framework, within which all metric theories of gravity can be represented, in their weak-field, slow-motion limits, by ten parameters. Experimental work on gravity theories then severely tests hypotheses about the values of those parameters. Rather than attempting to confirm or probabilify the general theory of relativity (GTR), the aim is to learn about the ways in which GTR might be in error, more generally to "measure how far off what a given theory says about a phenomenon can be from what a 'correct' theory would need to say about it" (55).
Alan Chalmers and Alan Musgrave both challenge this view. According to Chalmers, no general theory, whether "low level" or "high level", can pass a severe test, because the content of theories surpasses whatever empirical evidence supports them. As a consequence, Chalmers argues, Mayo's severe-testing account of scientific inference must be incomplete, because even low-level experimental testing sometimes demands relying on general theoretical claims. Similarly, Musgrave accuses Mayo of holding that (general) theories are not tested by "testing their consequences", but that "all that we really test are the consequences" (105), leaving her with "nothing to say" about the assessment, adoption, or rejection of general theories (106).