Here’s the final part of Brian Haig’s recent paper ‘What can psychology’s statistics reformers learn from the error-statistical perspective?’ in *Methods in Psychology* 2 (Nov. 2020). The full article, which is open access, is here. I will make some remarks in the comments.

**5. The error-statistical perspective and the nature of science**

As noted at the outset, the error-statistical perspective has made significant contributions to our philosophical understanding of the nature of science. These are achieved, in good part, by employing insights about the nature and place of statistical inference in experimental science. The achievements include deliberations on important philosophical topics, such as the demarcation of science from non-science, the underdetermination of theories by evidence, the nature of scientific progress, and the perplexities of inductive inference. In this article, I restrict my attention to two such topics: the process of falsification and the structure of modeling.

*5.1. Falsificationism*

The best known account of scientific method is the so-called hypothetico-deductive method. According to its most popular description, the scientist takes an existing hypothesis or theory and tests it indirectly by deriving one or more observational predictions that are subjected to direct empirical test. Successful predictions are taken to provide inductive confirmation of the theory; failed predictions are said to provide disconfirming evidence for the theory. In psychology, NHST is often embedded within such a hypothetico-deductive structure and contributes to weak tests of theories.

Also well known is Karl Popper’s falsificationist construal of the hypothetico-deductive method, which is understood as a general strategy of conjecture and refutation. Although it has been roundly criticised by philosophers of science, it is frequently cited with approval by scientists, including psychologists, even though they do not, indeed could not, employ it in testing their theories. The major reason for this is that Popper does not provide them with sufficient methodological resources to do so.

One of the most important features of the error-statistical philosophy is its presentation of a falsificationist view of scientific inquiry, with error statistics serving an indispensable role in testing. From a sympathetic, but critical, reading of Popper, Mayo endorses his strategy of developing scientific knowledge by identifying and correcting errors through strong tests of scientific claims. Making good on Popper’s lack of knowledge of statistics, Mayo shows how one can properly employ a range of, often familiar, error-statistical methods to implement her all-important severity requirement. Stated minimally, and informally, this requirement says, “A claim is severely tested to the extent that it has been subjected to and passes a test that probably would have found flaws, were they present.” (Mayo, 2018, p. xii) Further, in marked contrast with Popper, who deemed deductive inference to be the only legitimate form of inference, Mayo’s conception of falsification stresses the importance of inductive, or content-increasing, inference in science. We have here, then, a viable account of falsification, which goes well beyond Popper’s account with its lack of operational detail about how to construct strong tests. It is worth noting that the error-statistical stance offers a constructive interpretation of Fisher’s oft-cited remark that the null hypothesis is never proved, only possibly disproved.
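The severity requirement has a concrete quantitative form in simple cases. As an illustrative sketch (my own, not from Haig's article or Mayo's text, with hypothetical numbers), consider a one-sided Normal test T+ of H0: μ ≤ 0 against H1: μ > 0 with known σ. After the test rejects H0 with observed sample mean x̄, the severity for the stronger claim μ > μ1 is the probability that the test would have produced a result no larger than x̄, were μ in fact only μ1:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity(xbar, mu1, sigma, n):
    """Post-data severity for the claim mu > mu1, after a one-sided
    test T+ (H0: mu <= mu0 vs. H1: mu > mu0) rejects H0 with sample
    mean xbar: SEV = Pr(Xbar <= xbar; mu = mu1)."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    return normal_cdf((xbar - mu1) / se)

# Hypothetical example: n = 100, sigma = 10 (so SE = 1), observed xbar = 1.96.
# The inference "mu > 0" passes with high severity (about 0.975), but the
# stronger claim "mu > 1.8" is poorly probed (severity only about 0.56),
# even though the same data are "consistent" with both claims.
print(round(severity(1.96, 0.0, 10.0, 100), 3))
print(round(severity(1.96, 1.8, 10.0, 100), 3))
```

The point of the sketch is that severity attaches to particular claims, not to the test as a whole: the same rejection warrants a modest discrepancy from the null while leaving a larger one poorly tested.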

*5.2. A hierarchy of models*

In the past, philosophers of science tended to characterize scientific inquiry by focusing on the general relationship between evidence and theory. Similarly, scientists, even today, commonly speak in general terms of the relationship between data and theory. However, due in good part to the labors of experimentally-oriented philosophers of science, we now know that this coarse-grained depiction is a poor portrayal of science. The error-statistical perspective is one such philosophy that offers a more fine-grained parsing of the scientific process.

Building on Patrick Suppes’ (1962) important insight that science employs a hierarchy of models that ranges from experimental experience to theory, Mayo’s (1996) error-statistical philosophy initially adopted a framework in which three different types of models are interconnected and serve to structure error-statistical inquiry: primary models, experimental models, and data models. Primary models, which are at the top of the hierarchy, break down a research problem, or question, into a set of local hypotheses that can be investigated using reliable methods. Experimental models take the mid-position on the hierarchy and structure the particular models at hand. They serve to link primary models to data models. And, data models, which are at the bottom of the hierarchy, generate and model raw data, put them in canonical form, and check whether the data satisfy the assumptions of the experimental models. It should be mentioned that the error-statistical approach has been extended to primary models and theories of a more global nature (Mayo and Spanos, 2010) and, now, also includes a consideration of experimental design and the analysis and generation of data (Mayo, 2018).

This hierarchy of models facilitates the achievement of a number of goals that are important to the error-statistician. These include piecemeal strong testing of local hypotheses rather than broad theories, and employing the model hierarchy as a structuring device to knowingly move back and forth between statistical and scientific hypotheses. The error-statistical perspective insists on maintaining a clear distinction between statistical and scientific hypotheses, pointing out that psychologists often mistakenly take tests of significance to have direct implications for substantive hypotheses and theories.

**6. The philosophy of statistics**

A heartening attitude that comes through in the error-statistical corpus is the firm belief that the philosophy of statistics is an important part of statistical thinking. This emphasis on the conceptual foundations of the subject contrasts markedly with much of statistical theory, and most of statistical practice. It is encouraging, therefore, that Mayo’s philosophical work has influenced a number of prominent statisticians, who have contributed to the foundations of their discipline. Gelman’s error-statistical philosophy canvassed earlier is a prominent case in point. Through both precept and practice, Mayo’s work makes clear that philosophy can have a direct impact on statistical practice. Given that statisticians operate with an implicit philosophy, whether they know it or not, it is better that they avail themselves of an explicitly thought-out philosophy that serves their thinking and practice in useful ways. More particularly, statistical reformers recommend methods and strategies that have underlying philosophical commitments. It is important that these commitments be identified, described, and evaluated.

The tools used by the philosopher of statistics in order to improve our understanding and use of statistical methods are considerable (Mayo, 2011). They include clarifying disputed concepts, evaluating arguments employed in statistical debates, including the core commitments of rival schools of thought, and probing the deep structure of statistical methods themselves. In doing this work, the philosopher of statistics, as philosopher, ascends to a meta-level to get purchase on their objects of study. This second-order inquiry is a proper part of scientific methodology.

It is important to appreciate that the error-statistical outlook is a scientific methodology in the proper sense of the term. Briefly stated, methodology is the interdisciplinary field that draws from disciplines that include statistics, philosophy of science, history of science, as well as indigenous contributions from the various substantive disciplines. As such, it is the key to a proper understanding of statistical and scientific methods. Mayo’s focus on the role of error statistics in science is deeply informed about the philosophy, history, and theory of statistics, as well as statistical practice. It is for this reason that the error-statistical perspective is strategically positioned to help the reader to go beyond the statistics wars.

**7. Conclusion**

The error-statistical outlook provides researchers, methodologists, and statisticians with a distinctive and illuminating perspective on statistical inference. Its Popper-inspired emphasis on strong tests is a welcome antidote to the widespread practice of weak statistical hypothesis testing that still pervades psychological research. More generally, the error-statistical standpoint affords psychologists an informative perspective on the nature of good statistical practice in science that will help them understand and transcend the statistics wars into which they have been drawn. Importantly, psychologists should know about the error-statistical perspective as a genuine alternative to the new statistics and Bayesian statistics. The new statisticians, Bayesian statisticians, and those with other preferences should address the challenges to their outlooks on statistics that the error-statistical viewpoint provides. Taking these challenges seriously would enrich psychology’s methodological landscape.

*This article is based on an invited commentary on Deborah Mayo’s book, *Statistical inference as severe testing: How to get beyond the statistics wars* (Cambridge University Press, 2018), which appeared at https://statmodeling.stat.columbia.edu/2019/04/12. It is adapted with permission. I thank Mayo for helpful feedback on an earlier draft.

Refer to the paper for the references. I invite your comments and questions.

Brian:

I really appreciate your linking the philosophies of statistics with Popperian falsification. Using probability to assess severity tries to supply a way to capture “corroboration” or “well-testedness”. In Excursion 2, I try to use the idea to improve on Popper’s account which, as you say, appears to rob us of ampliative inference. You’re allowed to rationally prefer a claim or theory that is best tested, but he will not allow you to say it is “justified”. I generally use the term “warranted”. We can readily change Popper’s language (Souvenir F), and with it his view of induction and demarcation. Or so I propose. I don’t like alluding to a hypothetico-deductive account because it’s so unclear what failing to falsify permits. Some will say it gives a B-boost or “confirms” the claim that has passed. Neither of these is Popperian. The test it has passed must have been one that probably would have found flaws in the claim or theory, were they present. The severe tester entirely agrees with Popper there. The only difference is that I think we can assess well-testedness.

But I allow in SIST that one can go pretty far retaining “weak severity” and only criticizing poor tests. However, in order to falsify interesting claims, you must corroborate a “falsifying hypothesis”, and this requires strong severity: that a claim be warranted if it passes with severity. Most philosophies of statistics today do not falsify; they stop with a comparative appraisal of “better supported” or the like. That is why Gelman says that using Bayes factors does not falsify. Strictly speaking, any account can be made to falsify by adding a falsification rule. However, it must then be shown to be a reliable rule.

The recommendation to drop P-value thresholds, of course, blocks falsification rules. The trouble is that if you can’t say, ahead of time, that some outcomes will not be allowed to count in favor of a claim, then you relinquish the ability to test it. That’s why I don’t think that those who claim to support such a view can really mean it.

Deborah:

Many thanks for your informative comment regarding the last post of my article. You say you don’t like alluding to the hypothetico-deductive method because it doesn’t deal with failed attempts to falsify hypotheses. Fair enough. This raises for me the question of how helpful the usual appeals to the hypothetico-deductive method are. I think it’s important to distinguish between a “confirmationist” account (e.g., Hempel) and Popper’s falsificationist alternative. Both seem to me rather thin accounts of scientific method that lack the detail researchers would need in order to follow them. In psychology, and I daresay elsewhere, researchers pay lip service to Popper but tacitly adopt a weak confirmationist strategy. Your neo-Popperian approach to severe testing by using frequentist statistical methods has the merit of giving researchers something they can use.

Brian:

I agree that the usual appeals to the hypothetico-deductive method are unhelpful, and I would only use the term in teaching about some schools or approaches in philosophy of science. As for Popper being thin, I readily agree that the typical “naive” or “dogmatic” Popper (Lakatos’ terms) often taught is thin. But if you get beyond a Cliff’s Notes version of his philosophy of science, you find the ingredients for a very rich and illuminating account, even though some alterations are needed.

It’s odd that Popper’s approach was all about severe testing, and yet he cannot say that a method probably would have falsified a claim if it were false, which is what severity requires. He can only say H passes severely if it accords with the data and is “theoretically novel”, his attempt to cash out severity by appealing to novelty. But deeming the first hypothesis that accords with the data warranted is not a reliable method. Worrall was right to criticize this view (SIST p. 91). But his “use novelty” isn’t adequate either.

The American Statistical Association is putting together videos of some sort on philosophy of science and statistics. I was asked the other day to take part, and I said yes, after my upcoming LSE seminar (remote), but I really don’t know much about it yet. These are the kinds of issues that should come up, I think, so I’m glad they’re doing it.