Brian Haig

Final part of B. Haig’s ‘What can psych stat reformers learn from the error-stat perspective?’


Here’s the final part of Brian Haig’s recent paper ‘What can psychology’s statistics reformers learn from the error-statistical perspective?’ in Methods in Psychology 2 (Nov. 2020). The full article, which is open access, is here. I will make some remarks in the comments.

5. The error-statistical perspective and the nature of science

Haig

As noted at the outset, the error-statistical perspective has made significant contributions to our philosophical understanding of the nature of science. These are achieved, in good part, by employing insights about the nature and place of statistical inference in experimental science. The achievements include deliberations on important philosophical topics, such as the demarcation of science from non-science, the underdetermination of theories by evidence, the nature of scientific progress, and the perplexities of inductive inference. In this article, I restrict my attention to two such topics: The process of falsification and the structure of modeling.

5.1. Falsificationism

The best known account of scientific method is the so-called hypothetico-deductive method. According to its most popular description, the scientist takes an existing hypothesis or theory and tests it indirectly by deriving from it one or more observational predictions that are subjected to direct empirical test. Successful predictions are taken to provide inductive confirmation of the theory; failed predictions are said to provide disconfirming evidence for the theory. In psychology, NHST is often embedded within such a hypothetico-deductive structure and contributes to weak tests of theories.

Also well known is Karl Popper’s falsificationist construal of the hypothetico-deductive method, which is understood as a general strategy of conjecture and refutation. Although it has been roundly criticised by philosophers of science, it is frequently cited with approval by scientists, including psychologists, even though they do not, indeed could not, employ it in testing their theories. The major reason for this is that Popper does not provide them with sufficient methodological resources to do so.

One of the most important features of the error-statistical philosophy is its presentation of a falsificationist view of scientific inquiry, with error statistics serving an indispensable role in testing. From a sympathetic, but critical, reading of Popper, Mayo endorses his strategy of developing scientific knowledge by identifying and correcting errors through strong tests of scientific claims. Making good on Popper’s lack of knowledge of statistics, Mayo shows how one can properly employ a range of, often familiar, error-statistical methods to implement her all-important severity requirement. Stated minimally, and informally, this requirement says, “A claim is severely tested to the extent that it has been subjected to and passes a test that probably would have found flaws, were they present.” (Mayo, 2018, p. xii) Further, in marked contrast with Popper, who deemed deductive inference to be the only legitimate form of inference, Mayo’s conception of falsification stresses the importance of inductive, or content-increasing, inference in science. We have here, then, a viable account of falsification, which goes well beyond Popper’s account with its lack of operational detail about how to construct strong tests. It is worth noting that the error-statistical stance offers a constructive interpretation of Fisher’s oft-cited remark that the null hypothesis is never proved, only possibly disproved.
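To make the severity requirement concrete, consider a minimal numerical sketch, not drawn from Haig's paper, for the familiar one-sided Normal test T+ of H0: mu <= 0 versus H1: mu > 0 with sigma known. After a result is observed, the severity with which a claim of the form 'mu > mu1' passes is the probability that the test would have produced a less impressive result were that claim false.

# Illustrative only: severity for the one-sided Normal test T+ (H0: mu <= 0 vs H1: mu > 0).
# SEV(mu > mu1) = P(sample mean <= observed mean; mu = mu1), i.e., the probability of a
# less impressive result than the one observed, computed under the supposition mu = mu1.
from scipy.stats import norm

def severity(xbar, mu1, sigma=1.0, n=100):
    """Severity with which the claim 'mu > mu1' passes, given observed mean xbar."""
    se = sigma / n ** 0.5
    return norm.cdf((xbar - mu1) / se)

# Example: xbar = 0.2 with n = 100 and sigma = 1 (z = 2.0, one-sided p ~ 0.023).
for mu1 in (0.0, 0.1, 0.2, 0.3):
    print(f"SEV(mu > {mu1}): {severity(0.2, mu1):.3f}")

Here the inference 'mu > 0' passes with high severity (about 0.98), while the stronger claim 'mu > 0.2' does not (0.5); severity thus discriminates among claims that a bare rejection of the null treats alike.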

5.2. A hierarchy of models

In the past, philosophers of science tended to characterize scientific inquiry by focusing on the general relationship between evidence and theory. Similarly, scientists, even today, commonly speak in general terms of the relationship between data and theory. However, due in good part to the labors of experimentally-oriented philosophers of science, we now know that this coarse-grained depiction is a poor portrayal of science. The error-statistical perspective is one such philosophy that offers a more fine-grained parsing of the scientific process.

Building on Patrick Suppes’ (1962) important insight that science employs a hierarchy of models that ranges from experimental experience to theory, Mayo’s (1996) error-statistical philosophy initially adopted a framework in which three different types of models are interconnected and serve to structure error-statistical inquiry: Primary models, experimental models, and data models. Primary models, which are at the top of the hierarchy, break down a research problem, or question, into a set of local hypotheses that can be investigated using reliable methods. Experimental models take the mid-position on the hierarchy and structure the particular experiment at hand. They serve to link primary models to data models. And, data models, which are at the bottom of the hierarchy, generate and model raw data, put them in canonical form, and check whether the data satisfy the assumptions of the experimental models. It should be mentioned that the error-statistical approach has been extended to primary models and theories of a more global nature (Mayo and Spanos, 2010) and, now, also includes a consideration of experimental design and the analysis and generation of data (Mayo, 2018).

This hierarchy of models facilitates the achievement of a number of goals that are important to the error-statistician. These include piecemeal strong testing of local hypotheses rather than broad theories, and employing the model hierarchy as a structuring device to knowingly move back and forth between statistical and scientific hypotheses. The error-statistical perspective insists on maintaining a clear distinction between statistical and scientific hypotheses, pointing out that psychologists often mistakenly take tests of significance to have direct implications for substantive hypotheses and theories.

6. The philosophy of statistics

A heartening attitude that comes through in the error-statistical corpus is the firm belief that the philosophy of statistics is an important part of statistical thinking. This emphasis on the conceptual foundations of the subject contrasts markedly with much of statistical theory, and most of statistical practice. It is encouraging, therefore, that Mayo’s philosophical work has influenced a number of prominent statisticians, who have contributed to the foundations of their discipline. Gelman’s error-statistical philosophy canvassed earlier is a prominent case in point. Through both precept and practice, Mayo’s work makes clear that philosophy can have a direct impact on statistical practice. Given that statisticians operate with an implicit philosophy, whether they know it or not, it is better that they avail themselves of an explicitly thought-out philosophy that serves their thinking and practice in useful ways. More particularly, statistical reformers recommend methods and strategies that carry underlying philosophical commitments. It is important that these commitments are identified, described, and evaluated.

The tools used by the philosopher of statistics in order to improve our understanding and use of statistical methods are considerable (Mayo, 2011). They include clarifying disputed concepts, evaluating arguments employed in statistical debates, including the core commitments of rival schools of thought, and probing the deep structure of statistical methods themselves. In doing this work, the philosopher of statistics, as philosopher, ascends to a meta-level to get purchase on their objects of study. This second-order inquiry is a proper part of scientific methodology.

It is important to appreciate that the error-statistical outlook is a scientific methodology in the proper sense of the term. Briefly stated, methodology is the interdisciplinary field that draws from statistics, the philosophy of science, and the history of science, as well as from indigenous contributions of the various substantive disciplines. As such, it is the key to a proper understanding of statistical and scientific methods. Mayo’s focus on the role of error statistics in science is deeply informed about the philosophy, history, and theory of statistics, as well as statistical practice. It is for this reason that the error-statistical perspective is strategically positioned to help the reader to go beyond the statistics wars.

7. Conclusion

The error-statistical outlook provides researchers, methodologists, and statisticians with a distinctive and illuminating perspective on statistical inference. Its Popper-inspired emphasis on strong tests is a welcome antidote to the widespread practice of weak statistical hypothesis testing that still pervades psychological research. More generally, the error-statistical standpoint affords psychologists an informative perspective on the nature of good statistical practice in science that will help them understand and transcend the statistics wars into which they have been drawn. Importantly, psychologists should know about the error-statistical perspective as a genuine alternative to the new statistics and Bayesian statistics. The new statisticians, Bayesian statisticians, and those with other preferences should address the challenges to their outlooks on statistics that the error-statistical viewpoint provides. Taking these challenges seriously would enrich psychology’s methodological landscape.

*This article is based on an invited commentary on Deborah Mayo’s book, Statistical inference as severe testing: How to get beyond the statistics wars (Cambridge University Press, 2018), which appeared at https://statmodeling.stat.columbia.edu/2019/04/12. It is adapted with permission. I thank Mayo for helpful feedback on an earlier draft.

Refer to the paper for the references. I invite your comments and questions.

 

Categories: Brian Haig, SIST | 3 Comments

Part 2 of B. Haig’s ‘What can psych stat reformers learn from the error-stat perspective?’ (Bayesian stats)


Here’s a picture of ripping open the first box of (rush) copies of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, and here’s a continuation of Brian Haig’s recent paper ‘What can psychology’s statistics reformers learn from the error-statistical perspective?’ in Methods in Psychology 2 (Nov. 2020). Haig contrasts error statistics, the “new statistics”, and Bayesian statistics from the perspective of the statistics wars in psychology. The full article, which is open access, is here. I will make several points in the comments.

Haig

4. Bayesian statistics

Despite its early presence, and prominence, in the history of statistics, the Bayesian outlook has taken an age to assert itself in psychology. However, a cadre of methodologists has recently advocated the use of Bayesian statistical methods as a superior alternative to the messy frequentist practice that dominates psychology’s research landscape (e.g., Dienes, 2011; Kruschke and Liddell, 2018; Wagenmakers, 2007). These Bayesians criticize NHST, often advocate the use of Bayes factors for hypothesis testing, and rehearse a number of other well-known Bayesian objections to frequentist statistical practice.

Of course, there are challenges for Bayesians from the error-statistical perspective, just as there are for the new statisticians. For example, Mayo shows that the frequently made claim that p values exaggerate the evidence against the null hypothesis, whereas Bayes factors do not, is mistaken. She also makes the important point that Bayes factors, as they are currently used, do not have the ability to probe errors and, thus, violate the requirement for severe tests. Bayesians, therefore, need to rethink whether Bayes factors can be deployed in some way to provide strong tests of hypotheses through error control. As with the new statisticians, Bayesians also need to reckon with the coherent hybrid NHST afforded by the error-statistical perspective, and argue against it, rather than the common inchoate hybrids, if they want to justify abandoning NHST. Finally, I note in passing that Bayesians should consider, among other challenges, Mayo’s critique of the controversial Likelihood Principle, which renders sampling plans irrelevant to inference once the data are in hand.
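For readers who want to see the numbers behind the disputed 'exaggeration' claim, the following sketch, not taken from the paper and using assumed values (sigma = 1 and a N(0, 1) prior for mu under the alternative), computes a two-sided p value and the corresponding point-null Bayes factor at a fixed z of 1.96 as the sample size grows.

# Illustrative only: the point-null comparison behind the "p values exaggerate" dispute.
# Two-sided test of H0: mu = 0 with sigma known; under H1 the prior is mu ~ N(0, tau^2).
from scipy.stats import norm

def p_value_and_bf01(xbar, n, sigma=1.0, tau=1.0):
    se = sigma / n ** 0.5
    p = 2 * norm.sf(abs(xbar / se))                      # two-sided p value
    bf01 = norm.pdf(xbar, 0, se) / norm.pdf(xbar, 0, (se**2 + tau**2) ** 0.5)
    return p, bf01                                       # BF01 > 1 favours H0

# Hold z (and hence p) fixed at 1.96 while the sample size grows:
for n in (100, 10_000, 1_000_000):
    xbar = 1.96 / n ** 0.5
    p, bf01 = p_value_and_bf01(xbar, n)
    print(f"n={n:>9}  p={p:.3f}  BF01={bf01:,.1f}")
# At the same p value of 0.05 the Bayes factor increasingly favours the null. Critics read
# this as the p value overstating the evidence; Mayo's counter, roughly, is that the spiked
# prior on the point null and the diffuse prior under H1, not the p value, drive the disparity.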

4.1. Contrasts between the Bayesian and error-statistical perspectives

One of the major achievements of the philosophy of error-statistics is that it provides a comprehensive critical evaluation of the major variants of Bayesian statistical thinking, including the classical subjectivist, “default”, pragmatist, and eclectic options within the Bayesian corpus. Whether the adoption of Bayesian methods in psychology will overcome the disorders of current frequentist practice remains to be seen. What is clear from reading the error-statistical literature, however, is that the foundational options for Bayesians are numerous, convoluted, and potentially bewildering. It would be a worthwhile exercise to chart how these foundational options are distributed across the prominent Bayesian statisticians in psychology. For example, the increasing use of Bayes factors for hypothesis testing purposes is accompanied by disorderliness at the foundational level, just as it is in the Bayesian literature more generally. Alongside the fact that some Bayesians are sceptical of the worth of Bayes factors, we find disagreement about the comparative merits of the subjectivist and default Bayesian outlooks on Bayes factors in psychology (Wagenmakers et al., 2018).

The philosophy of error-statistics contains many challenges for Bayesians to consider. Here, I want to draw attention to three basic features of Bayesian thinking, which are rejected by the error-statistical approach. First, the error-statistical approach rejects the Bayesian insistence on characterizing the evidential relation between hypothesis and evidence in a universal and logical manner in terms of Bayes’ theorem. Instead, it formulates the relation in terms of the substantive and specific nature of the hypothesis and the evidence with regard to their origin, modeling, and analysis. This is a consequence of a strong commitment to a piecemeal, contextual approach to testing, using the most appropriate frequentist methods available for the task at hand. This contextual attitude to testing is taken up in Section 5.2, where one finds a discussion of the role different models play in structuring and decomposing inquiry.

Second, the error-statistical philosophy also rejects the classical Bayesian commitment to the subjective nature of prior probabilities, which the agent is free to choose, in favour of the more objective process of establishing error probabilities understood in frequentist terms. It also finds unsatisfactory the turn to the more popular objective, or “default”, Bayesian option, in which priors are constrained by formal rules and conventions rather than by the agent’s personal opinion. The error-statistician rejects this default option because it fails in its attempts to unify Bayesian and frequentist ways of determining probabilities.

And, third, the error-statistical outlook employs probabilities to measure how effectively methods facilitate the detection of error, and how those methods enable us to choose between alternative hypotheses. By contrast, orthodox Bayesians use probabilities to measure belief in hypotheses or degrees of confirmation. As noted earlier, most Bayesians are not concerned with error probabilities at all. It is for this reason that error-statisticians will say about Bayesian methods that, without supplementation with error probabilities, they are not capable of providing stringent tests of hypotheses.
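The contrast can be made concrete with a toy simulation, offered here as an illustration rather than anything in Haig's text: for the error-statistician, a number like 0.05 describes how often a testing procedure would erroneously reject a true null over repeated applications, and power describes how often it would detect a given discrepancy; neither is a degree of belief in a hypothesis.

# Illustrative only: error probabilities attach to the testing method, not to hypotheses.
import numpy as np

rng = np.random.default_rng(0)

def one_sided_z_test(sample, sigma=1.0):
    """Reject H0: mu <= 0 when the z statistic exceeds the 0.05 critical value (1.645)."""
    z = sample.mean() / (sigma / len(sample) ** 0.5)
    return z > 1.645

reps, n = 100_000, 25
type1 = np.mean([one_sided_z_test(rng.normal(0.0, 1.0, n)) for _ in range(reps)])   # data from H0
power = np.mean([one_sided_z_test(rng.normal(0.5, 1.0, n)) for _ in range(reps)])   # true mu = 0.5
print(f"simulated type I error rate: {type1:.3f}")   # close to 0.05
print(f"simulated power at mu = 0.5: {power:.3f}")   # close to 0.80
# Both numbers describe the long-run performance of the procedure over repeated use;
# no probability is assigned to H0 or to any rival hypothesis.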

4.2. The Bayesian remove from scientific practice

Two additional features of the Bayesian focus on beliefs, which have been noted by philosophers of science and statistics, draw attention to their outlook on science. First, Kevin Kelly and Clark Glymour worry that “Bayesian methods assign numbers to answers instead of producing answers outright.” (2004, p. 112) Their concern is that the focus on the scientist’s beliefs “screens off” the scientist’s direct engagement with the empirical and theoretical activities that are involved in the phenomenology of science. Mayo agrees that we should focus on the scientific phenomena of interest, not the associated epiphenomena of degrees of belief. This preference stems directly from the error-statistician’s conviction that probabilities properly quantify the performance of methods, not the scientist’s degrees of belief.

Second, Henry Kyburg is puzzled by the Bayesian’s desire to “replace the fabric of science… with a vastly more complicated representation in which each statement of science is accompanied by its probability, for each of us.” (1992, p. 149) Kyburg’s puzzlement prompts the question, ‘Why should we be interested in each other’s probabilities?’ This is a question raised by David Cox about prior probabilities, and noted by Mayo (2018).

This Bayesian remove from science contrasts with the willingness of the error-statistical perspective to engage more directly with science. Mayo is a philosopher of science as well as statistics, and has a keen eye for scientific practice. Given that contemporary philosophers of science tend to take scientific practice seriously, it comes as no surprise that she brings it to the fore when dealing with statistical concepts and issues. Indeed, her error-statistical philosophy should be seen as a significant contribution to the so-called new experimentalism, with its strong focus, not just on experimental practice in science, but also on the role of statistics in such practice. Her discussion of the place of frequentist statistics in the discovery of the Higgs boson in particle physics is an instructive case in point.

Taken together, these just-mentioned points of difference between the Bayesian and error-statistical philosophies constitute a major challenge to Bayesian thinking that methodologists, statisticians, and researchers in psychology need to confront.

4.3. Bayesian statistics with error-statistical foundations

One important modern variant of Bayesian thinking, which now receives attention within the error-statistical framework, is the falsificationist Bayesianism of Andrew Gelman, which received its major formulation in Gelman and Shalizi (2013). Interestingly, Gelman regards his Bayesian philosophy as essentially error-statistical in nature – an intriguing claim, given the anti-Bayesian preferences of both Mayo and Gelman’s co-author, Cosma Shalizi. Gelman’s philosophy of Bayesian statistics is also significantly influenced by Popper’s view that scientific propositions are to be submitted to repeated criticism in the form of strong empirical tests. For Gelman, best Bayesian statistical practice involves formulating models using Bayesian statistical methods, and then checking them through hypothetico-deductive attempts to falsify and modify those models.

Both the error-statistical and neo-Popperian Bayesian philosophies of statistics extend and modify Popper’s conception of the hypothetico-deductive method, while at the same time offering alternatives to received views of statistical inference. The error-statistical philosophy injects into the hypothetico-deductive method an account of statistical induction that employs a panoply of frequentist statistical methods to detect and control for errors. For its part, Gelman’s Bayesian alternative involves formulating models using Bayesian statistical methods, and then checking them through attempts to falsify and modify those models. This clearly differs from the received philosophy of Bayesian statistical modeling, which is regarded as a formal inductive process.
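In applied work, Gelman's model checking is usually carried out through posterior predictive checks. The sketch below is a toy illustration, not an example from the paper; the data, the deliberately misspecified model, and the discrepancy statistic are all assumed for the purpose of showing the logic: simulate replicated data from the fitted Bayesian model and ask whether a salient feature of the observed data would be surprising under it.

# Illustrative only: a posterior predictive check in the spirit of Gelman's falsificationist
# Bayesianism. The data, model, and discrepancy statistic are assumed for this toy example.
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=50)                 # "observed" data, actually heavy-tailed

# Fit a deliberately misspecified Normal model: flat prior on mu, sigma fixed at the sample SD.
sigma = y.std(ddof=1)
mu_draws = rng.normal(y.mean(), sigma / len(y) ** 0.5, size=4000)   # posterior draws for mu

# Replicate data from the fitted model and compare a tail-sensitive statistic, max |y|.
T_obs = np.max(np.abs(y))
T_rep = np.array([np.max(np.abs(rng.normal(mu, sigma, size=len(y)))) for mu in mu_draws])
ppp = np.mean(T_rep >= T_obs)                     # posterior predictive p value
print(f"posterior predictive p value for max|y|: {ppp:.3f}")
# A value near 0 or 1 indicates the fitted model cannot reproduce this feature of the data,
# which, on Gelman's view, is grounds for revising the model.

An extreme posterior predictive p value is then treated as a statistical falsification that prompts model revision, which is what makes the approach recognisably Popperian in spirit.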

From the wide-ranging error-statistical evaluation of the major varieties of Bayesian statistical thought on offer, Mayo concludes that Bayesian statistics needs new foundations: In short, those provided by her error-statistical perspective. Gelman acknowledges that his falsificationist Bayesian philosophy is underdeveloped, so it will be interesting to learn how its further development relates to Mayo’s error-statistical perspective. It will also be interesting to see if Bayesian thinkers in psychology engage with Gelman’s brand of Bayesian thinking. Despite the appearance of his work in a prominent psychology journal, they have yet to do so. However, Borsboom and Haig (2013) and Haig (2018) provide sympathetic critical evaluations of Gelman’s philosophy of statistics.

It is notable that in her treatment of Gelman’s philosophy, Mayo emphasizes that she is willing to allow a decoupling of statistical outlooks and their traditional philosophical foundations in favour of different foundations, which are judged more appropriate. It is an important achievement of Mayo’s work that she has been able to consider the current statistics wars without taking a particular side in the debates. She achieves this by examining methods, both Bayesian and frequentist, in terms of whether they violate her minimal severity requirement of “bad evidence, no test”.

I invite your comments and questions.

*This picture was taken by Diana Gillooly, Senior Editor for Mathematical Sciences, Cambridge University Press, at the book display for the Sept. 2018 meeting of the Royal Statistical Society in Cardiff. She also had the honor of doing the ripping. A blogpost on the session I was in is here.

Categories: Brian Haig, SIST | 4 Comments

‘What can psychology’s statistics reformers learn from the error-statistical perspective?’


This is the title of Brian Haig’s recent paper in Methods in Psychology 2 (Nov. 2020). Haig is a professor emeritus of psychology at the University of Canterbury. Here he provides both a thorough and insightful review of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) and an excellent overview of the high points of today’s statistics wars and the replication crisis, especially from the perspective of psychology. I’ll excerpt from his article in a couple of posts. The full article, which is open access, is here.

Abstract: In this article, I critically evaluate two major contemporary proposals for reforming statistical thinking in psychology: The recommendation that psychology should employ the “new statistics” in its research practice, and the alternative proposal that it should embrace Bayesian statistics. I do this from the vantage point of the modern error-statistical perspective, which emphasizes the importance of the severe testing of knowledge claims. I also show how this error-statistical perspective improves our understanding of the nature of science by adopting a workable process of falsification and by structuring inquiry in terms of a hierarchy of models. Before concluding, I briefly discuss the importance of the philosophy of statistics for improving our understanding of statistical thinking.

Brian Haig

Keywords: The error-statistical perspective, The new statistics, Bayesian statistics, Falsificationism, Hierarchy of models, Philosophy of statistics

Categories: Brian Haig, Statistical Inference as Severe Testing–Review | 12 Comments

B. Haig: The ASA’s 2019 update on P-values and significance (ASA II)(Guest Post)

Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand

The American Statistical Association’s (ASA) recent effort to advise the statistical and scientific communities on how they should think about statistics in research is ambitious in scope. It is concerned with an initial attempt to depict what empirical research might look like in “a world beyond p < 0.05” (The American Statistician, 2019, 73(S1), 1-401). Quite surprisingly, the main recommendation of the lead editorial article in the Special Issue of The American Statistician devoted to this topic (Wasserstein, Schirm, & Lazar, 2019; hereafter, ASA II(note)) is that “it is time to stop using the term ‘statistically significant’ entirely” (p. 2). ASA II(note) acknowledges the controversial nature of this directive and anticipates that it will be subject to critical examination. Indeed, in a recent post, Deborah Mayo began her evaluation of ASA II(note) by making constructive amendments to three recommendations that appear early in the document (‘Error Statistics Philosophy’, June 17, 2019). These amendments have received numerous endorsements, and I record mine here. In this short commentary, I briefly state a number of general reservations that I have about ASA II(note).

Categories: ASA Guide to P-values, Brian Haig | 32 Comments
