4.8 All Models Are False
. . . it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. . . . The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis. (Cox 1995, p. 456)
A popular slogan in statistics and elsewhere is “all models are false!” Is this true? What can it mean to attribute a truth value to a model? Clearly what is meant involves some assertion or hypothesis about the model – that it correctly or incorrectly represents some phenomenon in some respect or to some degree. Such assertions clearly can be true. As Cox observes, “the very word model implies simplification and idealization.” To declare “all models are false” by dint of their being idealizations or approximations is to stick us with one of those “all flesh is grass” trivializations (Section 4.1). So understood, it follows that all statistical models are false, but we have learned nothing about how statistical models may be used to infer true claims about problems of interest. Since the severe tester’s goal in using approximate statistical models is largely to learn where they break down, their strict falsity is a given. Yet it does make her wonder why anyone would want to place a probability assignment on their truth, unless that probability were 0. Today’s tour continues our journey into solving the problem of induction (Section 2.7).
Assigning a probability to either a substantive or a statistical model is very different from asserting it is approximately correct or adequate for solving a problem. The philosopher of science Peter Achinstein had hoped to discover that his scientific heroes, Isaac Newton and John Stuart Mill, were Bayesian probabilists, but he was disappointed. What he found is enlightening:
Neither in their abstract formulations of inductive generalizations (Newton’s rule 3; Mill’s definition of ‘induction’) nor in their examples of particular inductions to general conclusions of the form ‘all As are Bs’ does the term ‘probability’ occur. Both write that from certain specific facts we can conclude general ones – not that we can conclude general propositions with probability, or that general propositions have a probability . . . From the inductive premises we simply conclude that the generalization is true, or as Newton allows in rule 4, ‘very nearly true,’ by which he appears to mean not ‘probably true’ but ‘approximately true’ (as he does when he takes the orbits of the satellites of Jupiter to be circles rather than ellipses). (Achinstein 2010, p. 176)
There are two main ways the “all models are false” charge comes about:
1. The statistical inference refers to an idealized and partial representation of a theory or process.
2. The probability model, to which a statistical inference refers, is at most an idealized and partial representation of the actual data-generating source.
Neither of these facts precludes the use of these false models to find out true things, or to correctly solve problems. On the contrary, it would be impossible to learn about the world if we did not deliberately falsify and simplify.
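Point 2 can be made concrete with a short simulation, a sketch of my own (not from SIST; the distributions and sample sizes are illustrative choices). Data come from a skewed Gamma source, while the working model treats them as i.i.d. Normal. The Normal model is strictly false, yet the standard 95% confidence interval for the mean attains close to its nominal coverage, so the false model correctly solves the problem it was asked to solve.

```python
import numpy as np

rng = np.random.default_rng(12345)

# True data-generating source: a skewed Gamma distribution.
# Working model: i.i.d. Normal -- strictly false here.
true_mean = 2.0          # Gamma(shape=2, scale=1) has mean 2
n, reps = 100, 10_000
covered = 0

for _ in range(reps):
    x = rng.gamma(shape=2.0, scale=1.0, size=n)
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    # Normal-theory 95% confidence interval for the mean
    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
    covered += (lo <= true_mean <= hi)

print(f"Actual coverage: {covered / reps:.3f}  (nominal 0.95)")
```

The coverage comes out close to 0.95: the idealization is harmless for this question, though it could be ruinous for another (say, estimating tail probabilities), which is exactly why adequacy must be relativized to a problem.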
Adequacy for a Problem

The statistician George Box, to whom the slogan “all models are wrong” is often attributed, goes on to add “But some are useful” (1979, p. 202). I’ll go further still: all models are false; no useful models are true. Were a model so complex as to represent every detail of the data “realistically,” it wouldn’t be useful for finding things out. Let’s say a statistical model is useful by being adequate for a problem, meaning it may be used to find true or approximately true solutions. Statistical hypotheses may be seen as conjectured solutions to a problem. A statistical model is adequate for a problem of statistical inference (which is only a subset of uses of statistical models) if it enables controlling and assessing whether purported solutions are well or poorly probed, and to what degree. Through approximate models, we learn about the “important stable aspects,” or systematic patterns, of phenomena that exhibit statistical variability. When I speak of ruling out mistaken interpretations of data, I include mistakes about theoretical and causal claims.

If you’re an anti-realist about science, you will interpret, or rather reinterpret, theoretical claims in terms of observable claims of some sort. One such anti-realist view we’ve seen is instrumentalism: unobservables, including genes, particles, and light bending, may be regarded as at most instruments for finding out about observable regularities and predictions. Fortunately, we won’t have to engage the thorny problem of realism in science; we can remain agnostic. Neither my arguments, nor the error statistical philosophy in general, turn on whether one adopts realism or anti-realism. Today’s versions of realism and anti-realism are, quite frankly, too hard to tell apart to matter for our goals. The most important thing is that both realists and non-realists require an account of statistical inference. Moreover, whatever one’s view of scientific theories, a statistical analysis of problems of actual experiments involves abstraction and creative analogy. (pp. 296-7)
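To make “assessing whether purported solutions are well or poorly probed, and to what degree” concrete, here is a minimal Python sketch; it is my illustration, not code from SIST, though the numbers echo the book’s one-sided Normal test T+ of H0: μ ≤ 150 with σ = 10 and n = 100. The severity for the claim μ > μ1 is the probability of a result less extreme than the one observed, computed as if μ were only μ1, and the whole assessment is premised on the Normal model being adequate for the problem.

```python
from scipy.stats import norm

def severity_upper(xbar, mu1, sigma, n):
    """SEV(mu > mu1) for a one-sided Normal test of H0: mu <= mu0:
    the probability of a sample mean no larger than the observed one,
    computed under mu = mu1 (model adequacy assumed throughout)."""
    se = sigma / n ** 0.5
    return norm.cdf((xbar - mu1) / se)

# Illustrative numbers: sigma = 10 known, n = 100, observed mean 152
sigma, n, xbar = 10.0, 100, 152.0
for mu1 in (150.0, 151.0, 152.0, 153.0):
    print(f"SEV(mu > {mu1}) = {severity_upper(xbar, mu1, sigma, n):.3f}")
# SEV(mu > 150) ~ 0.977: well probed; SEV(mu > 153) ~ 0.159: poorly probed
```

The point of the sketch is the gradation: the same observed mean probes “μ > 150” severely and “μ > 153” hardly at all, which is the kind of control an adequate-for-the-problem model is supposed to deliver.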
… jumping to the end of Excursion 4, Tour IV:
Take-away of Excursion 4. For a severe tester, a crucial part of a statistical method’s objectivity (Tour I) is registering how test specifications such as sample size (Tour II) and biasing selection effects (Tour III) alter its error-probing capacities. Testing assumptions (Tour IV) is also crucial to auditing. If a probabilist measure such as a Bayes factor is taken as a gold standard for critiquing error statistical tests, significance levels and other error probabilities appear to overstate evidence – at least on certain choices of priors. From the perspective of the severe tester, it can be just the reverse. Preregistered reports are promoted to advance replication by blocking selective reporting. Thus there is a tension between preregistration and probabilist accounts that downplay error probabilities, declare them relevant only to long runs, or regard them as tantamount to considering hidden intentions. Moreover, in the interest of promoting Bayes factors, researchers who most deserve censure are thrown a handy life preserver. Violating the LP, using the sampling distribution for inferences with the data at hand, and the importance of error probabilities form an interconnected web of severe testing. They are necessary for every one of the requirements for objectivity. (p. 320)
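The prior sensitivity alluded to above (“at least on certain choices of priors”) can be seen in a back-of-the-envelope computation; the following sketch is mine, not the book’s, and the spiked-prior setup is one standard choice among many. For a two-sided Normal test of H0: μ = 0 with a N(0, τ²) prior on μ under the alternative, a result held fixed just at the 0.05 cutoff yields a Bayes factor that favors the null more and more strongly as n grows (the Jeffreys-Lindley effect), which is why, from that probabilist standpoint, significance levels seem to overstate the evidence.

```python
from math import sqrt, exp, pi

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

sigma, tau, z = 1.0, 1.0, 1.96        # z held fixed: p ~ 0.05 at every n
for n in (10, 100, 1_000, 10_000):
    se2 = sigma ** 2 / n               # variance of the sample mean
    xbar = z * sqrt(se2)               # observed mean just at the cutoff
    # Marginal of xbar: N(0, se2) under H0; N(0, se2 + tau^2) under H1
    bf01 = normal_pdf(xbar, 0.0, se2) / normal_pdf(xbar, 0.0, se2 + tau ** 2)
    print(f"n = {n:6d}   BF01 (null over alternative) = {bf01:6.2f}")
```

From the severe tester’s side, of course, the moral runs the other way: the verdict tracks the choice of prior, not the error-probing performance of the test.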
_______________________
This excerpt comes from Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo, CUP 2018).
Earlier excerpts and mementos from SIST up to Dec 31, 2018 are here.
Jan 10, 2019 Excerpt from SIST is here.
Jan 13, 2019 Mementos from SIST (Excursion 4) are here. These are summaries of all 4 tours.
Statinfasst (the name of our ship) has embarked on a new journey, so I won’t continue to post the mementos and excerpts I’ve been reporting from the earlier trip. (That trip was completed and all passengers are safe; I simply didn’t post items after Excursion 4, having run out of time.) Material from Excursions 5 and 6 won’t appear here until April, when our seminar reaches those ports. You can follow us on the PhilStat Spring 19 page atop this blog. I will also be posting materials from our seminar, along with ordinary posts. I hope to get some discussion notes from readers.
Dr. Mayo, thanks for your post. As far as I know, statistical estimation and error testing involve belief in the existence of (population) parameters, which are unobservable. So couldn’t we say that those who believe in the error statistics theory of evidence are on the realist side of the spectrum? It seems difficult to be, at the same time, agnostic about the realism/anti-realism debate and believe that knowledge can be obtained about unobservable entities like parameters.
One could view them as abstractions that are useful for problem solving. Difficulties may arise in adhering to this, but it’s not impossible. Even a realist, in my view, ought to distinguish those cases where the theoretical parameter is at best an approximation to something real, based on the level of testing so far available. In other words, even a realist ought to be non-realist about some parameters, at a given time; others may be part of well-corroborated theories. Still others may be so well understood that we’re able to use them to find out about thus-far unexplored domains. In that case they’re as real as they need to be. In other words, things are called “observational” when they pass certain tests, and these can be passed (with severity) by so-called theoreticals.