The following two sections from Aris Spanos’ contribution* to the RMM volume are relevant to the points raised by Gelman (as regards what I am calling the “two slogans”).**
6.1 Objectivity in Inference (From Spanos, RMM 2011, pp. 166-7)
The traditional literature seems to suggest that ‘objectivity’ stems from the mere fact that one assumes a statistical model (a likelihood function), enabling one to accommodate highly complex models. Worse, in Bayesian modeling it is often misleadingly claimed that as long as a prior is determined by the assumed statistical model—the so-called reference prior—the resulting inference procedures are objective, or at least as objective as the traditional frequentist procedures:
“Any statistical analysis contains a fair number of subjective elements; these include (among others) the data selected, the model assumptions, and the choice of the quantities of interest. Reference analysis may be argued to provide an ‘objective’ Bayesian solution to statistical inference in just the same sense that conventional statistical methods claim to be ‘objective’: in that the solutions only depend on model assumptions and observed data.” (Bernardo 2010, 117)
This claim brings out the unfathomable gap between the notion of ‘objectivity’ as understood in Bayesian statistics, and the error statistical viewpoint. As argued above, there is nothing ‘subjective’ about the choice of the statistical model Mθ(z) because it is chosen with a view to account for the statistical regularities in data z0, and its validity can be objectively assessed using trenchant mis-specification (M-S) testing. Model validation, as understood in error statistics, plays a pivotal role in providing an ‘objective scrutiny’ of the reliability of the ensuing inductive procedures.
Objectivity does NOT stem from the mere fact that one ‘assumes’ a statistical model. It stems from establishing a sound link between the process generating the data z0 and the assumed Mθ(z), by securing statistical adequacy. The sound application and the objectivity of statistical methods turn on the validity of the assumed statistical model Mθ(z) for the particular data z0. Hence, in the case of ‘reference’ priors, a misspecified statistical model Mθ(z) will also give rise to an inappropriate prior π(θ).
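To make ‘securing statistical adequacy’ concrete, here is a minimal sketch of the kind of M-S testing Spanos has in mind, applied to a simple Normal linear regression. The data, parameter values, and choice of diagnostic battery are my own illustrative assumptions, not taken from his paper; the point is only that the model’s probabilistic assumptions are probed via the residuals rather than taken on faith.

```python
# Sketch of mis-specification (M-S) testing for a Normal linear regression.
# All data are simulated; the three tests are one illustrative battery,
# not Spanos' own protocol.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox, het_breuschpagan
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # data consistent with the assumed model

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
resid = fit.resid

# Probe three of the model's probabilistic assumptions via the residuals.
jb_stat, jb_pval, _, _ = jarque_bera(resid)                     # Normality
lb_pval = acorr_ljungbox(resid, lags=[5])["lb_pvalue"].iloc[0]  # Independence
bp_stat, bp_pval, _, _ = het_breuschpagan(resid, X)             # Homoskedasticity

print(f"Normality (Jarque-Bera):          p = {jb_pval:.3f}")
print(f"Independence (Ljung-Box, 5 lags): p = {lb_pval:.3f}")
print(f"Homoskedasticity (Breusch-Pagan): p = {bp_pval:.3f}")
```

A rejection by any of these probes would indicate that Mθ(z) is statistically misspecified for the data at hand, and any inference drawn from it would inherit that unreliability.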
Moreover, there is nothing subjective or arbitrary about the ‘choice of the data and the quantities of interest’ either. The appropriateness of the data is assessed by how well data z0 correspond to the theoretical concepts underlying the substantive model in question. Indeed, one of the key problems in modeling observational data is the pertinent bridging of the gap between the theory concepts and the available data z0 (see Spanos 1995). The choice of the quantities of interest, i.e. the statistical parameters, should be assessed in terms of the statistical adequacy of the statistical model in question and how well these parameters enable one to pose and answer the substantive questions of interest.
For error statisticians, objectivity in scientific inference is inextricably bound up with the reliability of their methods, and hence the emphasis on thorough probing of the different ways an inference can go astray (see Cox and Mayo 2010). It is in this sense that M-S testing to secure statistical adequacy plays a pivotal role in providing an objective scrutiny of the reliability of error statistical procedures.
In summary, the well-rehearsed claim that frequentist and Bayesian inference share the same subjective and arbitrary choices, the only difference being that the latter is more honest about its presuppositions, constitutes a lame excuse for the ad hoc choices in the latter approach and highlights the huge gap between the two perspectives on modeling and inference. The appropriateness of every choice made by an error statistician, including the statistical model Mθ(z) and the particular data z0, is subject to independent scrutiny by other modelers.
6.2 ‘All models are wrong, but some are useful’
A related argument—widely used by Bayesians (see Gelman, this volume) and some frequentists—to debase the value of securing statistical adequacy, is that statistical misspecification is inevitable and thus the problem is not as crucial as often claimed. After all, as George Box remarked:
“All models are wrong, but some are useful!”
A closer look at this locution, however, reveals that it is mired in confusion. First, in what sense ‘all models are wrong’?
This catchphrase alludes to the obvious simplification/idealization associated with any form of modeling: it does not represent the real-world phenomenon of interest in all its details. That, however, is very different from claiming that the underlying statistical model is unavoidably misspecified vis-à-vis the data z0. In other words, this locution conflates two different aspects of empirical modeling:
(a) the ‘realisticness’ of the substantive information (assumptions) comprising the structural model Mφ(z) (substantive premises), vis-à-vis the phenomenon of interest, with
(b) the validity of the probabilistic assumptions comprising the statistical model Mθ(z) (statistical premises), vis-à-vis the data z0 in question.
It’s one thing to claim that a model is not an exact picture of reality in a substantive sense, and totally another to claim that the statistical model Mθ(z) could not have generated data z0 because the model is statistically misspecified. The distinction is crucial for two reasons. To begin with, the types of errors one needs to probe for and guard against are very different in the two cases. Substantive adequacy calls for additional probing of (potential) errors in bridging the gap between theory and data. Without securing statistical adequacy, however, probing for substantive adequacy is likely to be misleading. Moreover, even though good fit/prediction is neither necessary nor sufficient for statistical adequacy, it is relevant for substantive adequacy in the sense that it provides a measure of the structural model’s comprehensiveness (explanatory capacity) vis-à-vis the phenomenon of interest (see Spanos 2010a). This indicates that part of the confusion pertaining to model validation and its connection (or lack thereof) to goodness-of-fit/prediction criteria stems from inadequate appreciation of the difference between substantive and statistical information.
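The point that good fit is neither necessary nor sufficient for statistical adequacy can be illustrated with the classic ‘spurious regression’ setup (a hedged sketch with simulated data; the example is mine, not from Spanos’ paper): regressing one random walk on an independent one often produces a seemingly strong fit, yet the residuals flagrantly violate the model’s independence assumption.

```python
# Two independent random walks: any apparent 'fit' is spurious.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n = 500
x = np.cumsum(rng.normal(size=n))  # random walk 1
y = np.cumsum(rng.normal(size=n))  # random walk 2, independent of x

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R-squared = {fit.rsquared:.2f}, slope p-value = {fit.pvalues[1]:.2e}")
# The conventional t-test typically reports 'significance' despite the
# absence of any relation, while the Durbin-Watson statistic (about 2
# under independent errors) sits near 0, exposing the misspecification:
print(f"Durbin-Watson = {durbin_watson(fit.resid):.2f}")
```

Here goodness of fit says nothing about statistical adequacy; the residual diagnostics do.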
Second, how wrong does a model have to be to not be useful? It turns out that the full quotation reflecting the view originally voiced by Box is given in Box and Draper (1987, 74):
“[. . . ] all models are wrong; the practical question is how wrong do they have to be to not be useful.”
In light of that, the only criterion for deciding when a misspecified model is or is not useful is to evaluate its potential unreliability: the implied discrepancy between the relevant actual and nominal error probabilities for a particular inference. When this discrepancy is small enough, the estimated model can be useful for inference purposes; otherwise it is not. The onus, however, is on the practitioner to demonstrate that. Invoking vague, generic robustness claims, such as the claim that ‘small’ departures from the model assumptions do not affect the reliability of inference, will not suffice, because such claims are often highly misleading when appraised using the error-discrepancy criterion. Indeed, it is not the discrepancy between models that matters for evaluating the robustness of inference procedures, as often claimed in statistics textbooks, but the discrepancy between the relevant actual and nominal error probabilities (see Spanos 2009a).
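To see what the error-discrepancy criterion amounts to in practice, here is a small Monte Carlo sketch (the sample size, dependence parameter, and number of replications are illustrative assumptions): a one-sample t-test run at nominal level 0.05, on data whose IID assumption fails because the errors follow an AR(1) process, has an actual type I error probability several times the nominal one.

```python
# Nominal vs. actual type I error of a one-sample t-test under AR(1) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, rho, n_sim, alpha = 100, 0.5, 10_000, 0.05

def ar1_sample(n, rho, rng):
    """Zero-mean stationary AR(1) series, so the null H0: mu = 0 is true."""
    e = rng.normal(size=n)
    z = np.empty(n)
    z[0] = e[0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + e[t]
    return z

rejections = sum(
    stats.ttest_1samp(ar1_sample(n, rho, rng), 0.0).pvalue < alpha
    for _ in range(n_sim)
)
print(f"nominal type I error: {alpha:.3f}")
print(f"actual  type I error: {rejections / n_sim:.3f}")  # roughly 0.25 here
```

The discrepancy between those two numbers, not the apparent ‘smallness’ of the departure rho, is what the criterion evaluates.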
In general, when the estimated model Mθ(z) is statistically misspecified, it is practically useless for inference purposes, unless one can demonstrate that its reliability is adequate for the particular inferences.
*Spanos, A. 2011. “Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation.” RMM Vol. 2, 146–178. Special Topic: Statistical Science and Philosophy of Science.
**Note: Aspects of the on-line exchange between me and Senn are now published in RMM; comments you post or send to the blog (on any of the papers in this special RMM volume) can similarly be considered for inclusion in the discussions in RMM.
Mayo, this is a nice discussion by Spanos.
But how does it solve Berkson’s paradox? Given enough data, isn’t every model going to be statistically misspecified?
Let me try to be clearer. Let’s use Spanos’ example, the CAPM.
On page 161, he rejects almost all the assumptions of the model; the only exception is linearity. So all the inferences made within the model are called into question, because the error probabilities are not properly controlled.
Now let’s suppose for a minute that the CAPM had passed all the tests. Then, by analogy, all the inferences would be correct (in the sense that the error probabilities were controlled).
But, as Berkson argued, it’s likely that our model isn’t exactly right; we just didn’t have enough data to reject our assumptions. Suppose, then, that some years after evaluating the CAPM as “statistically adequate”, we get more data, and we end up rejecting every assumption (as Spanos did on page 161). Then what we had thought were valid inferences before wouldn’t be valid inferences anymore, right?
Well, if a practitioner gets this kind of “disappointment” very often, as Berkson did, then he will extrapolate this kind of thinking to every model he evaluates.
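To illustrate what I mean, here is a rough simulation (the distribution and sample sizes are made up for illustration): data that are really Student’s t with 30 degrees of freedom are only slightly non-Normal, so a Normality assumption tends to survive testing at small n but gets rejected once enough data accumulate.

```python
# Berkson's point in miniature: a slightly-wrong Normality assumption
# survives small samples but not large ones.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
for n in (200, 2_000, 50_000):
    z = rng.standard_t(df=30, size=n)  # nearly, but not exactly, Normal
    stat, pval = stats.jarque_bera(z)
    print(f"n = {n:>6}: Jarque-Bera p = {pval:.4f}")
```

At n = 200 the test has essentially no power against so small a departure; by n = 50,000 rejection is nearly certain.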