We’re going to be discussing the philosophy of M-S (misspecification) testing today in our seminar, so I’m reblogging this from Feb. 2012. I’ve linked the three follow-ups below. Check the original posts for some good discussion. (Note visitor*)
“This is the kind of cure that kills the patient!”
is the line of Aris Spanos that I most remember from when I first heard him talk about testing assumptions of, and respecifying, statistical models in 1999. (The patient, of course, is the statistical model.) On finishing my book, EGEK 1996, I had been keen to fill its central gaps, one of which was fleshing out a crucial piece of the error-statistical framework of learning from error: how to validate the assumptions of statistical models. But the whole problem turned out to be far more philosophically (not to mention technically) challenging than I imagined. I will try (in 3 short posts) to sketch a procedure that I think puts the entire process of model validation on a sound logical footing.
The following two sections from Aris Spanos’ contribution to the RMM volume are relevant to the points raised by Gelman (as regards what I am calling the “two slogans”)**.
6.1 Objectivity in Inference (From Spanos, RMM 2011, pp. 166-7)
The traditional literature seems to suggest that ‘objectivity’ stems from the mere fact that one assumes a statistical model (a likelihood function), enabling one to accommodate highly complex models. Worse, in Bayesian modeling it is often misleadingly claimed that as long as a prior is determined by the assumed statistical model (the so-called reference prior), the resulting inference procedures are objective, or at least as objective as the traditional frequentist procedures:
“Any statistical analysis contains a fair number of subjective elements; these include (among others) the data selected, the model assumptions, and the choice of the quantities of interest. Reference analysis may be argued to provide an ‘objective’ Bayesian solution to statistical inference in just the same sense that conventional statistical methods claim to be ‘objective’: in that the solutions only depend on model assumptions and observed data.” (Bernardo 2010, 117)
This claim brings out the unfathomable gap between the notion of ‘objectivity’ as understood in Bayesian statistics, and the error statistical viewpoint. As argued above, there is nothing ‘subjective’ about the choice of the statistical model Mθ(z) because it is chosen with a view to account for the statistical regularities in data z0, and its validity can be objectively assessed using trenchant M-S testing. Model validation, as understood in error statistics, plays a pivotal role in providing an ‘objective scrutiny’ of the reliability of the ensuing inductive procedures.
A Better Way
The traditional approach described in Part 2 did not detect the presence of mean-heterogeneity and so it misidentified temporal dependence as the sole source of misspecification associated with the original LRM.
On the basis of figures 1-3 we can summarize our progress in detecting potential departures from the LRM model assumptions to probe thus far:
Discriminating and Amplifying the Effects of Mistakes
We could correctly assess dependence if our data were ID and not obscured by the influence of the trending mean. Although we cannot literally manipulate the relevant factors, we can ‘subtract out’ the trending mean in a generic way to see what the data would look like if there were no trending mean. Here are the detrended xt and yt.
Fig. 4: Detrended Population (y – trend)
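To make the idea of ‘subtracting out’ a trending mean concrete, here is a minimal sketch. It rests on assumptions that are mine, not the post’s: the trend is taken to be a straight line in t (the post does not say which form of trend was removed), the data are simulated rather than the actual population series, and the helper names `lag1_autocorr` and `detrend_linear` are hypothetical. The simulated series is independent noise around a rising mean, so any apparent dependence in the raw series comes from the trend alone.

```python
import random


def lag1_autocorr(series):
    """Sample first-order autocorrelation of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


def detrend_linear(series):
    """Subtract a least-squares straight line in t (an assumed trend form)."""
    n = len(series)
    t = list(range(n))
    t_bar = sum(t) / n
    y_bar = sum(series) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, series)]


# Hypothetical data: 35 annual points, independent noise around a rising mean.
rng = random.Random(0)
y = [165.0 + 2.4 * t + rng.gauss(0.0, 1.0) for t in range(35)]

r_raw = lag1_autocorr(y)
r_detrended = lag1_autocorr(detrend_linear(y))
print(f"lag-1 autocorrelation, raw:       {r_raw:.2f}")
print(f"lag-1 autocorrelation, detrended: {r_detrended:.2f}")
```

The raw series looks strongly ‘dependent’ even though its noise is independent by construction; once the trend is removed, the first-order autocorrelation should drop sharply. That is the sense in which a trending mean can obscure an assessment of dependence.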
Part 1 is here.
Graphing t-plots
(This is my first experiment with blogging data plots. They have been blown up a bit, so hopefully they are now sufficiently readable.)
Here are two plots (t-plots) of the observed data, where yt is the population of the USA in millions and xt is our “secret” variable, to be revealed later on, both plotted over time (1955-1989).
Fig. 1: USA Population (y)
Fig. 2: Secret variable (x)
Fig. 3: A typical realization of a NIID process.
Pretty clearly, there are glaring departures from IID when we compare a typical realization of a NIID process, in fig. 3, with the t-plots of the two series in figures 1-2. In particular, both data series show a mean that increases with time; that is, strong mean-heterogeneity (a trending mean).
Our recommended next step would be to continue exploring the probabilistic structure of the data in figures 1 and 2 with a view toward thoroughly assessing the validity of the LRM assumptions (table 1). But first let us take a quick look at the traditional approach for testing assumptions, focusing just on the assumption traditionally viewed as error non-autocorrelation: E(ut, us) = 0 for t ≠ s; t, s = 1, 2, …, n.
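For readers who want to see what a check of error non-autocorrelation can look like in practice, here is a minimal sketch. It uses the Durbin-Watson statistic, a common first-order check on regression residuals; whether that is the test examined in the follow-up post is not stated in this excerpt, so treat the choice as my assumption. The helper names and the numbers are hypothetical and are not the population data.

```python
def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Hypothetical numbers, chosen only to make the example run.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

dw = durbin_watson(ols_residuals(x, y))
print(f"Durbin-Watson statistic: {dw:.2f}")
```

The statistic lies between 0 and 4. Values near 2 are conventionally read as no first-order autocorrelation, values near 0 as positive autocorrelation, and values near 4 as negative autocorrelation. A statistic like this probes only one kind of departure, so a low value does not by itself say where the misspecification comes from.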
“This is the kind of cure that kills the patient!”
is the line of Aris Spanos that I most remember from when I first heard him talk about testing assumptions of, and respecifying, statistical models in 1999. (The patient, of course, is the statistical model.) On finishing my book, EGEK 1996, I had been keen to fill its central gaps, one of which was fleshing out a crucial piece of the error-statistical framework of learning from error: how to validate the assumptions of statistical models. But the whole problem turned out to be far more philosophically (not to mention technically) challenging than I imagined. I will try (in 3 short posts) to sketch a procedure that I think puts the entire process of model validation on a sound logical footing. Thanks to attending several of Spanos’ seminars (and his patient tutorials, for which I am very grateful), I was eventually able to reflect philosophically on aspects of his already well-worked out approach. (Synergies with the error statistical philosophy, of which this is a part, warrant a separate discussion.)
The article “Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation*” by Aris Spanos has now been published in our special volume of the on-line journal, Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”).
Statistical model specification and validation raise crucial foundational problems whose pertinent resolution holds the key to learning from data by securing the reliability of frequentist inference. The paper questions the judiciousness of several current practices, including the theory-driven approach and the Akaike-type model selection procedures, arguing that they often lead to unreliable inferences. This is primarily due to the fact that goodness-of-fit/prediction measures and other substantive and pragmatic criteria are of questionable value when the estimated model is statistically misspecified. Foisting one’s favorite model on the data often yields estimated models which are both statistically and substantively misspecified, but one has no way to delineate between the two sources of error and apportion blame. The paper argues that the error statistical approach can address this Duhemian ambiguity by distinguishing between statistical and substantive premises and viewing empirical modeling in a piecemeal way with a view to delineating the various issues more effectively. It is also argued that Hendry’s general-to-specific procedure does a much better job in model selection than the theory-driven and the Akaike-type procedures, primarily because of its error statistical underpinnings.