
Phil 6334: Misspecification Testing: Ordering From A Full Diagnostic Menu (part 1)


 We’re going to be discussing the philosophy of m-s testing today in our seminar, so I’m reblogging this from Feb. 2012. I’ve linked the 3 follow-ups below. Check the original posts for some good discussion. (Note visitor*)

“This is the kind of cure that kills the patient!”

is the line of Aris Spanos that I most remember from when I first heard him talk about testing the assumptions of, and respecifying, statistical models in 1999. (The patient, of course, is the statistical model.) On finishing my book, EGEK 1996, I had been keen to fill its central gaps, one of which was fleshing out a crucial piece of the error-statistical framework of learning from error: how to validate the assumptions of statistical models. But the whole problem turned out to be far more philosophically, not to mention technically, challenging than I imagined.

I will try (in 3 short posts) to sketch a procedure that I think puts the entire process of model validation on a sound logical footing. Thanks to attending several of Spanos’ seminars (and his patient tutorials, for which I am very grateful), I was eventually able to reflect philosophically on aspects of his already well-worked-out approach. (Synergies with the error statistical philosophy, of which this is a part, warrant a separate discussion.)

Problems of Validation in the Linear Regression Model (LRM)

The example Spanos was considering was the Linear Regression Model (LRM), which may be seen to take the form:

M0:      yt = β0 + β1xt + ut,  t=1,2,…,n,…

where µt = β0 + β1xt is viewed as the systematic component, and ut = yt – β0 – β1xt as the error (non-systematic) component. The error process {ut, t=1, 2, …, n, …} is assumed to be Normal, Independent and Identically Distributed (NIID) with mean 0 and variance σ2, i.e. Normal white noise. Using the data z0:={(xt, yt), t=1, 2, …, n}, the coefficients (β0, β1) are estimated (by least squares), yielding an empirical equation intended to enable us to understand how yt varies with xt.
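To make the estimation step concrete, here is a minimal sketch (in Python, using numpy and statsmodels, with synthetic stand-in data, since the post’s own series are not reproduced here) of fitting the LRM by least squares:

```python
# Minimal sketch: estimating the LRM y_t = b0 + b1*x_t + u_t by least squares.
# The data are synthetic stand-ins; true b0 = 1, b1 = 2, with NIID errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 35                                   # e.g. 35 annual observations
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # NIID Normal errors, mean 0

X = sm.add_constant(x)                   # prepend the intercept column
fit = sm.OLS(y, X).fit()                 # least-squares estimates of (b0, b1)
print(fit.params)                        # estimated coefficients
print(fit.resid[:5])                     # residuals u-hat_t
```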

Empirical Example

Suppose that in her attempt to find a way to understand and predict changes in the U.S.A. population, an economist discovers, using regression, an empirical relationship that appears to provide almost a ‘law-like’ fit (see figure 1):

yt = 167.115+ 1.907xt + ût,                                    (1)

where yt denotes the population of the USA (in millions), and xt denotes a secret variable whose identity is not revealed until the end of these 3 posts. Both series refer to annual data for the period 1955-1989.

Figure 1: Fitted Line

A Primary Statistical Question

How good a predictor is xt?

The goodness-of-fit measure of this estimated regression, R2 = .995, indicates an almost perfect fit. Testing the statistical significance of the coefficients shows them to be highly significant: the p-values are zero to the third decimal place, indicating a very strong relationship between the variables. Everything looks hunky dory; what could go wrong?
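To see how such near-perfect numbers can arise even when nothing substantive links the two series, here is a hedged illustration (not the post’s secret data, which stays secret): regressing one trending synthetic series on another, unrelated, trending series.

```python
# Hedged illustration: two series that share only a time trend produce a
# near-perfect "fit" -- high R^2 and tiny p-values -- in a spurious regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.arange(35)                                    # 35 "annual" periods
y = 150 + 2.0 * t + rng.normal(scale=1.0, size=35)   # trending series 1
x = 10 + 1.0 * t + rng.normal(scale=0.5, size=35)    # unrelated trending series 2

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(round(fit.rsquared, 3))   # close to 1, much like the post's 0.995
print(fit.pvalues)              # coefficients look "highly significant"
```

The apparent law-like relationship here is an artifact of the common trend, i.e., of a violated ID assumption, which is exactly the kind of thing M-S testing is meant to catch.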

Is this inference reliable? Not unless the data z0 satisfy the probabilistic assumptions of the LRM, i.e., the errors are NIID with mean 0, variance σ2.

Misspecification (M-S) Tests: Questions of model validation may be  seen as ‘secondary’ questions in relation to primary statistical ones; the latter often concern the sign and magnitude of the coefficients of this linear relationship.

Partitioning the Space of Possible Models: Probabilistic Reduction (PR)

The task in validating a model M0 (LRM) is to test ‘M0 is valid’ against everything else!

In other words, if we let H0 assert that the ‘true’ distribution of the sample Z, f(z), belongs to M0, the alternative H1 would be the entire complement of M0. More formally:

H0: f(z) ∈ M0   vs.   H1: f(z) ∈ [P − M0]

where P denotes the set of all possible statistical models that could have given rise to z0:={(xt, yt), t=1, 2, …, n}, and ∈ denotes “is an element of”.

The traditional analysis of the LRM has already, implicitly, reduced the space of models that could be considered. It reflects just one way of reducing the set of all possible models of which data z0 can be seen to be a realization. This provides the motivation for Spanos’ modeling approach (first in Spanos 1986, 1989, 1995).

Given that each statistical model arises as a parameterization from the joint distribution:

D(Z1,…,Zn; φ) := D((X1, Y1), (X2, Y2), …, (Xn, Yn); φ),

we can consider how one or another set of probabilistic assumptions on the joint distribution gives rise to different models. The assumptions used to reduce P, the set of all possible models,  to a single model, here the LRM, come from a menu of three broad categories.  These three categories  can always be used in statistical modeling:

(D) Distribution, (M) Dependence, (H) Heterogeneity.

For example, the LRM arises when we reduce P by means of the “reduction” assumptions:

(D) Normal (N), (M) Independent (I), (H) Identically Distributed (ID).

Since we are partitioning or reducing P by means of the probabilistic assumptions, it may be called the Probabilistic Partitioning or Probabilistic Reduction (PR) approach.[i]

The same assumptions, traditionally given by means of the error term, are instead specified in terms of the observable random variables (yt, Xt), as assumptions [1]-[5] in Table 1, to render them directly assessable against the data in question.

Table 1 – The Linear Regression Model (LRM)

yt = β0 + β1xt + ut,  t=1,2,…,n,…

[1] Normality: D(yt|Xt=xt; θ) is Normal
[2] Linearity: E(yt|Xt=xt) = β0 + β1xt, linear in xt
[3] Homoskedasticity: Var(yt|Xt=xt) = σ2, free of xt
[4] Independence: {(yt|Xt=xt), t=1,…,n,…} is an independent process
[5] t-invariance: θ := (β0, β1, σ2) is constant over t

There are several advantages to specifying the model assumptions in terms of the observables yt and xt instead of the unobservable error term.

First, hidden or implicit assumptions now become explicit ([5]).

Second, some of the error term assumptions, such as having a zero mean, do not look nearly as innocuous when expressed as an assumption concerning the linearity of the regression function between yt and xt .

Third, the LRM (conditional) assumptions can be assessed indirectly from the data via the (unconditional) reduction assumptions, since:

N entails [1]-[3],             I entails [4],             ID entails [5].
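As a rough illustration of how [1]-[5] might be checked against data, one can probe the residuals of the spurious fit sketched above. These are common off-the-shelf tests used here as stand-ins, not necessarily the tests the PR approach itself would order (linearity [2] could be probed similarly, e.g. with a RESET-type test):

```python
# Rough checks of assumptions [1], [3], [4], [5] on the residuals of the
# spurious fit above ("fit", "X", "y" as in the previous sketch).
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

resid = fit.resid

# [1] Normality: Shapiro-Wilk on the residuals
w, p_norm = stats.shapiro(resid)
print("normality p-value:", p_norm)

# [3] Homoskedasticity: Breusch-Pagan (variance free of x_t)
lm, p_het, fval, p_f = het_breuschpagan(resid, X)
print("homoskedasticity p-value:", p_het)

# [4] Independence: Durbin-Watson (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(resid))

# [5] t-invariance: re-estimate on each half of the sample and compare
half = len(y) // 2
print("first half: ", sm.OLS(y[:half], X[:half]).fit().params)
print("second half:", sm.OLS(y[half:], X[half:]).fit().params)
```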

As a first step, we partition the set of all possible models coarsely in terms of reduction assumptions on D(Z1,…,Zn; φ):

LRM vs. alternatives:

(D) Distribution: Normal vs. non-Normal
(M) Dependence: Independent vs. Dependent
(H) Heterogeneity: Identically Distributed vs. non-ID

Given the practical impossibility of probing for violations in all possible directions, the PR approach consciously adopts an effective probing strategy to home in on the directions in which the primary statistical model might be misspecified. Having taken us back to the joint distribution, why not get ideas by looking at yt and xt themselves, using a variety of graphical techniques? This is what the Probabilistic Reduction (PR) approach prescribes for its diagnostic task, as sketched below…. Stay tuned!
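Here is a minimal sketch of that graphical first look, assuming matplotlib and the synthetic series t, y, x from the sketches above: t-plots of the two series themselves, which make trending (non-ID) behavior visible at a glance.

```python
# t-plots of y_t and x_t (series from the sketches above): the shared trend
# that drives the spurious fit is obvious here, though invisible in R^2.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(t, y)
axes[0].set_ylabel("y_t")
axes[1].plot(t, x)
axes[1].set_ylabel("x_t")
axes[1].set_xlabel("t")
plt.show()
```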

*Rather than list scads of references, I direct the interested reader to those in Spanos.


[i] This is because when the NIID assumptions are imposed on D(Z1,…,Zn; φ), the latter simplifies into a product of conditional distributions D(yt|Xt=xt; θ), i.e., the LRM.

See follow-up parts:

PART 2: http://errorstatistics.com/2012/02/23/misspecification-testing-part-2/

PART 3: http://errorstatistics.com/2012/02/27/misspecification-testing-part-3-m-s-blog/

PART 4: http://errorstatistics.com/2012/02/28/m-s-tests-part-4-the-end-of-the-story-and-some-conclusions/

*We also have a visitor to the seminar from Hawaii, John Byrd, a forensic anthropologist and statistical osteometrician. He’s long been active on the blog. I’ll post something of his later on.


