This follows up on yesterday’s deconstruction:
I’m happy to play devil’s advocate in commenting on Larry’s very interesting and provocative (in a good way) paper on ‘how recent developments in statistical modeling and inference have [a] changed the intended scope of data analysis, and [b] raised new foundational issues that rendered the ‘older’ foundational problems more or less irrelevant’.
The new intended scope, ‘low assumptions, high dimensions’, is delimited by three characteristics:
“1. The number of parameters is larger than the number of data points.
2. Data can be numbers, images, text, video, manifolds, geometric objects, etc.
3. The model is always wrong. We use models, and they lead to useful insights but the parameters in the model are not meaningful.” (p. 1)
In the discussion that follows I focus almost exclusively on the ‘low assumptions’ component of the new paradigm. The discussion by David F. Hendry (2011), “Empirical Economic Model Discovery and Theory Evaluation,” RMM, 2: 115-145, is particularly relevant to some of the issues raised by the ‘high dimensions’ component in a way that complements the discussion that follows.
My immediate reaction to the demarcation based on 1-3 is that the new intended scope, although interesting in itself, excludes the overwhelming majority of scientific fields where restriction 3 seems unduly limiting. In my own field of economics the substantive information comes primarily in the form of substantively specified mechanisms (structural models), accompanied with theory-restricted and substantively meaningful parameters.
In addition, I consider the assertion “the model is always wrong” an unhelpful truism when ‘wrong’ is used in the sense that “the model is not an exact picture of the ‘reality’ it aims to capture”. Worse, if ‘wrong’ refers to ‘the data in question could not have been generated by the assumed model’, then any inference based on such a model will be dubious at best! Continue reading