Wasserman on Spanos and Hennig on “Low Assumptions, High Dimensions” (2011)
(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)
Thanks to Aris and others for comments.
Response to Aris Spanos:
1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes, such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.
2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite-sample, distribution-free methods. As an example, consider the usual finite-sample (order-statistic-based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?
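To make this concrete, here is a minimal sketch of that interval (the function name and the SciPy-based implementation are mine, for illustration). The key fact: for iid draws from any continuous distribution, the number of observations below the median is Binomial(n, 1/2), so exact coverage follows from binomial tail probabilities alone.

```python
import numpy as np
from scipy.stats import binom

def median_ci(x, conf=0.95):
    """Exact, distribution-free confidence interval for the median.

    If X_1, ..., X_n are iid from any continuous distribution, the
    count B of observations below the median m is Binomial(n, 1/2), so
        P(X_(j) <= m < X_(k)) = P(j <= B <= k - 1).
    No asymptotics, no regularity conditions, no approximations.
    """
    x = np.sort(np.asarray(x))
    n = len(x)
    B = binom(n, 0.5)
    # Search from the narrowest symmetric pair of order statistics
    # outward until the exact coverage reaches the requested level.
    for j in range(n // 2, 0, -1):
        k = n - j + 1
        coverage = B.cdf(k - 1) - B.cdf(j - 1)
        if coverage >= conf:
            return x[j - 1], x[k - 1], coverage
    return x[0], x[-1], 1 - 2 * B.pmf(0)  # widest available interval

lo, hi, cov = median_ci(np.random.default_rng(1).exponential(size=50))
print(f"95% CI for the median: ({lo:.2f}, {hi:.2f}), exact coverage {cov:.3f}")
```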
3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?
4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data.
5. Regarding the individual sequences. First, the bound is indeed tight, meaning that it achieves the optimal rate. Second, if you think you have a good predictor based on a model, then just add that model to your set of experts. The method will then do as well as your model if your model is indeed good.
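As an illustration of the individual-sequence setup, here is a minimal sketch of the exponentially weighted average forecaster, the standard method in this literature (the function name and the squared-loss tuning below are my choices, not from the paper):

```python
import numpy as np

def exp_weights_forecaster(expert_preds, outcomes):
    """Exponentially weighted average forecaster for individual sequences.

    expert_preds: (T, K) array, predictions in [0, 1] from K experts.
    outcomes:     (T,) array, an arbitrary sequence in [0, 1] -- no
                  probabilistic model for how it is generated.
    Regret against the best single expert is O(sqrt(T log K)).
    """
    T, K = expert_preds.shape
    eta = np.sqrt(8 * np.log(K) / T)   # standard tuning for [0, 1] losses
    log_w = np.zeros(K)                # log-weights, for numerical stability
    preds = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        preds[t] = w @ expert_preds[t]                 # combined forecast
        loss = (expert_preds[t] - outcomes[t]) ** 2    # per-expert loss
        log_w -= eta * loss      # exponentially downweight poor experts
    return preds
```

If you trust a parametric model, wrap its forecasts as one more column of expert_preds; the regret bound then guarantees you do nearly as well as that model whenever it is good.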
6. Let me clarify what I meant when I said the lasso works. As I said in my paper, with high probability,

R(β̂) ≤ R(β₊) + c √(log p / n),

where β₊ is the best sparse, linear predictor. This is a very precise, albeit limited, sense of “works.” But I did not suggest that the linear model is a good approximation.
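A hypothetical toy example of this point (mine, not from the paper): the guarantee compares the lasso only to the best sparse linear predictor β₊, so the truth can be nonlinear and the statement still holds.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 1000                       # many more features than observations
X = rng.standard_normal((n, p))
# A nonlinear truth: the linear model is wrong, and that is allowed.
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)       # l1 penalty -> sparse coefficients
print("nonzero coefficients:", int((fit.coef_ != 0).sum()))
# The oracle inequality bounds the predictive risk relative to the best
# sparse linear predictor -- it says nothing about recovering the truth.
```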
Response to Christian Hennig:
1. Christian points out that there is more to the notion of an assumption than just what distribution we are assuming. For example, he says:
The mean in fact assumes that “the data are so that the mean doesn’t give a misleading result”, which doesn’t only depend on the underlying truth but also on how the result is used and interpreted.
In other words, assumptions also include: what we will do with the answers, how we will interpret them, etc. I agree with this. Nonetheless, all else being equal, I would argue that methods that make less stringent distributional assumptions are preferable.
2. When I speak of small n and large p, keep in mind that many procedures require a number of observations that is exponential in the dimension. So n = 10,000 and p = 40 is indeed a “small n, large p” problem: 2^40 ≈ 1.1 × 10^12, vastly more than 10,000. To me, n is small unless n > 2^p (or at least n > p^k for some k > 1).
3. I agree that assumptions like sparsity are often assumptions of convenience and may not be realistic.