Wasserman on Spanos and Hennig on “Low Assumptions, High Dimensions” (2011)

(originating U-PHIL: “Deconstructing Larry Wasserman” by Mayo)

________

Thanks to Aris and others for comments.

**Response to Aris Spanos:**

1. You don’t prefer methods based on weak assumptions? Really? I suspect Aris is trying to be provocative. Yes, such inferences can be less precise. Good. Accuracy is an illusion if it comes from assumptions, not from data.

2. I do not think I was promoting inferences based on “asymptotic grounds.” If I did, that was not my intent. I want finite-sample, distribution-free methods. As an example, consider the usual finite-sample (order-statistic-based) confidence interval for the median. No regularity assumptions, no asymptotics, no approximations. What is there to object to?

3. Indeed, I do have to make some assumptions. For simplicity, and because it is often reasonable, I assumed iid in the paper (as I will here). Other than that, where am I making any untestable assumptions in the example of the median?

4. I gave a very terse and incomplete summary of Davies’ work. I urge readers to look at Davies’ papers; my summary does not do the work justice. He certainly did not advocate eyeballing the data.

5. Regarding the individual sequences. First, the bound is indeed tight, meaning that it achieves the optimal rate. Second, if you think you have a good predictor based on a model, then just add that model to your set of experts. The method will then do as well as your model if your model is indeed good.

6. Let me clarify what I meant when I said the lasso works. As I said in my paper, with high probability the predictive risk of the lasso comes close to that of β_{+}, the best sparse linear predictor. This is a very precise, albeit limited, sense of “works.” But I did not suggest that the linear model is a good approximation.
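Wasserman’s precise statement is an oracle inequality; purely as an illustration (a toy solver of my own, not anything from the paper), coordinate descent for the lasso objective looks like this:

```python
import random

def lasso_cd(X, y, lam, n_sweeps=200):
    """Toy coordinate-descent solver for the lasso objective
        (1/(2n)) * sum_i (y_i - x_i . b)^2 + lam * sum_j |b_j|.
    Each coordinate update is an exact minimization via soft-thresholding,
    so the objective decreases monotonically from the start b = 0."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # residual with feature j's current contribution removed
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            ssq = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-threshold: small coefficients are driven exactly to zero
            mag = max(abs(rho) - lam, 0.0)
            b[j] = (mag if rho >= 0 else -mag) / ssq
    return b
```

The soft-thresholding step is what produces sparsity, which is why the fitted predictor is naturally compared against β_{+}, the best *sparse* linear predictor, rather than the best linear predictor overall.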

___________________________________

**Response to Christian Hennig:**

1. Christian points out that there is more to the notion of an assumption than just what distribution we are assuming. For example, he says:

The mean in fact assumes that “the data are so that the mean doesn’t give a misleading result”, which doesn’t only depend on the underlying truth but also on how the result is used and interpreted.

In other words, assumptions also include: what we will do with the answers, how we will interpret them, etc. I agree with this. Nonetheless, all else being equal, I would argue that methods that make less stringent distributional assumptions are preferable.

2. When I speak of small *n* and large *p*, keep in mind that many procedures require exponentially many observations in the number of dimensions. So *n* = 10,000 and *p* = 40 is indeed a “small *n*, large *p*” problem. To me, *n* is small unless *n* > 2^*d* (or at least *n* > *d*^*k* for some *k* > 1), where *d* is the dimension.
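To put rough numbers on this yardstick (an illustration of my own, not from the exchange): *n* is small unless *n* > 2^*d*, or at least *n* > *d*^*k* for some *k* > 1.

```python
# With d = 40 dimensions, the n > 2^d requirement is astronomical,
# and even the polynomial bars d^k grow quickly.
n, d = 10_000, 40
exponential_bar = 2 ** d          # 1099511627776, about 1.1e12
print(n > exponential_bar)        # False: n = 10,000 is nowhere close
print(n > d ** 2)                 # True:  clears the d^k bar for k = 2
print(n > d ** 3)                 # False: already fails for k = 3
```

So a sample of 10,000 points in 40 dimensions only clears the mildest polynomial version of the requirement, which is why Wasserman still counts it as “small *n*, large *p*.”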

3. I agree that assumptions like sparsity are often assumptions of convenience and may not be realistic.

Larry, thanks for being a good sport!

1. Let me begin by agreeing wholeheartedly with your claim that “accuracy is an illusion if it comes from assumptions, not from data”. However, the accuracy I argue for does come from data when the assumptions have been validated vis-à-vis the data. Hence, my emphasis on model validation: make assumptions but test them thoroughly before drawing inferences!

2. It’s clear that our disagreements stem primarily from working in very different fields. When I hear that you are doing finite-sample inference for the median invoking “No regularity assumptions, no asymptotics, no approximations”, my answer is that “everything you infer depends crucially on the most restrictive set of probabilistic assumptions I can think of, IID!” These assumptions are never valid for economic data! What is more, when I make statistical inferences, the choice of the median, the mean, or some other parameter(s) stems from the substantive questions of interest; the choice is not based on convenience, but is goal-directed.

Larry: You grant Hennig’s point that “assumptions also include: what we will do with the answers, how we will interpret them, etc… Nonetheless, all else being equal” the fewer assumptions the better. This could boil down to a tautologous claim: that if there are two or more ways to find something out, and one is less open to threats of error, then it is preferable. That is different from a way to proceed. I don’t suppose you start out viewing the landscape of methods where “all things are equal” except that one is less open to threats of error. Perhaps your point is that you don’t see any other way to tackle these questions but using crude nonparametrics.

Larry: Thanks for the reply. In my comments I may have sounded more critical of your paper than I in fact am, but I didn’t want to waste space writing out in detail what I liked (I had a word-count limit).

Of course I can’t object to saying that “weak assumptions are better than strong ones regarding things we can’t observe”; I have worked on robustness and have no general objection to nonparametrics. Still, it is not straightforward to say what the precise assumptions of any given method are, and one shouldn’t just get that information from nice mathematical theorems, because a) assuming that there is an underlying true model, a method can still be good in a situation where it is not proved optimal (or not even *proved* good yet), b) a method that is proved best in a worst case can still be bad, c) there are a number of implicit assumptions that we make, most of which probably depend on the aim of the analysis, that have little to do with an “underlying true distribution”, and that may be critical for so-called low-assumption methods.

Regarding your point 2: I know that this is what you meant. I just find it helpful to point out that if we observe more variables and keep *n* constant, we are not suddenly in a “worse” situation information- or assumption-wise (because in the low-dimensional setting we assumed that what we didn’t see played no role).