Larry Wasserman (“Normal Deviate”) has announced he will stop blogging (for now at least). That means we’re losing one of the wisest blog-voices on issues relevant to statistical foundations (among many other areas in statistics). Whether this lures him back or reaffirms his decision to stay away, I thought I’d reblog my (2012) “deconstruction” of him (in relation to a paper linked below). **[i]**

*Deconstructing Larry Wasserman* **[i] by D. Mayo**

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from *Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right*), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011), his contribution to *Rationality, Markets and Morals* (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken:

‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): **“The problem with al Qaeda is that they’re trying to kill us!”** (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… . We have to be vigilant. And we have to be more than vigilant. We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii] (December 2016 update: This just shows how things get topsy-turvy every 5-8 years. Now we have extremes on both sides.)

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction. (I will invite your “U-Phils” at the end[a].) I will allude to passages from my contribution to RMM (2011) (in red).

**A. What Is the Foundational Issue?**

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens.

Let us examine the urgency of reconciling the need to make methods assumption-free and that of making them work in complex high dimensions. The problem of assumptions of course arises when they are made about unknowns that can introduce threats of error and/or misuse of methods.

Wasserman: These days, statisticians often deal with complex, high dimensional datasets. Researchers in statistics and machine learning have responded by creating many new methods … . However, many of these new methods depend on strong assumptions. The challenge of bringing low assumption inference to high dimensional settings requires new ways to think about the foundations of statistics. (p. 201)

It is not clear if Wasserman thinks these new methods run into trouble as a result of unwarranted assumptions. This is a substantive issue about Wasserman’s applications that foundational discussions are unlikely to answer. Still, he sees the issue as one of foundations, so I shall take him at his word.

The last decade or more has also given rise to many new problem areas that call for novel methods (e.g., machine learning). Do they call for new foundations? Or, can existing foundations be relevant here too? (See Larry Wasserman’s contribution.) A lack of clarity on the foundations of existing methods tends to leave these new domains in foundational limbo. (Mayo 2011, 92)

I may seem to be at odds with Wasserman’s call to move on past frequentist-Bayesian debates:

Debates over the philosophical foundations of statistics have a long and fascinating history; the decline of a lively exchange between philosophers of science and statisticians is relatively recent. Is there something special about 2011 (and beyond) that calls for renewed engagement in these fields? I say yes. (Mayo, p. 80)

Perhaps this may be Wasserman’s meaning: new types of problems and methods call for a more pragmatic perspective on learning from data. One cannot begin at the point at which different interpretations of probability (Bayesian or frequentist) enter; so frequentist-Bayesian debates are not as central to current practice.

I would never claim there is any obstacle to practice in not having a clear statistical philosophy. But that is different from maintaining that practice calls for recognition of underlying foundational issues while denying that Bayesian-frequentist issues are especially important to them. The fact is, key underlying issues come to the surface and are illuminated within frequentist-Bayesian contrasts, as are issues surrounding objective/subjective, deduction/induction, and truth/idealizations, deliberately discussed on this blog. It may be insisted we are beyond them, but they invariably lurk in the background; they are the elephants in the room.

We deliberately used ‘statistical science’ in our forum title because it may be understood broadly to include the full gamut of statistical methods, from experimental design, generation, analysis, and modeling of data to using statistical inference to answer scientific questions. (Even more broadly, we might include a variety of formal but nonprobabilistic methods in computer science and engineering, as well as machine learning.) (Mayo, p. 85)

**B. Models Are Always Wrong**

Wasserman: One then looks for adequate models rather than true models… . [A] distribution P is an adequate approximation for x1,…, xn, if typical data sets of size n, generated under P ‘look like’ x1,…, xn. (p. 203)

The recognition that “the model is always wrong” (in the sense of being an idealization) was clear to the founders of “classical” statistics* (see relevant remarks from Cox, Fisher, and Neyman elsewhere on this blog). Although this recognition discredits the idea that inference is all about assigning degrees of belief or confirmation to hypotheses and models, it supports the use of probability in standard error statistics—or so I argue. One can learn true things from idealized models.

Wasserman: A more extreme example of using weak assumptions is to abandon probability completely… . Why are scholars in foundations ignoring this? (pp. 203-4)

By and large, the claim that regarding data as literally “generated from a distribution is usually a fiction” (p. 203) is also not news to error statisticians; in a sense, observations are always deterministic. Viewing the sample as if it were generated probabilistically may simply be a way to cope with incomplete information and the incorrect inferences that can result. Probability is introduced as attached to *methods* (which, in this example, would be for a type of prediction or classification tool).

The machine learners say that there is little need to understand what actually produced the numbers. Fine, then methods are apt that enable an increasingly successful error-rate reduction. Under error statistics’ big umbrella, machine learning appears to fall under the subset of the philosophy of “inductive behavior,” the goals of which involve controlling/improving performance and setting bounds for error rates, and trading off precision and accuracy where appropriate to the particular case. This is in contrast to the subset that is the main focus of my work: that which uses error rates to assess and control how severely claims have passed tests. The latter are contexts of scientific inference. In the prediction-classification example, however, the error-rate guarantees are just the ticket. (I would not rule out inferences about the case at hand.) Yet in the domains of both inductive behavior and scientific inference, the error statistician regards models as approximations and idealizations, or, as Neyman saw them, “picturesque” ways of talking about actual experiments.

Wasserman has proved many intriguing results about the problems of and prospects for low-assumption methods. Whether methods that invoke assumptions could do better, perhaps alongside these (checking or making allowances later), is not something on which I can speculate. As complex as the classification prediction problems are, they enjoy an outcome that’s normally absent: we get to find out if we’ve been successful. Background knowledge enters in qualitative ways, not obviously as prior probability distributions in parameters.

**C. Is It Bayesian?**

Wasserman: In principle, low assumption Bayesian inference is possible. We simply put a prior π on the set of all distributions P. The rest follows from Bayes theorem. But this is clearly unsatisfactory. The resulting priors have no guarantees, except the solipsistic guarantee that the answer is consistent with the assumed prior. (p. 206) [iii]

One big reason some may turn aside from frequentist-Bayesian contrasts is that today even most Bayesians grant the importance of good performance characteristics (though their meaning may differ distinctly). The traditional idea that statistical learning is well captured by Bayes’ theorem is rarely upheld (we have seen exceptions, most recently Lindley, also Kadane) [iv].

Today’s debates clearly differ from the Bayesian-frequentist debates of old. In fact, some of those same discussants of statistical philosophy, who only a decade ago were arguing for the ‘irreconcilability’ of frequentist p-values and (Bayesian) measures of evidence, are now calling for ways to ‘unify’ or ‘reconcile’ frequentist and Bayesian accounts… . (Mayo, p. 82)

In some cases the nonsubjective posteriors may have good error-statistical properties of the proper frequentist sort, at least in the asymptotic long run. But then another concern arises: If the default Bayesian has merely given us technical tricks to achieve frequentist goals, as some suspect, then why consider them Bayesian (Cox 2006)? Wasserman (2008, 464) puts it bluntly: If the Bayes’ estimator has good frequency-error probabilities, then we might as well use the frequentist method. If it has bad frequency behavior then we shouldn’t use it. (The situation is even more problematic for those of us who insist on a relevant severity warrant.) (Mayo, p. 90)

Wasserman: [In other cases] the answers are usually checked against held out data. This is quite sensible but then this is Bayesian in form not substance. (p. 206)

In this context, insofar as I understand it, the goal is to be able to assess how well the rule can predict “test sets” and indicate an estimate of prediction error. The substance is of an error-statistical kind: through various strategies (e.g., cross-validation) we may learn approximately how well a predictive model will perform in cases other than those already used to fit the model. It connects with a general set of strategies for preventing too-easy fits and avoiding (pejorative) double-counting, “overfitting,” and nongeneralizable results.
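To make the held-out idea concrete, here is a minimal sketch in Python. It is not taken from Wasserman’s paper: the 1-nearest-neighbour prediction rule, the simulated data, and every function name below are assumptions chosen purely for illustration. The rule fits the data it was built on perfectly (a “too-easy fit”), while k-fold cross-validation gives a rough estimate of how it would fare on cases not used to fit it.

```python
import random

def one_nn_predict(train, x):
    """Predict y at x by copying the y of the nearest training point (1-NN)."""
    nearest = min(train, key=lambda pt: abs(pt[0] - x))
    return nearest[1]

def mse(train, test):
    """Mean squared prediction error on `test` for the rule built from `train`."""
    return sum((one_nn_predict(train, x) - y) ** 2 for x, y in test) / len(test)

def k_fold_cv_error(data, k=5):
    """Average held-out error: each fold is predicted from the other k-1 folds."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
        errors.append(mse(train, held_out))
    return sum(errors) / k

# Simulated data (an assumption for illustration): y = 2x + noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 1)
    data.append((x, 2 * x + rng.gauss(0, 0.5)))

in_sample_error = mse(data, data)      # each point is its own nearest neighbour
cv_error = k_fold_cv_error(data, k=5)  # error on points the rule has not seen
print("in-sample error:", in_sample_error)
print("5-fold CV error:", cv_error)
```

The in-sample error here is zero by construction, which says nothing about generalization; it is the cross-validated figure that speaks to performance on new cases, and it does so without any prior being assigned.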

**Deconstructing Wasserman**

So where does this leave us in deconstructing Wasserman’s call for new-fangled foundations?

Franken deconstructed: Let us imagine Franken as representing a frequentist error statistician[v]. He begins by noting that while Bayesians may detect a frequentist bias (in certain circles), he detects no such thing. Besides, such a quibble would be akin to worrying about Al-Qaeda using too much oil in their hummus!

Frequentists, he says, are at least trying to meet a fundamental scientific requirement for controlling error, and are open to any number of ways of accomplishing this. But Bayesians—at least dyed-in-the-wool (or staunch subjective or “philosophical”) Bayesians—have an agenda, Franken is saying, by analogy. They charge frequentists with legitimating a hodgepodge of “incoherent” and “inadmissible” methods; they say that frequentists care only for low error rates in the long run, have no way of incorporating background information, invariably misinterpret their own methods, and top it all off with a litany of howlers (that the Bayesian easily avoids). If the discourse on frequentist foundations seems biased, our frequentist Franken continues, it is only to correct the many blatant misinterpretations of its methods.

**********************************

Now Wasserman comes in and utters the scientific equivalent of “Let’s move on” (as with the Clinton scandal, which gave rise to MoveOn.org, i.e., “Get over it”). The Bayesian requirements and philosophy do not underwrite the substance of the most promising new complex methods. So if our focus is to justify, interpret, and extend these new contexts, we are allowed to leave the old (frequentist-Bayesian) scandals behind. But, as Wasserman seems further to imply, finding oneself in an essentially frequentist, error-statistical world is not enough either, especially when it comes to the kinds of complex classification and prediction problems of machine learning, data mining, and the like. At any rate, new foundational concerns must loom large….

**********************************

So let me inject myself into the interpretive mix I’ve created.

I concur with the deconstructed Franken and Wasserman. Taking seriously Wasserman’s intimation that there is not only a technical-statistical problem here (which only statisticians can solve), but also a foundational problem, he seeks a ground for applications where probabilistic bounds, however crude, do not directly describe a data-generating mechanism, but assess/reduce/balance procedural error rates.

The “long-run” relative frequencies have probabilistic implications for bounding the next test set. The old accusation that good error-statistical properties are irrelevant to the case at hand goes by the wayside. Anyone who takes a broad view of error-statistical methods would have no problem finding a home for the variety of methods of creative control and assessment of approximate sampling distributions and error rates. This falls more clearly under what may be called “a behavioristic” context than one of scientific inference (though the latter is not precluded). It would require breaking out of traditional notions of frequentist statistics and in so doing simultaneously scotch the oft-repeated howlers.[vi]

Ironically many seem prepared to allow that Bayesianism still gets it right for epistemology, even as statistical practice calls for methods more closely aligned with frequentist principles. What I would like the reader to consider is that what is right for epistemology is also what is right for statistical learning in practice. That is, statistical inference in practice deserves its own epistemology. (Mayo, p. 100)

Constructing such a framework would be one payoff of genuinely transcending the frequentist-Bayesian debates, rather than rendering them taboo, or closed.

Cox, D. R. (2006), *Principles of Statistical Inference*, Cambridge: Cambridge University Press.

Gelman, A. and C. Shalizi (2012), “Philosophy and the Practice of Bayesian Statistics (with discussion),” *British Journal of Mathematical and Statistical Psychology* (article first published online: 24 Feb 2012).

Mayo, D. (2011), “Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?” *RMM* Vol. 2, 79–102.

Wasserman, L. (2008), “Comment on article by Gelman,” *Bayesian Analysis* **3**(3): 463-465.

*7/29 I modified this assertion, and will explicate the different senses in which Neyman and Pearson viewed the relationship between approximate models and correct/incorrect claims about the world later on.

[a] This *had* been an open “U-Phil”. If you send me a new analysis, I’m willing to post it.

[i] See an earlier post for the way we are using “deconstructing” here.

[ii] Says Franken: “And what shocked me most…was the silence from those conservatives who complain about the ugliness of political discourse in this country.” (19) Oh *pleeeze* (to use Franken’s expression).

[iii] For some examples of methods applicable to large numbers of variables in econometrics under the error statistical umbrella, see the two contributions to the special topic by Aris Spanos and David Hendry. It would be interesting to hear of relationships.

[iv] Even where Bayesian methods are usefully applied, some say “most of the standard philosophy of Bayes is wrong” (Gelman and Shalizi 2012, 2 n2). See https://errorstatistics.com/2012/06/19/the-error-statistical-philosophy-and-the-practice-of-bayesian-statistics-comments-on-gelman-and-shalizi/

[v] Never mind that, intuitively, I think, it would fit more closely to see him wearing a Bayesian hat. Please weigh in on this.

[vi] For a light treatment of the latter, see this blog’s “comedy hours”: e.g., (0), (1) & (2).

*I think Hillary C. was right about the right-wing conspiracy at the time; hence my 2008 endorsement of the PUMAs (standing for “Political Unity My Ass”).

Some related reactions and responses to Wasserman:

*Spanos on Wasserman*

https://errorstatistics.com/2012/08/08/u-phil-aris-spanos-on-larry-wasserman/

*Hennig and Gelman on Wasserman*

https://errorstatistics.com/2012/08/10/u-phil-hennig-and-gelman-on-wasserman-2011/

*Wasserman on Spanos*

https://errorstatistics.com/2012/08/11/u-phil-wasserman-replies-to-spanos-and-hennig/

Wasserman response to Mayo’s deconstruction:

https://errorstatistics.com/2012/08/13/u-phil-concluding-the-deconstruction-wasserman-mayo/

*****************************************************************************

It’s interesting to draw out the metaphor between Bayes-Frequentist and left-wing/right-wing bias, taking seriously Wasserman’s remark (despite its being intended as just a funny quip): “a similar comment could be applied to the usual debates in the foundations of statistical inference.” Fleshing out the analogy with Franken’s remarks would see left-wing as going with the frequentist, whereas a more apt analogy might be Bayesians ~ left-wing (recalled from discussion with Wasserman). Doubtless it’s unhelpful to make such political analogies, but since this is a blog, it’s fair game, and I’d be curious as to readers’ thoughts.

My opinions have shifted a bit.

My reference to Franken’s joke suggested that the usual philosophical debates about the foundations of statistics were unimportant, much like the debate about media bias. I was wrong on both counts.

First, I now think Franken was wrong. CNN and network news have a strong liberal bias, especially on economic issues. FOX has an obvious right-wing and anti-atheist bias. (At least FOX has some libertarians on the payroll.) And this does matter, because people believe what they see on TV and what they read in the NY Times. Paul Krugman’s socialist bullshit parading as economics has brainwashed millions of Americans. So media bias is much more than who makes better hummus.

Similarly, the Bayes-Frequentist debate still matters. And people, including many statisticians, are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter.

My emphasis on high-dimensional models is germane, however. In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way.

I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs. In the high dimensional world, you have to choose: objective frequency guarantees or subjective beliefs. Choose whichever you prefer, but you can’t have both. I don’t care which one people pick; I just wish they would be clear about what they are giving up when they make their choice.

In your blog, Deborah, you mentioned these papers

http://arxiv.org/abs/1308.6306

http://arxiv.org/abs/1304.6772

by Houman Owhadi, Clint Scovel and Tim Sullivan.

And then there is this paper

http://arxiv.org/abs/1306.4943

by Gordon Belot.

These challenges to Bayesian inference remain unanswered in my opinion. In fact, I think Freedman’s Theorem (1965, Annals, p 454) still remains without an adequate answer.

Of course, one can embrace objective Bayesian inference. If this means “Bayesian procedures with good frequentist properties” then I am all for it. But this is just frequentist inference in Bayesian clothing. If it means something else, I’d like to know what.

Normal Deviate: I’m so glad to have your update (I’ve placed it as a blogpost: https://errorstatistics.com/2013/12/28/wasserman-on-wasserman-update-december-28-2013/). I agree with everything you say, although I’d extend the point to any dimensional models. I’m commenting more on the current post, and hope others will bring forward their comments–be they on politics or statistics.

Say it ain’t so, Joe (uh, Larry)! I, for one, will definitely miss Wasserman’s blog, as I tend to align fairly closely with his views.

I don’t know about the whole left/right analogy as my own views in this regard have shifted lately (regrettably, I think that Bayesians may in fact be closer with the current “left”, but I wouldn’t align frequentists with the current “right”… Maybe more with true conservatives of old). Anyway, I have always thought that Bayesians would have to be more likely to believe in a god, or fate, or whatever.

Mark: Note, I’ve placed Wasserman’s update as a new blogpost.

Yes, the political left-right doesn’t really match the perspectives here, but I get your point about “true conservatives” (though I surprise myself). It’s an interesting thought that one might align Bayesians with “believers”, as this is often coupled with (philosophical) skepticism, positivism, operationism, constructive empiricism, and such. Maybe the cultural anthropologists have categories that fit (as they usually claim to).

“… they have an agenda…We have to be vigilant. And we have to be more than vigilant. We have to fight back… .”(Franken)

To pursue the analogy of Franken as frequentist, are you saying frequentists have to be vigilant…have to fight back?

The machine learning people claim that they don’t believe in the data being generated by sampling from a distribution, i.e. a “population.” Yet, as you point out, they ask whether the model they fit does well on “new cases.” If their fitted model has any hope of doing well on the new cases, both the original data and the new cases have to come from the same population! At least in the sense of conditional distributions.

I do agree that the population is typically only conceptual. We can think of a medical study, for instance, as continuing into the infinite future, accumulating data on more and more patients as they enter the study. So I have no problem with the d and the second i in “iid.” The first i is rather tricky, though; it is unclear what it really means in practice.

And of COURSE the mainstream media has a liberal bias. I say that as a liberal myself. These are people who only talk to each other, and are doing well (or think they are) under the present government policies and thus think everyone else is too. They easily accept false platitudes that even a bright 10-year-old could see through. CNN is just as biased as Fox, and the New York Times has become a shadow of its former self in terms of fair, probing journalism.

Norm: so are you agreeing or disagreeing with the machine learners who deny data generating mechanisms? The “mechanism” or just “source” of data needn’t be “out there”; it could be me or we who generate, and it also could be merely an instrument for prediction, as instrumentalists would say.

No, I am disagreeing with the machine learning people. If they want to think in terms of “mechanisms,” fine, but my point is that they then completely contradict themselves by claiming that their results extrapolate to new data. Ditto for their comments about overfitting etc.

Norm: I take it they want to say they can predict any number of things, e.g., whether you’ll buy book X, like a person on a dating site, based on lots of correlations without a clue, or with limited clues, as to causes, and without understanding why. Of course, without even minimal understanding, things can easily go wrong. But I’m interested to hear you’re not a fan of the machine learning-black box way of thinking. You don’t think it “works” for various cases? I distinguish it from scientific understanding, but I don’t rule out a mixture.