Elliott Sober has been writing on simplicity for a long time, so it was good to hear his latest thinking. If I understood him, he continues to endorse a comparative likelihoodist account, but he allows that, in model selection, “parsimony fights likelihood,” while, in adequate evolutionary theory, the two are thought to go hand in hand. Where it seems needed, therefore, he accepts a kind of “pluralism”. His discussion of the rival models in evolutionary theory and how they may give rise to competing likelihoods (for “tree taxonomies”) bears examination in its own right, but being in no position to accomplish this, I shall limit my remarks to the applicability of Sober’s insights (as my notes reflect them) to the philosophy of statistics and statistical evidence.
1. Comparativism: We can agree that a hypothesis is not appraised in isolation, but to say that appraisal is “contrastive” or “comparativist” is ambiguous. Error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model), but deny that the most that can be said is that one hypothesis or model is comparatively better than another, among a group of hypotheses that is to be delineated at the outset. There’s an important difference here. The best-tested of the lot need not be well-tested!
2. Falsification: Sober made a point of saying that his account does not falsify models or hypotheses. We are to start out with all the possible models to be considered (hopefully including one that is true or approximately true), akin to the “closed universe” of standard Bayesian accounts[i], but are none of them ever discarded as falsified, given the data? It seems not.
For example, AIC rank orders models, but I take it that even the lowest-ranked isn’t falsified. This seems problematic: in science, we want to find claims to be false and learn from anomalies that will not go away in order to discover brand new phenomena. Granted, deductive falsification is rarely (if ever) possible in empirical science (an exception might be “All swans are white in a finite population”), but, as Popperians have realized, insisting on it would leave out the most interesting scientific hypotheses and theories, which always have ceteris paribus clauses. Popperian “methodological falsificationism” attempts to rectify this; error statisticians succeed. A rejection of H (which is distinct from poor evidence for H) results from passing not-H with severity. I found it both interesting and surprising that Sober was prepared to reject falsification in science. (Perhaps he would add a falsification rule to the comparativism?)
3. Law of Likelihood: Restricting the analysis to a comparison between (non-exhaustive) hypotheses H’ and H” (e.g., H’ is better supported than H”), while unsatisfactory to an error statistician, enables Sober to mitigate some of the problems of likelihoodist accounts. For example, with simple versus simple statistical hypotheses, there is an upper bound to error rates that in general is absent. Still, the inability to adequately control error probabilities remains: Hypothesis H” is better supported than H’ if P(x;H”) > P(x;H’) (according to the “law of likelihood”), but it is much too easy to find a maximally likely H”, which would then be preferred to H’, even if H’ is true[ii]. So one cannot avoid inferring a hypothesis H” (as comparatively more likely) even though H” has passed a test that lacks severity. But perhaps Sober’s pluralism allows him to invoke some means to avoid this; or he’s prepared to live with it.
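The contrast between a rival named in advance and a maximally likely rival can be sketched with a small simulation. This is only an illustration under assumptions of my own, not anything from Sober's talk: the data are i.i.d. Normal with known unit variance, H’ is "mu = 0" and is true, the fixed rival is "mu = 0.5", and the data-dependent rival sets mu equal to the sample mean. The sample size, the threshold of 8, and all function names are hypothetical choices.

```python
import math
import random


def log_lik(data, mu):
    """Log-likelihood of i.i.d. Normal(mu, 1) data, up to an additive constant."""
    return -0.5 * sum((x - mu) ** 2 for x in data)


def favoring_rate(pick_rival, k=1.0, trials=2000, n=20, seed=0):
    """Fraction of samples generated under H' (mu = 0) in which the rival's
    likelihood is at least k times the likelihood of H'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H' is true here
        rival_mu = pick_rival(data)
        log_ratio = log_lik(data, rival_mu) - log_lik(data, 0.0)
        if log_ratio >= math.log(k):
            hits += 1
    return hits / trials


def fixed_rival(data):
    """Simple rival H'' named before seeing the data (assumed value 0.5)."""
    return 0.5


def best_fit_rival(data):
    """Maximally likely rival H'': mu set to the sample mean after the fact."""
    return sum(data) / len(data)


if __name__ == "__main__":
    print("fixed rival preferred:      ", favoring_rate(fixed_rival))
    print("fixed rival, ratio >= 8:    ", favoring_rate(fixed_rival, k=8.0))
    print("best-fit rival preferred:   ", favoring_rate(best_fit_rival))
```

For the fixed rival, the chance that the likelihood ratio reaches k when H’ is true cannot exceed 1/k, so misleading "strong" comparisons are bounded. The best-fit rival, by construction, is never less likely than H’ on the observed data, so the comparison favors it (or ties) in essentially every sample, although H’ generated the data.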
Incidentally, these were the grounds that led Ian Hacking to reject the law of likelihood (and the theory of support) that he espoused and popularized in Hacking 1965[iii]. (See, for example, Hacking 1972: “Likelihood,” British Journal for the Philosophy of Science 23:132-137.) Admittedly, Hacking’s change of heart is much less well known than his earlier work in favor of the likelihood theory of support. He once told me that he wished more philosophers were aware of this shift.
Malcolm Forster: He noted that there are several mutually compatible accounts of simplicity (and maintained that most are right, except for the one that claims that simplicity is a sign of truth because the world is simple). But surely the familiar idea that the simplest hypothesis is the most plausible or probable (in a Bayesian sense) is at odds with the Popperian idea that the simplest is the most falsifiable. Forster explained that here he was focusing deliberately on a group of compatible accounts, but it is not clear whether this group is along Popperian, likelihoodist, Bayesian, or other lines. On the examples of Ptolemy, Copernicus, and Kepler, I refer the reader to Aris Spanos:
The fittest curve (statistically adequate) is not determined by the smallness of its residuals, tempered by simplicity or other pragmatic criteria, but by the nonsystematic (e.g. white noise) nature of its residuals. The advocated error-statistical arguments are illustrated by comparing the Kepler and Ptolemaic models on empirical grounds, showing that the former is statistically adequate but the latter is not. Indeed, the Ptolemaic model constitutes the quintessential example of “best” in a mathematical approximation sense, that gives rise to systematic (nonwhite noise) residuals, and thus, it does not “save the phenomena.” (Spanos 2007, 1047-8)
The need to distinguish adequate “fit” from a mathematical approximation perspective, as opposed to an error statistical perspective, I suspect, is also key to unraveling a related puzzle that arose at the conference: how to identify a philosophy for machine learning, if there is such a thing. I will reflect more on this in considering Wasserman’s contribution to our RMM volume.
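To make the distinction between small residuals and nonsystematic residuals concrete, here is a minimal sketch. It is not Spanos's analysis of the planetary data: the simulated curves, the noise level, the straight-line fit, and the use of a Wald-Wolfowitz runs statistic on residual signs are all assumptions of mine, chosen as one simple way to probe for systematic structure. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def runs_z(residuals):
    """Wald-Wolfowitz runs statistic on residual signs (normal approximation).
    Strongly negative values mean too few sign changes, i.e. systematic residuals."""
    signs = [r > 0 for r in residuals]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    total = n_pos + n_neg
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n_pos * n_neg / total + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - total)
                / (total ** 2 * (total - 1)))
    return (runs - expected) / math.sqrt(variance)


def residual_check(curve, n=60, noise_sd=0.3, seed=1):
    """Simulate y = curve(x) + noise, fit a straight line, and return the
    runs statistic of the fitted line's residuals."""
    rng = random.Random(seed)
    xs = list(range(n))
    ys = [curve(x) + rng.gauss(0.0, noise_sd) for x in xs]
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return runs_z(residuals)


def straight(x):
    """Data-generating curve for which the fitted line is adequate."""
    return 2.0 + 0.3 * x


def curved(x):
    """Data-generating curve that a straight line can only approximate."""
    return 0.01 * (x - 30) ** 2


if __name__ == "__main__":
    print("runs z, line fitted to straight data:", residual_check(straight))
    print("runs z, line fitted to curved data:  ", residual_check(curved))
```

When the line is fitted to curved data, the residuals stay on one side of zero for long stretches, so the runs statistic is strongly negative. That systematic pattern flags inadequacy regardless of how small the residuals happen to be.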
[i] A minority group of Bayesians are prepared to falsify underlying statistical models, including priors (e.g., Gelman, see blog post discussions).
[ii] Of course they may be tied.
[iii] In addition to the lack of error control, he denies that likelihood ratios have the same evidential meanings in different contexts.