Posts Tagged With: Occam’s Razor

Vladimir Cherkassky Responds on Foundations of Simplicity

I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], beyond two corrections at the end of this post. I invite readers’ comments.

Vladimir Cherkassky

As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large takes little interest in philosophical and methodological issues. This breeds considerable fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc., all of which are mainly concerned with the same problem of estimating good predictive models from data.

Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.

Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data was formally investigated in VC-theory roughly 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without assuming any parametric form for the underlying distributions. This formalization is very general and relevant to many applications in machine learning, statistics, the life sciences, etc. Further, the theory provides necessary and sufficient conditions for generalization: the set of admissible models (hypotheses about the data) must be constrained, i.e., must have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions.
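To give readers the flavor of those conditions, here is one standard generalization bound from VC theory, in a common textbook form (my addition for this post; it is not taken from Cherkassky’s remarks). For binary classification with 0-1 loss, a model f selected from a class of finite VC-dimension h on the basis of n samples satisfies, with probability at least 1 − η,

R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln(2n/h) + 1\right) + \ln(4/\eta)}{n}}

where R(f) is the true (expected) risk and R_emp(f) is the empirical risk on the sample. If h is infinite, the bound is vacuous; that is the precise sense in which the set of admissible models must be constrained.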

More from the Foundations of Simplicity Workshop*

*See also earlier posts from the CMU workshop here and here.

Elliott Sober has been writing on simplicity for a long time, so it was good to hear his latest thinking. If I understood him, he continues to endorse a comparative likelihoodist account, but he allows that, in model selection, “parsimony fights likelihood,” while, in adequate evolutionary theory, the two are thought to go hand in hand. Where it seems needed, therefore, he accepts a kind of “pluralism”. His discussion of the rival models in evolutionary theory and how they may give rise to competing likelihoods (for “tree taxonomies”) bears examination in its own right; being in no position to undertake that here, I shall limit my remarks to the applicability of Sober’s insights (as my notes reflect them) to the philosophy of statistics and statistical evidence.
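A formal setting in which that fight is explicit is Akaike’s information criterion, which Sober (with Malcolm Forster) has long discussed in the model-selection literature; the gloss here is mine. AIC scores a model as

\mathrm{AIC} = 2k - 2\ln\hat{L}

where k is the number of adjustable parameters and \hat{L} is the model’s maximized likelihood, lower scores being preferred. Since adding parameters can only increase \ln\hat{L}, the 2k penalty is exactly the point at which parsimony pushes back against likelihood.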

1. Comparativism: We can agree that a hypothesis is not appraised in isolation, but to say that appraisal is “contrastive” or “comparativist” is ambiguous. Error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model), but deny that the most that can be said is that one hypothesis or model is comparatively better than another, among a group of hypotheses delineated at the outset. There’s an important difference here: the best-tested of the lot need not be well-tested! (A toy numerical sketch, after these numbered remarks, illustrates the point.)

2. Falsification: Sober made a point of saying that his account does not falsify models or hypotheses. We are to start out with all the possible models to be considered (hopefully including one that is true or approximately true), akin to the “closed universe” of standard Bayesian accounts[i], but do we not get rid of any as falsified, given data? It seems not.
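Here is that toy sketch: a numerical illustration of my own (not Sober’s, nor anything presented at the workshop), in which the winner of a likelihood comparison between two point hypotheses about a Gaussian mean is still a terrible fit to the data.

# A toy illustration (mine, not from the post): a comparative likelihood
# verdict can crown a "winner" that the data nonetheless fit terribly.
from math import exp, pi, sqrt

def normal_density(x, mu, sigma=1.0):
    # Likelihood of a single observation x under N(mu, sigma^2)
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

x_obs = 5.5                             # one observed value
L_H1 = normal_density(x_obs, mu=0.0)    # hypothesis H1: mu = 0
L_H2 = normal_density(x_obs, mu=10.0)   # hypothesis H2: mu = 10

print(L_H1)         # ~1.1e-07
print(L_H2)         # ~1.6e-05
print(L_H2 / L_H1)  # ~148: H2 easily "wins" the comparison,
# yet x_obs lies 4.5 standard deviations from H2's mean: the
# comparatively best hypothesis is itself poorly tested by the data.

In error-statistical terms, H2 is the best-tested of the pair, but nothing in the comparison certifies it as well-tested.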

Deviates, Sloths, and Exiles: Philosophical Remarks on the Ockham’s Razor Workshop*

Picking up the pieces…

My flight out of Pittsburgh has been cancelled, and as I may be stuck in the airport for some time, I will try to make a virtue of it by jotting down some of my promised reflections on the “simplicity and truth” conference at Carnegie Mellon (organized by Kevin Kelly). My remarks concern only the explicit philosophical connections drawn by (4 of) the seven non-philosophers who spoke. For more general remarks, see the blogs of Larry Wasserman (Normal Deviate) and Cosma Shalizi (Three-Toed Sloth). (The following, based on my notes and memory, may include errors/gaps, but I trust that my fellow bloggers and sloggers will correct me.)

First to speak were Vladimir Vapnik and Vladimir Cherkassky, from the field of machine learning, a discipline I know only formally. Vapnik, of Vapnik-Chervonenkis (VC) theory, is known for his seminal work in the field. Their papers, both of which directly addressed the philosophical implications of their work, share enough themes to merit being taken up together.

Vapnik and Cherkassky find a number of striking dichotomies in the standard practice of both philosophy and statistics. They contrast the “classical” conception of scientific knowledge as essentially rational with the more modern, “data-driven” empirical view:

The former depicts knowledge as objective, deterministic, rational. On this view, Ockham’s razor is a kind of synthetic a priori statement that warrants our rational intuitions as the foundation of truth with a capital T, as well as a naïve realism (we may rely on Cartesian “clear and distinct” ideas; God does not deceive; and so on). The latter, empirical view, illustrated by machine learning, is enlightened: it settles for predictive successes and instrumentalism, views models as mental constructs (in here, not out there), and exhorts scientists to restrict themselves to problems deemed “well posed” by machine-learning criteria.

But why suppose the choice is between assuming “a single best (true) theory or model” and the extreme empiricism of their instrumental machine learner?
