Posts Tagged With: Vladimir Cherkassky

Peter Grünwald: Follow-up on Cherkassky’s Comments

Peter Grünwald

Peter Grünwald

A comment from Professor Peter Grünwald

Head, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI)
Part-time full professor  at Leiden University.

This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue where we disagree and which, at the same time, I think, needs clarification, so I decided to write this follow-up.[related posts 1]

The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.

I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy – which Vladimir considers the basic goal – there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). [a superquick intro to log-loss as well as some references are given below under [A]; see also my talk at the Ockham workshop (slides 16-26 about weather forecasting!) )

Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , , | 16 Comments

Vladimir Cherkassky Responds on Foundations of Simplicity

I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post.  I invite readers’ comments.

Vladimir Cherkassky

As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.

Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.

Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data had been formally investigated in VC-theory ~ 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , , | 12 Comments

Deviates, Sloths, and Exiles: Philosophical Remarks on the Ockham’s Razor Workshop*

Picking up the pieces…

My flight out of Pittsburgh has been cancelled, and as I may be stuck in the airport for some time, I will try to make a virtue of it by jotting down some of my promised reflections on the “simplicity and truth” conference at Carnegie Mellon (organized by Kevin Kelly). My remarks concern only the explicit philosophical connections drawn by (4 of) the seven non-philosophers who spoke. For more general remarks, see blogs of: Larry Wasserman (Normal Deviate) and Cosma Shalizi (Three-Toed Sloth). (The following, based on my notes and memory, may include errors/gaps, but I trust that my fellow bloggers and sloggers, will correct me.)

First to speak were Vladimir Vapnik and Vladimir Cherkassky, from the field of machine learning, a discipline I know of only formally. Vapnik, of the Vapnik Chervonenkis (VC) theory, is known for his seminal work here. Their papers, both of which addressed directly the philosophical implications of their work, share enough themes to merit being taken up together.

Vapnik and Cherkassky find a number of striking dichotomies in the standard practice of both philosophy and statistics. They contrast the “classical” conception of scientific knowledge as essentially rational with the more modern, “data-driven” empirical view:

The former depicts knowledge as objective, deterministic, rational. Ockham’s razor is a kind of synthetic a priori statement that warrants our rational intuitions as the foundation of truth with a capital T, as well as a naïve realism (we may rely on Cartesian “clear and distinct” ideas; God does not deceive; and so on). The latter empirical view, illustrated by machine learning, is enlightened. It settles for predictive successes and instrumentalism, views models as mental constructs (in here, not out there), and exhorts scientists to restrict themselves to problems deemed “well posed” by machine-learning criteria.

But why suppose the choice is between assuming “a single best (true) theory or model” and the extreme empiricism of their instrumental machine learner? Continue reading

Categories: philosophy of science, Statistics | Tags: , , , , | 14 Comments

Blog at WordPress.com.