A comment from Professor Peter Grünwald
Head, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI)
Part-time full professor at Leiden University.
This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue where we disagree and which, at the same time, I think, needs clarification, so I decided to write this follow-up.[related posts 1]
The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.
I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy – which Vladimir considers the basic goal – there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). [a superquick intro to log-loss as well as some references are given below under [A]; see also my talk at the Ockham workshop (slides 16-26 about weather forecasting!) )
I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post. I invite readers’ comments.
As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.
Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.
Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data had been formally investigated in VC-theory ~ 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. Continue reading