GSP Lecture Series

Peter Bickel, University of California, Berkeley

10 January 2005
Damon Room
Lecture 1:30, refreshments 2:30

On the borders of Statistics and Computer Science

Machine learning in computer science and prediction and classification in statistics are essentially equivalent fields. I will try to illustrate the relation between theory and practice in this huge area with a few examples and results. In particular, I will try to address an apparent puzzle: worst-case analyses, using empirical process theory, seem to suggest that even for moderate data dimension and reasonable sample sizes, good prediction (supervised learning) should be very difficult. On the other hand, practice seems to indicate that even when the number of dimensions is much higher than the number of observations, we can often do very well. I will also discuss a new method of dimension estimation and some features of cross-validation.