- Nearest neighbor methods (covered by the final assignment)
- Prediction bounds (the last lecture before the midterm)
- Large-scale learning (exponentiated gradient, VW), though the standard gradient method may be covered.

## Thursday, April 17, 2008

### Final

## Wednesday, April 16, 2008

### Reading Assignments

1) Please email me IF you want to do paper #2. (As I recall, nobody expressed interest in reading it, so unless I hear from someone, the paper is off the list.)

2) Please email me IF there is any time conflict.

3) Please don't email me asking what type of questions there will be on the quiz. You should understand the papers in depth. You CAN use the papers on the quiz, but you can't use any other material.

## Saturday, April 5, 2008

### Questions about final projects

- In Problem #1, do we have to come up with a new method, or can we use an already existing one (I was thinking about k-means, for example)?

You can use an existing method, provided that you are happy with how it performs on the problem. The webpage describing the dataset has a list of test error rates achievable by different methods, so you can see how well you are doing.

- How about the code? Do we have to provide it as well? Does it have to be 100% original, or can we use (and maybe adapt) toolboxes? (I saw, for example, that you pointed us to Weka.) Are there restrictions on the language?

You have to provide the code. You can use toolboxes. The code doesn't have to be original if you can make it work well on the problem. The grade *will* depend on the performance of other students. There is no restriction on the programming language as long as you make it easy for me to run your solution to verify the reported test error rates. (Again, if you somehow use the test set to tune your solution, you will automatically get 0 points.)

- If we turn in some projects before May 1st, will they be graded earlier (so that we get an idea whether we should attempt others)?

Yes, but every student will be given only one additional attempt.

An important note: If you choose to do a reading assignment with a quiz, I will subtract points if you clearly don't understand an important concept from the paper. So choose this option only if you are serious about it.

## Wednesday, April 2, 2008

### Nearest Neighbor Methods

- Isomap
- Locally Linear Embedding (LLE)
- Maximum variance unfolding
- Distance Metric Learning for Large Margin Nearest Neighbor Classification

Nearest neighbor schemes:

- Piotr Indyk's tutorial
- Ken Clarkson's tutorial
- Cover trees
- Locality-Sensitive Hashing (LSH)
- ANN: A Library for Approximate Nearest Neighbor Searching
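The links above are mostly about data structures and hash families for fast (approximate) nearest neighbor search. As a point of reference, below is a minimal sketch of the plain k-nearest-neighbor classification rule that these methods accelerate; it uses brute-force Euclidean distances, and the toy data at the bottom is made up purely for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Brute-force k-NN: for each test point, take the majority label
    among the k closest training points under Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.sum((X_train - x) ** 2, axis=1)   # squared distances to all training points
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])      # majority vote
    return np.array(preds)

# Toy example (made-up data): two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.1], [0.95, 1.0]])
print(knn_predict(X_train, y_train, X_test, k=3))    # expected output: [0 1]
```

Structures such as cover trees or LSH replace the brute-force distance scan with faster (exact or approximate) neighbor lookups.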

### Test set bound

## Tuesday, April 1, 2008

### Large Scale Learning

A link to Hadoop, also here and here.

## Friday, March 28, 2008

### Active Learning

S. Dasgupta, D. J. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. *Neural Information Processing Systems (NIPS)*, 2007.

S. Dasgupta. Coarse sample complexity bounds for active learning. *Neural Information Processing Systems (NIPS)*, 2005.

S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. *Eighteenth Annual Conference on Learning Theory (COLT)*, 2005.

The A^{2} algorithm described in class:

M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic Active Learning. *International Conference on Machine Learning (ICML)*, 2006.

Video lecture on A^{2}.

S. Hanneke. Teaching Dimension and the Complexity of Active Learning. *20th Annual Conference on Learning Theory (COLT)*, 2007.

S. Hanneke. A Bound on the Label Complexity of Agnostic Active Learning. *24th Annual International Conference on Machine Learning (ICML)*, 2007.

M.-F. Balcan, A. Broder, and T. Zhang. Margin Based Active Learning. *Conference on Learning Theory (COLT)*, 2007.

## Friday, March 14, 2008

### Midterm statistics

## Saturday, March 8, 2008

### Midterm

There will be three problems (8 points each) and three simple exercises (3 points each). Thus if you solve any two problems + all exercises, you will still get the maximum grade. If you do everything, you will get 8 bonus points.

The material covered in the following lectures will NOT be on the midterm:

- Generalization bounds (the lecture before the midterm)
- Reductions
- Sanjoy's talk on projections, Gaussian scale mixtures, and the EM algorithm

The other two problems will be on any two of the following algorithms:

- Winnow
- Halving
- (Randomized) Weighted Majority
- Perceptron
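As a refresher, here is a minimal sketch of one of the listed algorithms, the classical Perceptron for labels in {-1, +1}. This is my own illustration rather than the exact version presented in lecture, and the toy data is made up.

```python
import numpy as np

def perceptron(X, y, max_passes=100):
    """Classical Perceptron: predict sign(w.x + b); on each mistake,
    update w <- w + y*x and b <- b + y. Labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (np.dot(w, x) + b) <= 0:   # mistake (or exactly on the boundary)
                w += label * x
                b += label
                mistakes += 1
        if mistakes == 0:                         # data separated: stop early
            break
    return w, b

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))   # expected output: [ 1.  1. -1. -1.]
```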

## Monday, February 4, 2008

### Conditional means, medians and all that

In many situations, the mean is not sufficiently robust, and one is actually interested in the conditional median (see this book). The conditional median is formally defined by median(x) = {q ∈ R : D(Y ≤ q | X = x) ≥ 1/2 and D(Y ≥ q | X = x) ≥ 1/2}. The median is also the solution to the problem of minimizing E|f(x)-y|, where the expectation is with respect to D. Note that the median may not be unique when the conditional distribution has regions with zero mass.

The argument goes like this: Fix x. Consider two equal point masses (with respect to D|x) at locations y1 and y2 in R, with y1 ≤ y2. The absolute-error loss suffered by any prediction y in the interval [y1,y2] is (y-y1)+(y2-y) = (y2-y1). Any y outside [y1,y2] induces loss (y2-y1) plus twice its distance to the interval, which is strictly larger, so a minimizer lies in [y1,y2]. Since we can decompose D|x into equal-mass pairs with y2 above and y1 below the median (by the definition of the median), the absolute error loss is minimized when f(x) = median(x). A similar argument holds for discrete random variables (as a special case).
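Here is a quick numerical check of this claim (not part of the original argument): for a fixed sample standing in for D|x, the sample median minimizes the empirical mean absolute error, while the mean minimizes the mean squared error.

```python
import numpy as np

# A fixed, skewed sample standing in for the conditional distribution D|x.
y = np.array([0.0, 1.0, 2.0, 3.0, 100.0])

# Scan candidate predictions q and compare mean absolute error with mean squared error.
qs = np.linspace(-10, 110, 12001)
mae = np.array([np.mean(np.abs(y - q)) for q in qs])
mse = np.array([np.mean((y - q) ** 2) for q in qs])

print("argmin of mean absolute error:", qs[np.argmin(mae)])  # ~ median(y) = 2.0
print("argmin of mean squared error: ", qs[np.argmin(mse)])  # ~ mean(y)   = 21.2
print("np.median(y) =", np.median(y), " np.mean(y) =", np.mean(y))
```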

### Reductions

- John Langford's reductions page (including a link to John's video lectures at different MLSS)
- Eun Bae Kong and Thomas Dietterich. Error-Correcting Output Coding corrects bias and variance, ICML 95.
- Amit Sahai and Venkat Guruswami. Multiclass Learning, Boosting, and Error-correcting codes, COLT 1999.
- See Yoram Singer's papers on multiclass learning, including the two below.
- Erin Allwein, Robert Schapire, Yoram Singer. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, Journal of Machine Learning Research, 2000
- Koby Crammer and Yoram Singer. On the Learnability and Design of Output Codes for Multiclass Problems, Machine Learning.
- Robert E. Schapire. Using output codes to boost multiclass learning problems, ICML 97.
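The papers above reduce multiclass learning to binary learning via output codes. As a concrete illustration, here is a minimal sketch of the simplest special case, a one-vs-all reduction: train one binary scorer per class and predict the class whose scorer is most confident. The base learner here (least squares on +1/-1 targets) and the toy data are my own choices for illustration, not anything prescribed by these papers.

```python
import numpy as np

def one_vs_all_train(X, y, n_classes):
    """One-vs-all reduction: for each class c, fit a binary scorer
    on targets +1 (class c) vs -1 (all other classes)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])        # add a bias column
    W = []
    for c in range(n_classes):
        target = np.where(y == c, 1.0, -1.0)             # binary relabeling for class c
        w, *_ = np.linalg.lstsq(X1, target, rcond=None)  # simple least-squares scorer
        W.append(w)
    return np.array(W)                                   # one weight vector per class

def one_vs_all_predict(W, X):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = X1 @ W.T                                    # score of every class for every point
    return np.argmax(scores, axis=1)                     # predict the highest-scoring class

# Toy 3-class data (made up for illustration): three well-separated clusters.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]])
y = np.array([0, 0, 1, 1, 2, 2])
W = one_vs_all_train(X, y, n_classes=3)
print(one_vs_all_predict(W, X))   # expected output: [0 0 1 1 2 2]
```

Error-correcting output codes generalize this by assigning each class a longer binary codeword, training one binary problem per bit, and decoding to the class with the nearest codeword.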