COMS 4771: February 2008

Let D be a distribution over X x Y, where X is some feature space and Y is a real-valued label. It is well known that the minimizer of the squared loss is the conditional mean mean(x) = E[y], where the expectation is over the draw of y from D|x. It's your homework problem number 3. (For those of you who are overly concerned with rigor, assume that X and Y are closed subsets of R^m and R, respectively. This is clearly true for any machine representation of X and Y.)

In many situations, the mean is not sufficiently robust, and one is actually interested in the conditional median (see this book). The conditional median is formally defined by median(x) = {q in R : D(Y ≤ q | X = x) ≥ 1/2 and D (Y ≥ q | X = x) ≥ 1/2}. The median is also the solution to the problem of minimizing E|f(x)-y|, where the expectation is with respect to D. Note that the median may not be unique when the conditional distribution has regions with zero mass.

The argument goes like this: Fix x. Consider two equal point masses (with respect to D|x) at locations y1 and y2 in R. The loss suffered by any prediction y in the interval [y1,y2] is (y-y1)+(y2-y) = (y2-y1). Any y not in [y1,y2] always induces a larger loss, so the minimizer is in [y1,y2]. Since we can break D|x and break it into equal mass pairs with y2 above and y1 below the median (by the definition of the median), the absolute error loss is minimized when f(x) = median(x). A similar argument holds for discrete random variables (special case).

John Langford's reductions page (including a link to John's video lectures at different MLSS)
Eun Bae Kong and Thomas Dietterich. Error-Correcting Output Coding corrects bias and variance, ICML 95.
Amit Sahai and Venkat Guruswami. Multiclass Learning, Boosting, and Error-correcting codes, COLT 1999.
See Yoram Singer's papers on multiclass learning, including the two below.
Erin Allwein, Robert Schapire, Yoram Singer. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, Journal of Machine Learning Research, 2000
Koby Crammer and Yoram Singer. On the Learnability and Design of Output Codes for Multiclass Problems, Machine Learning.
Robert E. Schapire. Using output codes to boost multiclass learning problems, ICML 97.

COMS 4771

Monday, February 4, 2008

Conditional means, medians and all that

Reductions

Blog Archive