A theory of learning from different domains


Shai Ben-David · John Blitzer · Koby Crammer · Alex Kulesza · Fernando Pereira · Jennifer Wortman Vaughan
Received: 28 February 2009 / Revised: 12 September 2009 / Accepted: 18 September 2009 / Published online: 23 October 2009
This paper addresses the problem of domain adaptation, where a classifier trained on a source domain is applied to a target domain with a different distribution. The authors investigate two main questions: (1) under what conditions can a classifier trained on the source domain be expected to perform well on the target domain, and (2) how should a small amount of labeled target data be combined with a large amount of labeled source data to minimize the target error? They introduce a classifier-induced divergence measure, the $\mathcal{H}\Delta\mathcal{H}$-divergence, which can be estimated from finite, unlabeled samples from both domains. Using this measure, they bound the target error in terms of the empirical source error, the empirical $\mathcal{H}\Delta\mathcal{H}$-divergence, and the combined error of the best single hypothesis for both domains. They also derive a bound on the target error of a classifier that minimizes a convex combination of the empirical source and target errors, showing that the optimal value of $\alpha$ (the weight given to the target data) depends on the divergence between the domains, the sample sizes, and the complexity of the hypothesis class. This bound generalizes previous work and is always at least as tight as bounds that consider only the source error, only the target error, or an equal weighting of the two. Experimental results on sentiment classification demonstrate the effectiveness of the approach.
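For reference, the paper's central adaptation bound has roughly the following form (stated here up to the finite-sample complexity terms): for any $h \in \mathcal{H}$,

$$
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr],
$$

and the second part of the paper analyzes the minimizer of the convex combination of empirical errors

$$
\hat{\epsilon}_\alpha(h) \;=\; \alpha\, \hat{\epsilon}_T(h) \;+\; (1 - \alpha)\, \hat{\epsilon}_S(h), \qquad \alpha \in [0, 1],
$$

with the optimal $\alpha$ determined by the divergence between the domains, the source and target sample sizes, and the complexity of $\mathcal{H}$.

The $\mathcal{H}\Delta\mathcal{H}$-divergence can be approximated from unlabeled data by training a classifier to distinguish source from target examples. Below is a minimal sketch of that recipe (a proxy for the divergence, not the paper's exact estimator), assuming scikit-learn and a linear domain classifier; the function name and constants are illustrative.

```python
# Hypothetical sketch: approximate the domain divergence with a "proxy A-distance"
# computed from the held-out error of a source-vs-target domain classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_divergence(X_source, X_target, seed=0):
    # Label examples by domain: 0 = source, 1 = target.
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    # Train a domain classifier and measure its held-out error.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)
    # If the domains are hard to tell apart (err near 0.5), the divergence is small.
    return 2.0 * (1.0 - 2.0 * err)
```

A small proxy value suggests a source-trained classifier should transfer well; a value near 2 indicates the domains are easily separable, so more weight on labeled target data (larger $\alpha$) is likely needed.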