June 2007 | John Blitzer, Mark Dredze, Fernando Pereira
The paper "Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification" by John Blitzer, Mark Dredze, and Fernando Pereira of the University of Pennsylvania studies how to adapt a sentiment classifier trained on one domain so that it works on another. The authors focus on Amazon reviews of four product types: books, DVDs, electronics, and kitchen appliances. They extend the Structural Correspondence Learning (SCL) algorithm to sentiment classification and introduce a method that selects pivot features by their mutual information with the source-domain labels. This approach reduces the relative error due to adaptation by an average of 30% compared to the original SCL algorithm and 46% compared to a supervised baseline. They also identify the $\mathcal{A}$-distance as a measure of domain similarity that correlates well with the potential for adapting a classifier from one domain to another. This measure could, for example, be used to choose a small set of domains to annotate whose trained classifiers would transfer well to many other domains. Finally, the paper shows how a small amount of labeled target-domain data can correct feature misalignments, and it reports experimental results supporting these findings.
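The three ingredients summarized above (mutual-information pivot selection, the SCL projection, and a domain-similarity score) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: features are binary bag-of-words indicators; the `min_count` filter requiring pivots to occur in both domains is an assumed detail; ridge least squares stands in for whatever loss the paper uses to train its pivot predictors; and the domain classifier behind `proxy_a_distance` is a plain logistic regression scored with the commonly used proxy $\hat{d}_{\mathcal{A}} = 2(1 - 2\epsilon)$, where $\epsilon$ is its held-out error. The paper's exact computation may differ. All function names are hypothetical.

```python
import numpy as np


def mutual_information(X, y):
    """Mutual information (nats) between each binary feature column of X and binary labels y."""
    X = (np.asarray(X) > 0).astype(float)
    y = np.asarray(y)
    n = X.shape[0]
    mi = np.zeros(X.shape[1])
    for f_val in (0, 1):
        f_mask = X if f_val == 1 else 1.0 - X
        p_f = f_mask.sum(axis=0) / n
        for y_val in (0, 1):
            y_mask = (y == y_val).astype(float)
            joint = (f_mask * y_mask[:, None]).sum(axis=0) / n
            with np.errstate(divide="ignore", invalid="ignore"):
                term = joint * np.log(joint / (p_f * y_mask.mean()))
            mi += np.nan_to_num(term)  # treats 0 * log 0 as 0
    return mi


def select_pivots(Xs, ys, Xt, num_pivots, min_count=2):
    """Pivots: features seen >= min_count times in BOTH domains, ranked by MI with source labels."""
    frequent = ((np.asarray(Xs) > 0).sum(axis=0) >= min_count) & (
        (np.asarray(Xt) > 0).sum(axis=0) >= min_count)
    mi = np.where(frequent, mutual_information(Xs, ys), -np.inf)
    order = np.argsort(-mi)
    return [int(j) for j in order[:num_pivots] if np.isfinite(mi[j])]


def scl_projection(X_unlabeled, pivots, h, ridge=1.0):
    """Return theta (h x d) from linear pivot predictors trained on unlabeled data."""
    X = (np.asarray(X_unlabeled) > 0).astype(float)
    non_pivot = np.ones(X.shape[1], dtype=bool)
    non_pivot[pivots] = False
    Z = X[:, non_pivot]  # pivot features are hidden from the predictors
    W = np.zeros((X.shape[1], len(pivots)))
    for i, p in enumerate(pivots):
        target = 2.0 * X[:, p] - 1.0  # +1 if the pivot occurs, -1 otherwise
        # Ridge least squares: an assumed stand-in for the paper's predictor training.
        W[non_pivot, i] = np.linalg.solve(Z.T @ Z + ridge * np.eye(Z.shape[1]), Z.T @ target)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T  # top-h left singular vectors of the pivot-weight matrix


def augment(X, theta):
    """Original features concatenated with their low-dimensional SCL projection."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X @ theta.T])


def proxy_a_distance(Xs, Xt, epochs=200, lr=0.1, seed=0):
    """2 * (1 - 2 * err), where err is the held-out error of a source-vs-target classifier."""
    rng = np.random.default_rng(seed)
    X = np.vstack([Xs, Xt]).astype(float)
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])  # domain labels
    idx = rng.permutation(len(X))
    train, test = idx[: len(X) // 2], idx[len(X) // 2 :]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):  # plain batch gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(X[train] @ w + b)))
        grad = p - d[train]
        w -= lr * X[train].T @ grad / len(train)
        b -= lr * grad.mean()
    err = np.mean(((X[test] @ w + b) > 0) != (d[test] == 1))
    return 2.0 * (1.0 - 2.0 * err)
```

In use, `select_pivots(Xs, ys, Xt, num_pivots)` picks pivots from labeled source and unlabeled target reviews, `scl_projection` is fit on unlabeled reviews pooled from both domains, and a linear sentiment classifier is then trained on `augment(Xs, theta)` and applied to `augment(Xt, theta)`. The number of pivots and the projection size `h` are left to the caller; the values used in the paper are not reproduced here. Ridge regression was chosen only because its closed form keeps the sketch short. A higher `proxy_a_distance` means the two domains are easier to tell apart, which in the paper's framing signals a harder adaptation.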