Instance Weighting for Domain Adaptation in NLP


June 2007 | Jing Jiang and ChengXiang Zhai
This paper addresses the domain adaptation problem in natural language processing (NLP), where labeled data is scarce in new domains. The authors propose a general instance weighting framework for domain adaptation that incorporates and exploits information from the target domain. They analyze the problem from a distributional perspective, revealing two distinct needs for adaptation: labeling adaptation (the classification function differs between domains) and instance adaptation (the instance distribution differs between domains). The proposed framework uses instance weighting to address both, through three adaptation heuristics: (1) removing misleading training instances from the source domain, (2) assigning higher weights to labeled target instances than to labeled source instances, and (3) augmenting the training set with target instances that carry predicted labels.

The framework is evaluated on three NLP tasks: part-of-speech (POS) tagging, named entity (NE) type classification, and spam filtering. The results show that the proposed method outperforms standard supervised and semi-supervised learning methods because it explicitly captures domain differences. The framework is flexible and applies both with and without labeled target instances. The authors conclude that incorporating more information from the target domain is more effective than merely excluding misleading examples from the source domain.
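To make the three heuristics concrete, here is a minimal sketch (not the authors' implementation) of instance weighting with scikit-learn's logistic regression and per-example sample weights. The pruning rule, the weight values (5.0 and 0.2), and the confidence threshold (0.3) are illustrative assumptions standing in for the paper's criteria and tuned parameters.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: plentiful labeled source data, a few labeled target
# instances, and many unlabeled target instances.
X_src, y_src = rng.normal(0.0, 1.0, (200, 5)), rng.integers(0, 2, 200)
X_tgt_l, y_tgt_l = rng.normal(0.5, 1.0, (20, 5)), rng.integers(0, 2, 20)
X_tgt_u = rng.normal(0.5, 1.0, (300, 5))

# Heuristic 1: remove "misleading" source instances, here approximated as
# those a probe model trained on labeled target data assigns low
# probability under their observed source label.
probe = LogisticRegression().fit(X_tgt_l, y_tgt_l)
p_true = probe.predict_proba(X_src)[np.arange(len(y_src)), y_src]
keep = p_true > 0.3                      # assumed pruning threshold
X_src, y_src = X_src[keep], y_src[keep]

# Heuristic 2: weight labeled target instances higher than source instances.
w_src = np.full(len(y_src), 1.0)
w_tgt_l = np.full(len(y_tgt_l), 5.0)     # assumed source/target weight ratio

# Heuristic 3: add target instances with predicted (pseudo) labels,
# at a small weight so label noise does not dominate training.
y_tgt_u = probe.predict(X_tgt_u)
w_tgt_u = np.full(len(y_tgt_u), 0.2)     # assumed pseudo-label weight

X = np.vstack([X_src, X_tgt_l, X_tgt_u])
y = np.concatenate([y_src, y_tgt_l, y_tgt_u])
w = np.concatenate([w_src, w_tgt_l, w_tgt_u])

# Final model: ordinary discriminative training, with instance weights
# encoding the three adaptation heuristics.
clf = LogisticRegression().fit(X, y, sample_weight=w)
print("Accuracy on labeled target data:", clf.score(X_tgt_l, y_tgt_l))

The key design point is that adaptation is expressed entirely through the per-instance weights, so the same weighting scheme can be wrapped around any learner that accepts weighted training examples.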