11-16 July 2010 | Joseph Turian, Lev Ratinov, Yoshua Bengio
This paper presents a simple and general method for improving the accuracy of existing supervised NLP systems by incorporating unsupervised word representations as additional word features. The authors evaluate three types of word representations—Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings—on the tasks of named entity recognition (NER) and chunking. They find that each of these representations improves the accuracy of existing supervised baselines, and further improvements are achieved by combining different word representations. The authors also provide their word features and code for use in existing NLP systems.
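The integration the paper describes is feature concatenation: each token keeps its usual sparse indicator features, and the unsupervised representation contributes extra per-token features. A minimal sketch, assuming a hypothetical `embeddings` lookup (in the paper these would be Brown clusters or C&W/HLBL embeddings):

```python
import numpy as np

# Hypothetical pretrained lookup (word -> dense vector); a stand-in for the
# Brown, C&W, or HLBL representations induced in the paper.
embeddings = {"london": np.array([0.2, -0.1, 0.4]),
              "visited": np.array([0.0, 0.3, -0.2])}
UNK = np.zeros(3)  # fallback vector for out-of-vocabulary words

def token_features(word):
    """Baseline sparse features plus dense embedding features for one token."""
    feats = {"word=" + word.lower(): 1.0,
             "is_capitalized": float(word[0].isupper())}
    dense = embeddings.get(word.lower(), UNK)
    # Each embedding dimension becomes one extra real-valued feature.
    for i, v in enumerate(dense):
        feats["emb_%d" % i] = float(v)
    return feats
```

A downstream linear model (e.g. a CRF or perceptron tagger) consumes `token_features` unchanged; only the feature extractor knows about the embeddings.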
The paper discusses various approaches to word representation, including distributional representations based on co-occurrence matrices and clustering-based representations. It also explores neural language models that induce dense, low-dimensional word embeddings. The authors compare different techniques for inducing word representations and evaluate them on NER and chunking tasks.
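The distributional family of representations mentioned above starts from a word-by-context co-occurrence matrix, optionally reduced to low dimension. A small illustrative sketch (toy corpus and window size are assumptions, not the paper's setup):

```python
import numpy as np

# Toy corpus; real inductions in the paper use large unlabeled corpora.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 token window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i in range(len(sent)):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[sent[i]], idx[sent[j]]] += 1

# Truncated SVD yields dense, low-dimensional distributional vectors.
U, S, Vt = np.linalg.svd(C)
k = 2
vectors = U[:, :k] * S[:k]  # one k-dimensional row per vocabulary word
```

Neural language models (C&W, HLBL) instead learn the dense vectors directly by gradient training, but the resulting word features plug into a tagger the same way.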
The paper also addresses the issue of data sparsity in supervised NLP systems and proposes using unsupervised word representations to mitigate this problem. The authors find that Brown clusters and word embeddings both improve the accuracy of a near-state-of-the-art supervised NLP system, and that combining different word representations can further improve accuracy. Error analysis indicates that Brown clustering induces better representations for rare words than C&W embeddings that have not received many training updates.
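Combining representations, as in the paper's best systems, again reduces to feature union: cluster-based and embedding-based features for the same token simply coexist in one feature map. A sketch with hypothetical lookups (the bit-strings and vectors below are illustrative):

```python
import numpy as np

# Hypothetical lookups: Brown cluster bit-string paths and dense embeddings.
brown_paths = {"france": "0111010", "spoke": "1101000"}
embeddings = {"france": np.array([0.1, -0.3]), "spoke": np.array([0.4, 0.2])}

def combined_features(word):
    feats = {}
    # Brown clusters contribute categorical path-prefix features, a common
    # choice in prior NER work; prefix lengths here are assumptions.
    path = brown_paths.get(word, "")
    for p in (4, 6):
        if len(path) >= p:
            feats["brown_%d=%s" % (p, path[:p])] = 1.0
    # Embeddings contribute one real-valued feature per dimension.
    for i, v in enumerate(embeddings.get(word, np.zeros(2))):
        feats["emb_%d" % i] = float(v)
    return feats
```

The prefix features give the model access to several granularities of the cluster hierarchy at once, which is one reason Brown clusters cope well with rare words.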
The authors also propose a default method for setting the scaling parameter for word embeddings, allowing them to be used off-the-shelf as word features without tuning. They conclude that these word representations can be learned in an unsupervised, task-agnostic manner and are easily integrated into existing supervised NLP systems. However, they note that accuracy may not be as high as with a semi-supervised method that incorporates task-specific information and jointly learns the supervised and unsupervised tasks.
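The default scaling amounts to rescaling the embedding matrix so its entries have a fixed standard deviation before it is used for features. A minimal sketch; treating the target standard deviation as a tunable default is my assumption here:

```python
import numpy as np

def scale_embeddings(E, target_std=0.1):
    """Rescale an embedding matrix so its entries have the given global
    standard deviation, making the embeddings usable off-the-shelf as
    features without per-task tuning. target_std is a hyperparameter."""
    return E * (target_std / E.std())

# Example: random stand-in for a learned embedding matrix (1000 words, 50 dims).
E = np.random.RandomState(0).randn(1000, 50)
E_scaled = scale_embeddings(E)
```

Without such normalization, the magnitude of the embedding features relative to the binary indicator features would itself become a per-task hyperparameter.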