[slides] Unsupervised Models for Named Entity Classification

This paper discusses the use of unlabeled examples in named entity classification, a task that typically requires a large number of labeled examples. The authors propose two algorithms to leverage the redundancy in unlabeled data, which can significantly reduce the need for supervision. The first algorithm is based on decision list learning, similar to Yarowsky's method but with modifications inspired by Blum and Mitchell's work. The second algorithm, CoBoost, extends boosting algorithms to the named entity classification problem, aiming to minimize the disagreement between two classifiers on unlabeled examples. Both algorithms achieve high accuracy with only a few seed rules, demonstrating the effectiveness of using unlabeled data in this context. The paper also includes an evaluation using a dataset of 88,962 (spelling, context) pairs, showing that the proposed methods can classify names with over 91% accuracy.
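To make the idea of "disagreement between two classifiers on unlabeled examples" concrete, here is a minimal Python sketch that simply counts how often a spelling-based classifier and a context-based classifier assign different labels to the same (spelling, context) pair. It is an illustration only, not the paper's CoBoost objective or training procedure; the function name `disagreement_rate`, the two toy classifiers, the label names, and the example pairs are all assumptions made up for this sketch.

```python
from typing import Callable, Sequence, Tuple

# Each unlabeled example is a (spelling, context) pair, as in the summary above.
Example = Tuple[str, str]


def disagreement_rate(
    spelling_clf: Callable[[str], str],
    context_clf: Callable[[str], str],
    unlabeled: Sequence[Example],
) -> float:
    """Fraction of unlabeled pairs on which the two classifiers disagree."""
    if not unlabeled:
        return 0.0
    disagreements = sum(
        1
        for spelling, context in unlabeled
        if spelling_clf(spelling) != context_clf(context)
    )
    return disagreements / len(unlabeled)


# Hypothetical toy classifiers (assumptions, not rules from the paper).
def toy_spelling_clf(spelling: str) -> str:
    # Looks only at the name string itself.
    if "Corp." in spelling or "Inc." in spelling:
        return "organization"
    return "person"


def toy_context_clf(context: str) -> str:
    # Looks only at the surrounding context feature.
    return "organization" if context == "shares_of" else "person"


# Hypothetical unlabeled data, invented for illustration.
toy_pairs = [
    ("IBM Corp.", "shares_of"),  # both say organization
    ("Mr. Smith", "said"),       # both say person
    ("Acme Inc.", "said"),       # spelling: organization, context: person
    ("Jane Doe", "shares_of"),   # spelling: person, context: organization
]

rate = disagreement_rate(toy_spelling_clf, toy_context_clf, toy_pairs)
```

In this toy setup the two views disagree on two of the four pairs. A learner that drives this kind of disagreement down is using the redundancy between the two views as a training signal, which is the intuition the summary attributes to CoBoost.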

Unsupervised Models for Named Entity Classification

Michael Collins and Yoram Singer