Employing EM and Pool-Based Active Learning for Text Classification


Andrew Kachites McCallum, Kamal Nigam
This paper presents a method for improving text classification by combining Expectation-Maximization (EM) with active learning over a large pool of unlabeled documents. The goal is to reduce the number of labeled training examples required while maintaining high classification accuracy.

The authors adopt a pool-based active learning approach, which selects the most informative examples from the entire pool of unlabeled documents rather than generating queries synthetically or selecting them from a stream. This selection strategy is combined with EM, which estimates class labels for the unlabeled documents and thereby improves the classifier.

The paper first describes a probabilistic framework for text classification based on naive Bayes, which assumes that the words in a document are independent given the class. It then shows how EM can exploit unlabeled data: the current classifier assigns probabilistic class labels to the unlabeled documents (the E-step), and the classifier parameters are re-estimated from both the labeled documents and the probabilistically labeled ones (the M-step), iterating until convergence.
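To make that interaction concrete, here is a minimal sketch of semi-supervised naive Bayes with EM, assuming bag-of-words count matrices and Laplace smoothing. The function names (train_nb, posteriors, em_nb) and the numpy formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_nb(X, post, alpha=1.0):
    """M-step: estimate naive Bayes parameters from (soft) class posteriors.
    X: (n_docs, n_words) word-count matrix; post: (n_docs, n_classes)."""
    class_prior = post.sum(axis=0) + alpha
    class_prior /= class_prior.sum()
    # Expected word counts per class, with Laplace smoothing.
    word_counts = post.T @ X + alpha                        # (n_classes, n_words)
    word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)
    return np.log(class_prior), np.log(word_probs)

def posteriors(X, log_prior, log_word):
    """E-step: P(class | doc) under naive Bayes (words independent given class)."""
    log_joint = X @ log_word.T + log_prior                  # (n_docs, n_classes)
    log_joint -= log_joint.max(axis=1, keepdims=True)       # stabilize softmax
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def em_nb(X_lab, y_lab, X_unl, n_classes, n_iters=10):
    """Semi-supervised naive Bayes: labeled docs keep hard labels;
    unlabeled docs get probabilistic labels re-estimated by EM."""
    hard = np.eye(n_classes)[y_lab]                         # one-hot labels
    post_unl = np.full((len(X_unl), n_classes), 1.0 / n_classes)
    for _ in range(n_iters):
        X = np.vstack([X_lab, X_unl])
        post = np.vstack([hard, post_unl])
        log_prior, log_word = train_nb(X, post)             # M-step
        post_unl = posteriors(X_unl, log_prior, log_word)   # E-step
    return log_prior, log_word
```

Each iteration re-fits the classifier from all documents, weighting unlabeled ones by their current class posteriors, then recomputes those posteriors; the labeled documents keep their hard labels throughout.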
Active learning is then introduced as a complementary way to choose which documents are worth labeling. The Query-by-Committee (QBC) algorithm measures classification variance indirectly by examining the disagreement among multiple classifier variants: the documents on which the committee members disagree most are selected for labeling (a sketch of one such disagreement measure follows the results below).

Combining EM with active learning significantly reduces the number of labeled examples needed to reach a given accuracy. Specifically, the combination of QBC and EM requires only 58% as many labeled examples as EM alone and 26% as many as QBC alone. The paper also introduces a new approach called pool-leveraged sampling, which interleaves EM with active learning so that the unlabeled pool informs the selection of queries as well as the parameter estimates. Experiments on real-world text datasets, including Newsgroups and Reuters, show that the combination of EM and active learning outperforms either method on its own.
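As a concrete illustration of the committee-disagreement idea, the sketch below scores each pool document by the average KL divergence of the members' class posteriors from the committee mean, a common QBC disagreement measure. How the committee itself is constructed is omitted, and kl_to_mean is a hypothetical helper, not code from the paper.

```python
import numpy as np

def kl_to_mean(committee_posts):
    """Committee disagreement per document: mean KL divergence of each
    member's class posterior from the committee-average posterior.
    committee_posts: (n_members, n_docs, n_classes), rows sum to 1."""
    mean_post = committee_posts.mean(axis=0)                # (n_docs, n_classes)
    eps = 1e-12                                             # avoid log(0)
    kl = (committee_posts *
          np.log((committee_posts + eps) / (mean_post + eps))).sum(axis=2)
    return kl.mean(axis=0)                                  # (n_docs,)

# Usage (hypothetical committee of k classifiers, each producing class
# posteriors for every document in the unlabeled pool):
# scores = kl_to_mean(np.stack([posteriors(X_pool, *m) for m in members]))
# query_idx = np.argsort(scores)[-5:]   # five highest-disagreement docs
```

Documents with high scores are those the current model is most uncertain about as a class, so labeling them is expected to be most informative.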
The results indicate that a pool of unlabeled documents can substantially improve active learning, both by reducing the number of labeled examples required and by increasing classification accuracy. The paper concludes that combining EM and active learning provides a substantial benefit in text classification tasks, especially when labeled data is sparse.