Understanding Learning Extraction Patterns for Subjective Expressions

This paper presents a bootstrapping process that learns linguistically rich extraction patterns for subjective (opinionated) expressions. High-precision classifiers label unannotated data to automatically create a large training set, which is then given to an extraction pattern learning algorithm. The learned patterns are then used to identify more subjective sentences. The bootstrapping process learns many subjective patterns and increases recall while maintaining high precision.

The goal of this research is to use high-precision subjectivity classifiers to automatically identify subjective and objective sentences in unannotated text corpora. The classifiers label a sentence as subjective or objective only when they are confident about the classification; otherwise they leave it unlabeled. Unannotated texts are easy to come by, so even if the classifiers can label only 30% of the sentences as subjective or objective, they will still produce a large collection of labeled sentences. Most importantly, the high-precision classifiers can generate a much larger set of labeled sentences than is currently available in manually created data sets. We also find that the learned extraction patterns capture subtle connotations that are more expressive than the individual words by themselves.

The paper discusses previous work on subjectivity analysis and extraction pattern learning. It then gives an overview of the general approach, describes the high-precision subjectivity classifiers, and explains the algorithm for learning extraction patterns associated with subjectivity. The data used, experimental results, and examples of learned patterns are presented. The paper concludes with a summary of its findings.

The research explores several avenues for improving the state of the art in subjectivity analysis.
It demonstrates that high-precision subjectivity classification can be used to generate a large amount of labeled training data for subsequent learning algorithms. It shows that an extraction pattern learning technique can learn subjective expressions that are linguistically richer than individual words or fixed phrases. It also shows that similar expressions may behave very differently, so that one expression may be strongly indicative of subjectivity but the other may not. The research augments the original high-precision subjective classifier with newly learned extraction patterns, resulting in substantially higher recall with minimal loss in precision. Future work includes experimenting with different configurations of these classifiers and addressing the problem of identifying new objective sentences during bootstrapping.
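
The bootstrapping loop summarized above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the clue lexicon `STRONG_CLUES`, the labeling rule (two or more clues means subjective, none means objective, otherwise abstain), the thresholds `min_freq` and `min_prob`, and the use of word bigrams as a stand-in for syntactic extraction patterns are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical clue lexicon; the summary does not list the classifiers' features.
STRONG_CLUES = {"outrageous", "terrible", "wonderful", "hate", "love"}


def tokens(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def high_precision_label(sentence, min_subj=2):
    """Return 'subjective', 'objective', or None when not confident (abstain).

    Assumed rule: >= min_subj clues is subjective, zero clues is objective.
    """
    hits = sum(1 for w in tokens(sentence) if w in STRONG_CLUES)
    if hits >= min_subj:
        return "subjective"
    if hits == 0:
        return "objective"
    return None


def candidate_patterns(sentence):
    """Word bigrams, used here as a simplified stand-in for extraction patterns."""
    ws = tokens(sentence)
    return set(zip(ws, ws[1:]))


def learn_patterns(labeled, min_freq=2, min_prob=0.9):
    """Keep patterns that are frequent and occur almost only in subjective sentences."""
    total, subj = Counter(), Counter()
    for sentence, label in labeled:
        for p in candidate_patterns(sentence):
            total[p] += 1
            if label == "subjective":
                subj[p] += 1
    return {p for p, n in total.items() if n >= min_freq and subj[p] / n >= min_prob}


def bootstrap(corpus):
    """One bootstrapping pass: label confidently, learn patterns, relabel the rest."""
    labeled, unlabeled = [], []
    for s in corpus:
        label = high_precision_label(s)
        if label is None:
            unlabeled.append(s)
        else:
            labeled.append((s, label))
    patterns = learn_patterns(labeled)
    newly_subjective = [s for s in unlabeled if candidate_patterns(s) & patterns]
    return patterns, newly_subjective


corpus = [
    "I hate being fed up with this terrible service.",
    "Voters are fed up with the outrageous and terrible delays.",
    "The council met on Tuesday with the mayor.",
    "The report was published on Tuesday.",
    "Commuters are fed up with the terrible traffic.",
    "The weather was terrible on Tuesday.",
]
patterns, newly_subjective = bootstrap(corpus)
print(sorted(patterns))
print(newly_subjective)
```

In this toy corpus the bigram "fed up" is retained because it occurs only in confidently subjective sentences, while the superficially similar "with the" is discarded because it also appears in an objective sentence. The retained patterns then label a sentence that the clue-based classifier had left unlabeled, which is how recall can grow without relaxing the initial classifier.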

Learning Extraction Patterns for Subjective Expressions

Ellen Riloff, Janyce Wiebe