23 June 2011 | Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, Rebecca Passonneau
This paper presents a sentiment analysis study of Twitter data. The authors introduce POS-specific prior polarity features and explore a tree kernel as a way to reduce the need for tedious feature engineering. Their experiments show that the tree kernel model outperforms both the unigram model and the feature-based model, matching the best feature-based results without requiring detailed feature engineering.
The study uses manually annotated Twitter data collected in a streaming fashion, ensuring a true sample of actual tweets. The authors introduce two new resources: an emoticon dictionary and an acronym dictionary. They also present extensive feature analysis, showing that features combining prior polarity with parts-of-speech tags are most important for both classification tasks.
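The emoticon and acronym dictionaries are used to normalize tweets before feature extraction. A minimal sketch of that preprocessing step is below; the dictionary entries and the `||polarity||` replacement tokens are illustrative stand-ins, not the authors' actual resources.

```python
# Hypothetical miniature versions of the paper's emoticon and acronym
# dictionaries (the real resources are much larger).
EMOTICON_DICT = {":)": "positive", ":(": "negative", ":D": "extremely-positive"}
ACRONYM_DICT = {"lol": "laughing out loud", "gr8": "great"}

def preprocess(tweet: str) -> str:
    """Replace emoticons with polarity pseudo-words and expand acronyms."""
    tokens = []
    for tok in tweet.split():
        if tok in EMOTICON_DICT:
            # Emoticons become polarity placeholders, e.g. ":)" -> "||positive||"
            tokens.append("||{}||".format(EMOTICON_DICT[tok]))
        elif tok.lower() in ACRONYM_DICT:
            # Acronyms are expanded to their English form.
            tokens.append(ACRONYM_DICT[tok.lower()])
        else:
            tokens.append(tok)
    return " ".join(tokens)
```

Normalizing this way lets downstream features treat ":)" and other surface variants as ordinary vocabulary items carrying prior polarity.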
The authors experiment with three models: unigram, feature-based, and tree kernel. They also test combinations of these models. The tree kernel model outperforms the unigram model by a significant margin. The feature-based model with 100 features achieves similar accuracy to the unigram model with over 10,000 features. The combination of unigrams with features and features with the tree kernel outperforms the unigram baseline by over 4% for both classification tasks.
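The unigram model represents each tweet as a bag-of-words count vector fed to a classifier (the paper uses an SVM). A stdlib-only sketch of the unigram representation, with the classifier training itself omitted:

```python
from collections import Counter

def build_vocab(tweets):
    """Map each unique lowercased token across the corpus to a column index."""
    vocab = sorted({tok for t in tweets for tok in t.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def unigram_vector(tweet, vocab):
    """Count-based unigram feature vector; out-of-vocabulary tokens are dropped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tweet.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec
```

With tens of thousands of unigram dimensions versus roughly 100 hand-designed features, the reported parity between the two models illustrates how much signal the compact feature set captures.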
The study shows that standard natural language processing tools are useful even in a genre different from the genre on which they were trained (newswire). The authors also show that the tree kernel model performs roughly as well as the best feature-based models, even though it does not require detailed feature engineering.
The authors present results for two classification tasks: binary (positive vs. negative) and three-way (positive vs. negative vs. neutral). The unigram model alone achieves a gain of over 20% over the chance baseline on both tasks, and the tree kernel model outperforms both the unigram and feature-based models by a significant margin; the best model combinations again beat the unigram baseline by over 4% on both tasks.
The authors conclude that sentiment analysis for Twitter data is not that different from sentiment analysis for other genres. They suggest future work exploring even richer linguistic analysis, such as parsing, semantic analysis, and topic modeling.