Twitter Sentiment Analysis: The Good the Bad and the OMG!

2011 | Efthymios Kouloumpis, Theresa Wilson, Johanna Moore
This paper investigates the utility of linguistic features for detecting the sentiment of Twitter messages. The authors evaluate existing lexical resources together with features that capture the informal and creative language of microblogging. They take a supervised approach, leveraging hashtags already present in Twitter data (e.g., #bestfeeling, #epicfail, #news) to identify positive, negative, and neutral tweets and thereby build training data for three-way sentiment classifiers.

The study uses three corpora of Twitter messages: a hashtagged data set (HASH) derived from the Edinburgh Twitter corpus, an emoticon data set (EMOT) built from tweets containing positive and negative emoticons, and a manually annotated data set (ISIEVE) of approximately 4,000 tweets hand-annotated by the iSieve Corporation. Preprocessing includes tokenization, normalization, and part-of-speech (POS) tagging. The classification experiments draw on three feature groups: n-gram features, lexicon features, and microblogging features.

The experiments show that hashtags and emoticons provide useful training data for sentiment analysis, though performance depends on which feature groups are combined. The best results come from n-gram, lexicon, and microblogging features trained on the hashtagged data alone. The study concludes that part-of-speech features may not be useful for sentiment analysis in the microblogging domain, and that microblogging features (e.g., intensifiers and emoticons) are the most useful.
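The hashtag-based labelling scheme and the n-gram feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the seed hashtag sets here contain only the examples mentioned in the summary (real seed lists would be much larger), the tokenizer is deliberately simple, and tweets matching hashtags from more than one class are discarded as ambiguous.

```python
import re

# Hypothetical seed hashtag sets, illustrating the paper's labelling scheme.
# Only the hashtags named in the summary are included here.
POSITIVE_TAGS = {"#bestfeeling"}
NEGATIVE_TAGS = {"#epicfail"}
NEUTRAL_TAGS = {"#news"}


def label_tweet(text):
    """Assign a training label from seed hashtags, or None if absent/ambiguous."""
    tags = set(re.findall(r"#\w+", text.lower()))
    matched = [
        label
        for label, seeds in [
            ("positive", POSITIVE_TAGS),
            ("negative", NEGATIVE_TAGS),
            ("neutral", NEUTRAL_TAGS),
        ]
        if tags & seeds
    ]
    # Keep the tweet only if exactly one sentiment class matched.
    return matched[0] if len(matched) == 1 else None


def ngram_features(text, n=2):
    """Unigram and bigram presence features after naive tokenization."""
    tokens = re.findall(r"#?\w+", text.lower())
    feats = set(tokens)
    for size in range(2, n + 1):
        feats.update(
            " ".join(tokens[i : i + size]) for i in range(len(tokens) - size + 1)
        )
    return feats
```

Tweets labelled this way could then feed any standard classifier; the point of the sketch is only how noisy hashtag signals are turned into three-way training labels.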