Twitter Sentiment Analysis: The Good the Bad and the OMG!

2011 | Efthymios Kouloumpis, Theresa Wilson, Johanna Moore
This paper investigates how effective linguistic features are for detecting sentiment in Twitter messages. The authors evaluate the usefulness of existing lexical resources as well as features that capture the informal and creative language of microblogging. They take a supervised approach, using hashtags in Twitter data to obtain training labels. The study also examines what makes sentiment analysis of microblogs difficult, such as informal language and the broad range of topics covered. The paper uses three corpora: the hashtagged data set (HASH), the emoticon data set (EMOT), and the iSieve data set. Preprocessing consists of tokenization, normalization, and part-of-speech tagging.
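The hashtag-based labeling and the preprocessing steps described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' code: the hashtag-to-label map (`HASHTAG_LABELS`), the abbreviation dictionary, the emoticon pattern, and the rule that collapses runs of three or more repeated characters are all hypothetical choices. Part-of-speech tagging is left out to keep the example dependency-free. The sketch records informal cues (all-caps words, character repetition, abbreviations, emoticons) before normalization erases them, so they remain available as microblogging features.

```python
import re
from typing import Optional

# Hypothetical hashtag -> label map; the paper's actual hashtag list is not reproduced here.
HASHTAG_LABELS = {"#happy": "positive", "#fail": "negative", "#news": "neutral"}
# Hypothetical abbreviation dictionary used for normalization.
ABBREVIATIONS = {"omg": "oh my god", "brb": "be right back", "lol": "laughing out loud"}

# A few common ASCII emoticons (assumption: far from exhaustive).
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")
# Emoticons, hashtags, and @mentions are matched before ordinary words and punctuation.
TOKEN_RE = re.compile(r"[:;=][-']?[)(DPp]|#\w+|@\w+|\w+(?:'\w+)?|[^\w\s]")
# Runs of three or more identical word characters (e.g. "sooooo").
REPEAT_RE = re.compile(r"(\w)\1{2,}")


def normalize(tweet: str) -> tuple[list[str], dict[str, bool]]:
    """Tokenize and normalize a tweet, recording informal cues before they are erased."""
    flags = {"all_caps": False, "repetition": False, "abbreviation": False, "emoticon": False}
    tokens: list[str] = []
    for tok in TOKEN_RE.findall(tweet):
        if EMOTICON_RE.fullmatch(tok):
            flags["emoticon"] = True
            tokens.append(tok)  # emoticons stay single, case-preserved tokens
            continue
        low = tok.lower()
        if low in ABBREVIATIONS:
            flags["abbreviation"] = True
            tokens.extend(ABBREVIATIONS[low].split())  # expand, e.g. "brb" -> "be right back"
            continue
        if tok.isalpha() and len(tok) > 1 and tok.isupper():
            flags["all_caps"] = True  # all-caps word used as an intensifier
        if REPEAT_RE.search(low):
            flags["repetition"] = True
            low = REPEAT_RE.sub(r"\1", low)  # "sooooo" -> "so"
        tokens.append(low)
    return tokens, flags


def label_from_hashtags(tweet: str) -> Optional[str]:
    """Return a noisy label when the tweet's known sentiment hashtags agree on one label."""
    labels = {HASHTAG_LABELS[h.lower()] for h in re.findall(r"#\w+", tweet) if h.lower() in HASHTAG_LABELS}
    return labels.pop() if len(labels) == 1 else None


def make_example(tweet: str) -> Optional[tuple[list[str], dict[str, bool], str]]:
    """Build a (tokens, cue flags, label) training triple, or None if the tweet is unlabeled."""
    label = label_from_hashtags(tweet)
    if label is None:
        return None
    tokens, flags = normalize(tweet)
    tokens = [t for t in tokens if t not in HASHTAG_LABELS]  # drop the labeling hashtag
    return tokens, flags, label
```

For instance, `make_example("I LOVE this showwww :) #happy")` yields the tokens `['i', 'love', 'this', 'show', ':)']`, the label `positive`, and flags marking all-caps, repetition, and an emoticon. Dropping the labeling hashtag from the tokens is an extra assumption of this sketch, meant to keep a classifier from simply memorizing the label source.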
Classification draws on several feature types: n-grams, lexicon features, part-of-speech features, and microblogging-specific features. In the experiments, combining n-grams with lexicon and microblogging features yields the best performance, whereas part-of-speech features do not appear to help in the microblogging domain. The study also finds that both hashtag-based and emoticon-based training data improve performance, although the benefit of the emoticon data is lessened when microblogging features are included. Overall, the findings suggest that microblogging-specific features are crucial for accurate sentiment analysis of Twitter data.
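The combination of n-gram, lexicon, and microblogging features can be sketched as a single sparse feature map feeding a classifier. Several pieces here are hypothetical: `LEXICON` is a toy stand-in for an existing sentiment lexicon, the emoticon sets are illustrative, and because this summary does not name the authors' learner, a multinomial Naive Bayes classifier with add-one smoothing is swapped in purely for illustration. Part-of-speech features are omitted. The input is assumed to be already tokenized and normalized, with optional cue flags (such as all-caps or repetition) recorded during preprocessing.

```python
import math
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical toy lexicon standing in for an existing prior-polarity sentiment lexicon.
LEXICON = {"love": "positive", "great": "positive", "hate": "negative", "slow": "negative"}
# Illustrative emoticon sets (assumption, not the paper's list).
POS_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEG_EMOTICONS = {":(", ":-(", ":'("}


def features(tokens: list[str], cues: Optional[dict[str, bool]] = None) -> Counter:
    """Map normalized tokens (plus optional cue flags) to a sparse feature count vector."""
    feats: Counter = Counter()
    # n-gram features: unigrams and bigrams.
    for tok in tokens:
        feats["uni=" + tok] += 1
    for a, b in zip(tokens, tokens[1:]):
        feats["bi=" + a + "_" + b] += 1
    # Lexicon features: how many tokens carry each prior polarity.
    for tok in tokens:
        if tok in LEXICON:
            feats["lex=" + LEXICON[tok]] += 1
    # Microblogging features: emoticon polarity plus any cues recorded during preprocessing.
    if any(t in POS_EMOTICONS for t in tokens):
        feats["mb=pos_emoticon"] = 1
    if any(t in NEG_EMOTICONS for t in tokens):
        feats["mb=neg_emoticon"] = 1
    for name, present in (cues or {}).items():
        if present:
            feats["mb=" + name] = 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a simple stand-in classifier)."""

    def fit(self, examples: list[tuple[Counter, str]]) -> "NaiveBayes":
        self.class_counts: Counter = Counter()
        self.feat_counts: defaultdict = defaultdict(Counter)
        self.vocab: set[str] = set()
        for feats, label in examples:
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)  # accumulate feature counts per class
            self.vocab.update(feats)
        return self

    def predict(self, feats: Counter) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f, v in feats.items():
                if f in self.vocab:  # ignore features never seen in training
                    score += v * math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on a handful of labeled token lists and calling `clf.predict(features([...]))` returns the higher-scoring label. With so little data this is only a smoke test of the plumbing, not an estimate of accuracy; keeping every feature family in one namespaced map (`uni=`, `bi=`, `lex=`, `mb=`) makes it easy to ablate a family by filtering on its prefix.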