[slides] Robust Sentiment Detection on Twitter from Biased and Noisy Data

This paper proposes an approach for automatically detecting sentiment in Twitter messages (tweets) by leveraging characteristics of how tweets are written and meta-information about the words that compose them. Instead of manually annotated data, the approach trains on noisy labels collected from sentiment detection websites, a situation common in real-world applications where annotation is expensive. The authors show that their method is more effective and more robust than previous approaches, especially when the training data are biased and noisy.

Twitter is a popular social media platform with a rapidly growing user base. Users share information and opinions about a wide range of topics, which makes sentiment analysis of tweets important for many applications. Previous approaches to sentiment detection typically use raw word representations (n-grams) as features, but these are less effective for tweets: because a tweet is short, each message contains only a handful of words, which gives a word-based classifier little evidence to work with.

The authors propose a two-step sentiment analysis method for Twitter. The first step classifies messages as subjective or objective; the second step classifies the subjective tweets as positive or negative. To reduce labeling effort, they use noisy labels from sentiment detection websites instead of manually annotated data. They analyze the quality and bias of each label source and combine the sources to improve classification performance.

The authors introduce two types of features: meta-information about the words and characteristics of how tweets are written. The meta-information features include part-of-speech tags and each word's prior subjectivity and prior polarity. The tweet-syntax features capture retweets, hashtags, replies, links, punctuation, emoticons, and uppercase words.

The authors evaluate their approach on three data sources that provide sentiment labels for tweets. They find that combining these sources, whose labels differ in bias and noise, improves performance. They also show that their approach is more robust to biased and noisy data than previous methods.
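To make the feature design concrete, here is a minimal Python sketch of an abstract tweet representation feeding a two-step decision. It is not the authors' implementation: the regular expressions, the toy lexicon `PRIOR_LEXICON`, and all function names are hypothetical; part-of-speech tags are omitted to keep the example dependency-free; and the two rule-based predicates merely stand in for the trained subjectivity and polarity classifiers.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (illustration only): word -> (prior subjectivity, prior polarity).
PRIOR_LEXICON = {
    "love": ("strong", "positive"),
    "great": ("weak", "positive"),
    "hate": ("strong", "negative"),
    "bad": ("weak", "negative"),
}

TOKEN_RE = re.compile(r"[A-Za-z']+")
LINK_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")  # simple patterns such as :) ;-( =P


def syntax_features(tweet: str) -> dict:
    """Count how the tweet is written, ignoring which words it contains."""
    words = TOKEN_RE.findall(tweet)
    return {
        "retweet": int(re.search(r"\bRT\b", tweet) is not None),
        "reply": int(tweet.lstrip().startswith("@")),
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "links": len(LINK_RE.findall(tweet)),
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "emoticons": len(EMOTICON_RE.findall(tweet)),
        # Multi-letter all-caps words, excluding the retweet marker itself.
        "upper_words": sum(1 for w in words if len(w) > 1 and w.isupper() and w != "RT"),
    }


def meta_features(tweet: str) -> dict:
    """Replace words by counts of their prior subjectivity and polarity (toy lexicon)."""
    counts = Counter()
    for word in TOKEN_RE.findall(tweet.lower()):
        if word in PRIOR_LEXICON:
            subjectivity, polarity = PRIOR_LEXICON[word]
            counts[f"subj_{subjectivity}"] += 1
            counts[f"pol_{polarity}"] += 1
    return dict(counts)


def two_step_label(features: dict, is_subjective, is_positive) -> str:
    """Step 1: subjective vs. objective; step 2 (subjective tweets only): positive vs. negative."""
    if not is_subjective(features):
        return "objective"
    return "positive" if is_positive(features) else "negative"


# Hypothetical rule-based stand-ins for the two trained classifiers.
def toy_is_subjective(f: dict) -> bool:
    return f.get("emoticons", 0) + f.get("subj_strong", 0) + f.get("subj_weak", 0) > 0


def toy_is_positive(f: dict) -> bool:
    return f.get("pol_positive", 0) >= f.get("pol_negative", 0)


for tweet in ["RT @bob: I LOVE this phone!! :) #gadgets http://example.com",
              "Meeting moved to 3pm http://example.com"]:
    feats = {**syntax_features(tweet), **meta_features(tweet)}
    print(two_step_label(feats, toy_is_subjective, toy_is_positive), feats)
```

Because every feature is a count over abstract categories rather than a specific word, two tweets that share no vocabulary can still map to similar vectors. In a real system the two predicates would be classifiers trained on the noisy website labels, with the polarity model applied only to tweets the subjectivity model marks as subjective.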
The experiments show that the approach achieves lower error rates than previous approaches in both subjectivity and polarity detection. The use of abstract representations of tweets and the combination of multiple data sources with different biases lead to better performance. The authors conclude that their approach is effective and robust even with limited training data, and that combining data sources with distinct characteristics is beneficial. Future work includes a more detailed analysis of sentences to improve sentiment classification.

August 2010 | Luciano Barbosa, Junlan Feng