Understanding Twitter Sentiment Classification using Distant Supervision

This paper introduces a novel approach for automatically classifying the sentiment of Twitter messages as either positive or negative with respect to a query term. The authors address the challenge of sentiment classification on microblogging services like Twitter, where traditional supervised learning would require large amounts of manually labeled training data. To avoid that cost, they propose distant supervision: the training data consists of tweets containing emoticons, which serve as noisy labels. The study evaluates three machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) and compares their performance across different feature extractors (unigrams, bigrams, and part-of-speech tags). The results show that these algorithms achieve accuracy above 80% when trained on emoticon-labeled data. The paper also discusses preprocessing steps and outlines future work, including the potential benefits of incorporating semantics, using domain-specific tweets, handling neutral tweets, and internationalization. The authors conclude that using emoticons as labels for training data is an effective form of distantly supervised learning for sentiment classification on Twitter.
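To make the distant-supervision idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical emoticon list, a simple lowercase word tokenizer, and a handful of made-up tweets. Emoticons act as noisy labels, tweets with no emoticon or with conflicting emoticons are dropped, and the emoticons are then stripped from the text. That stripping step is an assumption of this sketch, so the classifier has to learn from the remaining words. A multinomial Naive Bayes classifier with add-one smoothing is then trained on unigram features. The names `noisy_label`, `tokenize`, and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical emoticon lists (an assumption; the summary does not list them).
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def noisy_label(tweet):
    """Return (label, text_without_emoticons), or None if the tweet is unusable.

    Tweets with no emoticon, or with emoticons of both polarities, are discarded.
    """
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    text = tweet
    # Strip the emoticons so they cannot be used as features (assumed choice).
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(e, " ")
    return ("positive" if has_pos else "negative"), text


def tokenize(text):
    # Lowercased unigram tokens; a deliberately simple, assumed tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(token | c) over the tweet's tokens
            score = math.log(self.doc_counts[c] / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Made-up example tweets standing in for an emoticon-labeled corpus.
raw_tweets = [
    "I love my new phone :)",
    "what a great day :D",
    "this movie was awesome :-)",
    "I hate waiting in line :(",
    "my flight got cancelled, terrible :-(",
    "worst customer service ever :(",
    "no emoticon here",      # dropped: no label
    "happy but sad :) :(",   # dropped: conflicting labels
]

labeled = [r for r in (noisy_label(t) for t in raw_tweets) if r is not None]
labels = [label for label, _ in labeled]
docs = [tokenize(text) for _, text in labeled]
model = NaiveBayes().fit(docs, labels)

print(model.predict(tokenize("I love this great phone")))      # → positive
print(model.predict(tokenize("terrible service, I hate it")))  # → negative
```

With only six usable training tweets, these predictions are purely illustrative and say nothing about the accuracy reported in the paper. Adding bigram features would only require changing `tokenize` to also emit adjacent token pairs.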

Twitter Sentiment Classification using Distant Supervision

Alec Go, Richa Bhayani, Lei Huang