Thumbs up? Sentiment Classification using Machine Learning Techniques

2002 | Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
This paper studies sentiment classification with machine learning techniques and compares their performance against human-produced baselines. The task is to classify movie reviews from the Internet Movie Database (IMDb) as positive or negative. Three standard methods, Naive Bayes, maximum entropy classification, and support vector machines (SVMs), all outperform the human-produced baselines. However, they do not perform as well on sentiment classification as they do on traditional topic-based categorization.
The corpus contained 752 negative and 1301 positive movie reviews, each labeled automatically from the rating its author assigned. Of the three methods, SVMs tended to perform best, reaching about 82.9% accuracy with unigram presence features, compared with 58% to 69% for the human-produced baselines. The study also examined how different features, such as unigrams, bigrams, and part-of-speech information, affect classification accuracy. Unigrams gave the best results; adding bigrams or part-of-speech information did not significantly improve performance.
The study concludes that sentiment classification is harder than topic-based classification because sentiment is expressed more subtly and depends on context and discourse. The authors suggest that future work focus on identifying features that indicate whether sentences are on-topic, which is a key challenge in sentiment analysis. They also stress the value of corpus-based techniques over prior intuitions when selecting features for sentiment classification.
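To make the unigram approach described above more concrete, here is a minimal Python sketch of a Naive Bayes classifier over unigram presence features with add-one smoothing. It is an illustration under stated assumptions, not the authors' code: the whitespace tokenizer, the function names (`tokenize`, `train_naive_bayes`, `predict`), and the four-sentence toy corpus are all hypothetical and stand in for the real IMDb reviews.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the study itself used IMDb movie reviews.
TRAIN_DOCS = [
    "a brilliant and moving film",
    "great acting and a great script",
    "a dull and boring plot",
    "terrible acting and a bad script",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class unigram presence counts for Naive Bayes."""
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for text, label in zip(docs, labels):
        present = set(tokenize(text))  # presence: each word counts once per document
        word_counts[label].update(present)
        vocab |= present
    return {
        "classes": classes,
        "doc_counts": doc_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    # Words never seen in training are ignored.
    present = set(tokenize(text)) & model["vocab"]
    vocab_size = len(model["vocab"])
    best_class, best_score = None, -math.inf
    for c in model["classes"]:
        total = sum(model["word_counts"][c].values())
        score = math.log(model["doc_counts"][c] / model["n_docs"])  # log prior
        for word in present:
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
            score += math.log((model["word_counts"][c][word] + 1) / (total + vocab_size))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "a brilliant script"))
print(predict(model, "a boring plot with bad acting"))
```

Using `set(tokenize(text))` is what makes this a presence feature rather than a frequency feature: a word repeated within one review still contributes only once. On a corpus this small the predictions say nothing about real accuracy; the sketch only shows the mechanics.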