28 May 2002 | Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
The paper "Thumbs up? Sentiment Classification using Machine Learning Techniques" by Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan explores the problem of classifying documents by overall sentiment rather than by topic, for example determining whether a movie review is positive or negative. Using movie reviews as data, the authors find that three standard machine learning techniques (Naive Bayes, maximum entropy classification, and support vector machines) outperform baselines built from human-produced lists of indicator words. However, these methods do not perform as well on sentiment classification as they do on traditional topic-based categorization. The paper also examines factors that make sentiment classification more challenging, such as the subtle ways in which sentiment is expressed and the need for understanding that goes beyond identifying keywords.
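To make the comparison concrete, here is a minimal Python sketch of the two kinds of classifier being contrasted. It assumes whitespace tokenization and a tiny hypothetical training set. The first is a keyword-counting baseline whose word lists (`POSITIVE_WORDS`, `NEGATIVE_WORDS`) are illustrative and are not the lists used in the paper. The second is a multinomial Naive Bayes classifier with add-one smoothing over unigram presence features. This is not the authors' implementation, and the maximum entropy and SVM classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical hand-picked word lists, standing in for a human-produced baseline.
POSITIVE_WORDS = {"brilliant", "dazzling", "gripping", "great"}
NEGATIVE_WORDS = {"awful", "bad", "boring", "cliched"}


def keyword_baseline(text):
    """Label a review by comparing counts of positive and negative keywords."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return "tie"


class NaiveBayes:
    """Multinomial Naive Bayes over unigram presence features, add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = set(doc.lower().split())  # presence: each word counted once per doc
            self.vocab |= words
            self.word_counts[label].update(words)
        return self

    def predict(self, doc):
        words = set(doc.lower().split()) & self.vocab  # ignore words unseen in training
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in words:
                # Add-one smoothed log-likelihood of word w given class c
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


# Tiny hypothetical training set (illustrative only, not the paper's corpus).
train_docs = [
    "a gripping and brilliant film",
    "dazzling performances great script",
    "boring and cliched plot",
    "an awful bad movie",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(keyword_baseline("the plot was predictable"))  # tie: no listed keyword appears
print(model.predict("a brilliant script"))  # pos
```

The keyword baseline returns `"tie"` whenever the positive and negative keyword counts are equal, including when no listed word appears at all. This hints at why fixed word lists struggle: a review can express sentiment without using any of the expected keywords.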
The authors conclude by discussing potential applications of sentiment classification in fields such as business intelligence and recommender systems. They also suggest that selecting indicator features with corpus-based techniques may be more effective than relying on prior intuitions.
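One simple corpus-based way to choose indicator words is sketched below in Python. It scores each word by the smoothed log-odds that it appears in a positive rather than a negative training document, then keeps the top-ranked words. The function name `rank_indicator_words`, the add-one smoothing, and the toy data are hypothetical. The sketch illustrates the general idea of letting corpus statistics pick features; it is not a procedure taken from the paper.

```python
import math
from collections import Counter


def rank_indicator_words(docs, labels, k=2):
    """Return the k most positive and k most negative words by smoothed log-odds.

    Assumes binary labels "pos"/"neg" and whitespace tokenization.
    """
    doc_freq = {"pos": Counter(), "neg": Counter()}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        doc_freq[label].update(set(doc.lower().split()))  # document frequency

    def log_odds(word):
        # Add-one smoothed fraction of documents in each class that contain the word
        p_pos = (doc_freq["pos"][word] + 1) / (n_docs["pos"] + 2)
        p_neg = (doc_freq["neg"][word] + 1) / (n_docs["neg"] + 2)
        return math.log(p_pos / p_neg)

    scores = {w: log_odds(w) for w in set(doc_freq["pos"]) | set(doc_freq["neg"])}
    # Break score ties alphabetically so the ranking is deterministic
    positive = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    negative = sorted(scores, key=lambda w: (scores[w], w))[:k]
    return positive, negative


# Tiny hypothetical labeled corpus (illustrative only)
docs = ["great film", "great acting", "great fun",
        "bad script", "bad and boring", "boring plot"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

positive, negative = rank_indicator_words(docs, labels)
print(positive, negative)  # ['great', 'acting'] ['bad', 'boring']
```

The sketch counts document frequency (presence) rather than raw occurrences, so a single review that repeats a word many times cannot dominate that word's score.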