Understanding Sentiment Analysis of Short Informal Texts

This paper presents a state-of-the-art sentiment analysis system that detects sentiment at two levels: message level (the overall sentiment of a text) and term level (the sentiment of an individual word or phrase within a text). The system takes a supervised statistical text classification approach that draws on surface-form, semantic, and sentiment features.
Sentiment features come primarily from high-coverage, tweet-specific sentiment lexicons, generated automatically from tweets that contain sentiment-word hashtags or emoticons. In total, the system uses three manually created general-purpose sentiment lexicons and two automatically generated tweet-specific lexicons. The tweet-specific lexicons capture peculiarities of social media language, including misspellings, abbreviations, and slang. To handle negation, the system builds separate lexicons for affirmative and negated contexts, so that a word can carry a different sentiment score when it is negated.
The system achieved top performance in the SemEval-2013 shared task on sentiment analysis in Twitter, with F-scores of 69.02 (message level) and 88.93 (term level). Post-competition improvements raised these scores to 70.45 and 89.50, respectively. The system also performed well on two additional datasets: the SemEval-2013 SMS test set and a movie review corpus. Ablation experiments showed that the automatically generated lexicons improved performance by up to 6.5 percentage points.
The system was evaluated both intrinsically and extrinsically. The intrinsic evaluation compared the automatically generated lexicons with manually created ones; the extrinsic evaluation tested the system on unsupervised and supervised sentiment analysis tasks. The system outperformed other approaches in both settings, on the SemEval-2013 datasets as well as on the movie review corpus.
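The lexicon construction and negation handling described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes each tweet already carries a "positive" or "negative" label taken from its hashtag or emoticon, scores each token by the difference of its pointwise mutual information (PMI) with the two classes, and marks negated words with a `_NEG` suffix up to the next punctuation mark. The negator list, the tokenizer, the smoothing constant, and all function names are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter

# Illustrative negator list (an assumption, not the paper's exact list).
NEGATORS = {"no", "not", "never", "cannot", "without"}
PUNCT = {".", ",", ":", ";", "!", "?"}

def tokenize(text):
    """Lowercase and split into word tokens and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,:;!?]", text.lower())

def mark_negation(tokens):
    """Suffix _NEG to every word between a negator and the next punctuation mark."""
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCT:
            negated = False          # punctuation closes the negated context
            marked.append(tok)
        elif tok in NEGATORS or tok.endswith("n't"):
            negated = True           # the negator itself is left unmarked
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked

def build_lexicon(labeled_tweets, smoothing=1.0):
    """Score each token by PMI(token, positive) - PMI(token, negative).

    labeled_tweets: iterable of (text, label) pairs, where label is
    "positive" or "negative" (assumed to come from a hashtag or emoticon).
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in labeled_tweets:
        words = [t for t in mark_negation(tokenize(text)) if t not in PUNCT]
        counts[label].update(words)
    pos, neg = counts["positive"], counts["negative"]
    vocab = set(pos) | set(neg)
    n_pos = sum(pos.values()) + smoothing * len(vocab)
    n_neg = sum(neg.values()) + smoothing * len(vocab)
    # The PMI difference simplifies to a log-ratio of smoothed relative frequencies.
    return {
        w: math.log2(((pos[w] + smoothing) / n_pos) / ((neg[w] + smoothing) / n_neg))
        for w in vocab
    }

# Toy data, invented for the example.
tweets = [
    ("love this great phone", "positive"),
    ("great day !", "positive"),
    ("this is not great .", "negative"),
    ("hate this phone", "negative"),
]
lexicon = build_lexicon(tweets)
```

Because `great` and `great_NEG` are counted as different tokens, the resulting dictionary acts as an affirmative-context and a negated-context lexicon at once: on the toy data, `lexicon["great"]` is positive while `lexicon["great_NEG"]` is negative.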
The system uses a supervised statistical machine learning approach, employing a linear-kernel Support Vector Machine (SVM) classifier. The classifier leverages a variety of features, including surface-form, semantic, and sentiment lexicon features. These features are derived from both manually created and automatically generated lexicons, allowing the system to accurately detect sentiment in short informal texts such as tweets and SMS messages. The system's performance was validated on multiple datasets, demonstrating its effectiveness in sentiment analysis tasks.
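To make the classifier description concrete, the following sketch combines surface-form features (unigram presence) with sentiment lexicon features and trains a linear SVM by subgradient descent on the regularised hinge loss. It is a simplified illustration under stated assumptions, not the system described in the paper: the toy lexicon, the feature names, the training data, the hyperparameters, and the restriction to two classes are all hypothetical.

```python
import re

# Hypothetical toy lexicon (token -> score); real lexicons are far larger.
TOY_LEXICON = {"love": 1.5, "great": 1.2, "good": 0.8,
               "hate": -1.6, "awful": -1.4, "bad": -0.9}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def featurize(text, lexicon=TOY_LEXICON):
    """Map a message to a sparse feature dict."""
    tokens = tokenize(text)
    scores = [lexicon.get(t, 0.0) for t in tokens]
    feats = {"w=" + t: 1.0 for t in tokens}                    # surface-form features
    feats["lex_pos_total"] = sum(s for s in scores if s > 0)   # lexicon features
    feats["lex_neg_total"] = -sum(s for s in scores if s < 0)
    feats["lex_pos_count"] = float(sum(s > 0 for s in scores))
    feats["lex_neg_count"] = float(sum(s < 0 for s in scores))
    feats["bias"] = 1.0
    return feats

def score(weights, feats):
    """Linear decision value: dot product of weights and features."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def train_linear_svm(examples, epochs=50, eta=0.1, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss, labels in {+1, -1}."""
    weights = {}
    for _ in range(epochs):
        for feats, y in examples:
            violated = y * score(weights, feats) < 1.0
            for k in weights:                  # shrink step from the L2 penalty
                weights[k] *= 1.0 - eta * lam
            if violated:                       # hinge step for margin violations
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + eta * y * v
    return weights

def predict(weights, text):
    return "positive" if score(weights, featurize(text)) >= 0 else "negative"

# Toy training set, invented for the example.
train = [
    ("love this great phone", 1),
    ("what a good day", 1),
    ("hate this awful phone", -1),
    ("bad service , bad day", -1),
]
weights = train_linear_svm([(featurize(t), y) for t, y in train])
```

With these weights, `predict(weights, "great food , love it")` returns `"positive"` and `predict(weights, "awful , hate it")` returns `"negative"`, even though "food" and "it" never appear in the training data: the lexicon features carry the signal for words the unigram features have not seen. In practice one would use an established SVM library and a much richer feature set.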

Sentiment Analysis of Short Informal Texts

2014 | Svetlana Kiritchenko, Xiaodan Zhu, Saif M. Mohammad