Lexicon-Based Methods for Sentiment Analysis

2011 | Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede
This paper presents a lexicon-based approach to sentiment analysis built around the Semantic Orientation CALculator (SO-CAL). SO-CAL uses dictionaries of words annotated with their semantic orientation (SO), meaning both polarity and strength, and it incorporates intensification and negation. The paper applies it to polarity classification: assigning a positive or negative label to a text according to the opinion it expresses towards its main subject matter.

SO-CAL computes the semantic orientation of a text from the SO values of individual words, adjusted by contextual valence shifters such as intensifiers, downtoners, negation, and irrealis markers. Negation and intensification are handled in a way that generalizes to every word with an SO value. There are separate dictionaries for adjectives, nouns, verbs, and adverbs, in which each word carries a hand-ranked SO value between -5 and +5. The dictionaries also include multi-word expressions, and intensification is modeled on a percentage scale. Further features include text-level weighting and multiple cut-offs. The paper describes how the dictionaries were created and how Mechanical Turk was used to check them for consistency and reliability.

SO-CAL is tested on several data sets, including Epinions 1 and 2, the Polarity Dataset, and other corpora. The method is shown to perform consistently across domains and on completely unseen data. The paper concludes that SO-CAL is a robust method for sentiment analysis that combines carefully crafted dictionaries with features inspired by linguistic insights.
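
The scoring idea summarized above (word-level SO values between -5 and +5, percentage-based intensification, and negation) can be illustrated with a minimal Python sketch. This is not SO-CAL's implementation: the dictionary entries, the intensifier percentages, the treatment of negation as a fixed shift of 4 toward the opposite polarity, the averaging of word scores, and the names word_so, text_so, and classify are all assumptions made for illustration. Irrealis markers, multi-word expressions, and text-level weighting are left out.

```python
# Toy dictionaries. Every entry and value is an illustrative assumption,
# not taken from SO-CAL's actual dictionaries.
SO_VALUES = {"good": 3, "excellent": 5, "bad": -3, "terrible": -5}
INTENSIFIERS = {"very": 0.25, "slightly": -0.5}  # percentage modifiers
NEGATORS = {"not", "never"}
NEGATION_SHIFT = 4  # assumed fixed shift toward the opposite polarity


def word_so(tokens, i):
    """Return the SO of tokens[i] adjusted by preceding shifters, or None."""
    base = SO_VALUES.get(tokens[i])
    if base is None:
        return None
    so = float(base)
    j = i - 1
    # An intensifier or downtoner directly before the word scales its SO.
    if j >= 0 and tokens[j] in INTENSIFIERS:
        so *= 1 + INTENSIFIERS[tokens[j]]
        j -= 1
    # Negation shifts the value toward the opposite polarity
    # instead of simply flipping its sign.
    if j >= 0 and tokens[j] in NEGATORS:
        so += -NEGATION_SHIFT if so > 0 else NEGATION_SHIFT
    return so


def text_so(text):
    """Average the adjusted SO of all sentiment-bearing words (assumed aggregation)."""
    tokens = text.lower().split()  # naive whitespace tokenization
    scores = []
    for i in range(len(tokens)):
        score = word_so(tokens, i)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def classify(text, cutoff=0.0):
    """Label a text positive or negative relative to a single cut-off."""
    return "positive" if text_so(text) > cutoff else "negative"
```

With these toy values, text_so("very good") is 3.75, text_so("not good") is -1.0, and text_so("not very good") is -0.25, so the negated phrases come out mildly negative rather than as mirror images of their positive counterparts. The whitespace tokenizer does not strip punctuation, so real input would need proper tokenization first.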