11 Mar 2017 | Thomas Davidson, Dana Warmsley, Michael Macy, Ingmar Weber
The paper addresses the challenge of distinguishing hate speech from other offensive language in social media. Purely lexical methods tend to misclassify messages because they treat any post containing certain keywords as hate speech. The authors use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and then have crowd workers label a sample into three categories: hate speech, offensive language, and neither. A multi-class classifier is trained to differentiate these categories. Analysis shows that while many tweets can be classified reliably, the boundary between hate speech and offensive language is often ambiguous, and hate speech is frequently mistaken for merely offensive language. Racist and homophobic tweets are more likely to be classified as hate speech, while sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also harder to classify.
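For concreteness, a minimal sketch of such a three-class text classification pipeline is shown below using scikit-learn. This is not the authors' exact feature set (which also includes part-of-speech, sentiment, and readability features); the file name `labeled_tweets.csv` and the column names `tweet` and `class` (0 = hate speech, 1 = offensive, 2 = neither) are assumptions for illustration only.

```python
# Minimal three-class tweet classifier sketch (not the paper's exact pipeline).
# Assumes a CSV with columns "tweet" and "class" (0 = hate, 1 = offensive, 2 = neither).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("labeled_tweets.csv")  # hypothetical file name

X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"], df["class"], test_size=0.2, stratify=df["class"], random_state=42
)

# TF-IDF over word n-grams; additional features used in the paper are omitted here.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), max_features=10000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Regularized logistic regression over the three classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)

print(classification_report(y_test, clf.predict(X_test_vec),
                            target_names=["hate", "offensive", "neither"]))
```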
The study highlights the difficulty of accurately distinguishing hate speech from offensive language, especially when context is not considered; previous work has often conflated the two, leading to misclassification. The authors compare several classifiers, including logistic regression and linear SVMs, and their final logistic regression model achieves high overall precision and recall, but it still misclassifies a notable share of tweets. The results show that hate speech is most strongly associated with specific slurs such as "f*ggot" and "n*gger," while terms like "b*tch" appear mostly in tweets labeled merely offensive. The model performs well on the most common forms of hate speech in the data, such as anti-black racism and homophobia, but struggles with rarer types.
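One simple way to see where such a model goes wrong is to row-normalize the confusion matrix, which shows what fraction of genuine hate speech ends up predicted as merely offensive. The sketch below continues from the hypothetical classifier above and assumes the same 0/1/2 label encoding.

```python
# Per-class error analysis: a row-normalized confusion matrix shows, for each
# true class, the share of tweets assigned to every predicted class.
# Continues from the hypothetical clf / vectorizer / test split above.
from sklearn.metrics import confusion_matrix

labels = ["hate", "offensive", "neither"]  # assumed 0/1/2 encoding
cm = confusion_matrix(y_test, clf.predict(X_test_vec), normalize="true")

for i, true_label in enumerate(labels):
    row = ", ".join(f"predicted {labels[j]}: {cm[i, j]:.2f}" for j in range(len(labels)))
    print(f"true {true_label} -> {row}")
```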
The study also reveals that human coders may mislabel tweets due to biases, such as considering sexist language as offensive rather than hate speech. The authors emphasize the importance of context in hate speech detection and the need for future work to account for social structures and motivations behind hate speech. Overall, the paper underscores the complexity of hate speech detection and the need for more nuanced approaches that consider both linguistic and contextual factors.