Automated Hate Speech Detection and the Problem of Offensive Language

11 Mar 2017 | Thomas Davidson, Dana Warmsley, Michael Macy, Ingmar Weber
The paper "Automated Hate Speech Detection and the Problem of Offensive Language" by Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber addresses the challenge of distinguishing hate speech from other instances of offensive language on social media. The authors use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and label them into three categories: hate speech, offensive language, and neither. They train a multi-class classifier to differentiate between these categories and analyze the results to understand the challenges of accurate classification; a rough sketch of such a pipeline appears after the findings below.

Key findings include:

- Racist and homophobic tweets are more likely to be classified as hate speech.
- Sexist tweets are generally classified as offensive.
- Tweets without explicit hate keywords are more difficult to classify.
- The presence or absence of specific offensive or hateful terms can both help and hinder accurate classification.
- Certain terms, such as "n*gger" and "f*ggot," are particularly useful for distinguishing between hate speech and offensive language.
- The model performs well at detecting prevalent forms of hate speech, such as anti-black racism and homophobia, but is less reliable at detecting rarer types of hate speech.

The authors conclude that future work should better account for context and for the heterogeneity of hate speech usage, and they emphasize the importance of addressing social biases in algorithms to improve accuracy.
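The overall approach described above (keyword-based tweet collection, three-way labels, a multi-class text classifier) can be approximated with standard tooling. The sketch below is a minimal illustration, not a reproduction of the authors' model: it assumes scikit-learn and a hypothetical `labeled_tweets.csv` with `tweet` and `label` columns, and it uses only TF-IDF n-grams with logistic regression, whereas the paper draws on a richer feature set.

```python
# Minimal sketch of a three-class tweet classifier (hate speech / offensive / neither).
# Assumes scikit-learn and a hypothetical labeled CSV; this is an illustration of the
# general setup, not the authors' exact pipeline or features.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical input: columns "tweet" (text) and "label"
# (one of "hate_speech", "offensive", "neither").
data = pd.read_csv("labeled_tweets.csv")
X_train, X_test, y_train, y_test = train_test_split(
    data["tweet"], data["label"],
    test_size=0.2, stratify=data["label"], random_state=42,
)

# TF-IDF unigrams/bigrams feeding a multinomial logistic regression.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5, lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X_train, y_train)

# Per-class precision/recall surfaces the failure mode discussed in the paper:
# the hate speech class is much rarer than merely offensive language, so its
# recall typically lags the other two classes.
print(classification_report(y_test, model.predict(X_test)))
```

Stratifying the split and weighting classes matters here because hate speech is by far the smallest category in such a dataset; without it, a model can look accurate overall while routinely mislabeling hate speech as merely offensive, which is exactly the confusion the paper analyzes.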