RTP-LX: CAN LLMs EVALUATE TOXICITY IN MULTILINGUAL SCENARIOS?

22 Apr 2024 | Adrian de Wynter, Ishaan Watts, Nektar Ege Altintoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanovic, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yuez Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stephanie Visser, Herdyan Widarmanto, Andrey Zaikin, and Si-Qing Chen
The paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" by Adrian de Wynter et al. addresses the safety concerns of large language models (LLMs) and small language models (SLMs) in multilingual settings. The authors introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages, designed to probe culturally specific toxic language. They evaluate seven SLMs on their ability to identify toxic content in a culturally sensitive, multilingual context. The results show that while the models generally achieve acceptable accuracy, they have low agreement with human judges when rating the overall toxicity of a prompt, and they struggle with context-dependent scenarios, particularly subtle yet harmful content such as microaggressions and bias. The paper highlights the need for further improvements in LLMs to ensure their safe deployment and to reduce harmful uses of these models.
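The paper's evaluation protocol is not reproduced here, but as an illustration of the kind of comparison it describes, the hedged sketch below checks how closely a model's 1-5 Likert toxicity ratings track human annotations using Cohen's kappa. The sample scores, the rating scale prompt, and the choice of quadratic weighting are assumptions for illustration, not the authors' actual setup or data.

```python
# Illustrative sketch (not the authors' code): comparing a model's 1-5 Likert
# toxicity ratings against human annotations of the same prompts.
from sklearn.metrics import cohen_kappa_score

# Hypothetical human annotations for a handful of prompts (1 = non-toxic, 5 = very toxic).
human_scores = [1, 4, 2, 5, 1, 3, 2, 4]

# Hypothetical model ratings for the same prompts, e.g. parsed from an LLM's
# response to an instruction such as "Rate the toxicity of this text from 1 to 5."
model_scores = [1, 3, 1, 5, 2, 2, 2, 5]

# Quadratic weighting accounts for the ordinal nature of the Likert scale:
# disagreeing by one point is penalised less than disagreeing by four.
kappa = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"Quadratic-weighted Cohen's kappa: {kappa:.2f}")
```

Low values of such agreement statistics, even when raw accuracy looks acceptable, are the kind of gap the paper reports between model and human judgments.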