RTP-LX: CAN LLMs EVALUATE TOXICITY IN MULTILINGUAL SCENARIOS?

22 Apr 2024 | Adrian de Wynter, Ishaan Watts, Nektar Ege Altintoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanovic, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yuez Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stephanie Visser, Herdyan Widarmanto, Andrey Zaikin, and Si-Qing Chen
The paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" by Adrian de Wynter et al. addresses the safety concerns of large language models (LLMs) and small language models (SLMs) in multilingual settings. The authors introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages, designed to probe culturally specific toxic language. They evaluate seven SLMs on their ability to identify toxic content in a culturally sensitive, multilingual context. The results show that while the models generally achieve acceptable accuracy, they have low agreement with human judges when rating the overall toxicity of a prompt, and they struggle with context-dependent scenarios, particularly subtle yet harmful content such as microaggressions and bias. The paper highlights the need for further improvements in LLMs to ensure their safe deployment and to reduce harmful uses of these models.
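The paper's evaluation protocol is not reproduced here, but as an illustration of the kind of comparison it describes, the hedged sketch below checks how closely a model's 1-5 Likert toxicity ratings track human annotations using Cohen's kappa. The sample scores, the rating scale prompt, and the choice of quadratic weighting are assumptions for illustration, not the authors' actual setup or data.

```python
# Illustrative sketch (not the authors' code): comparing a model's 1-5 Likert
# toxicity ratings against human annotations of the same prompts.
from sklearn.metrics import cohen_kappa_score

# Hypothetical human annotations for a handful of prompts (1 = non-toxic, 5 = very toxic).
human_scores = [1, 4, 2, 5, 1, 3, 2, 4]

# Hypothetical model ratings for the same prompts, e.g. parsed from an LLM's
# response to an instruction such as "Rate the toxicity of this text from 1 to 5."
model_scores = [1, 3, 1, 5, 2, 2, 2, 5]

# Quadratic weighting accounts for the ordinal nature of the Likert scale:
# disagreeing by one point is penalised less than disagreeing by four.
kappa = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"Quadratic-weighted Cohen's kappa: {kappa:.2f}")
```

Low values of such agreement statistics, even when raw accuracy looks acceptable, are the kind of gap the paper reports between model and human judgments.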