RTP-LX: CAN LLMs EVALUATE TOXICITY IN MULTILINGUAL SCENARIOS?

22 Apr 2024 | Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcovs, Stéphanie Visser, Herdyan Widarmanto, Andrey Zaikin, and Si-Qing Chen
This paper introduces RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages, designed to evaluate how well large language models (LLMs) and small language models (SLMs) detect toxic content in culturally sensitive, multilingual scenarios. The dataset was created by expanding the RTP dataset, which contains nearly 100,000 toxic sentences mined from Reddit. RTP-LX includes 1,100 toxic prompts and outputs in 28 languages, with a focus on detecting culturally specific toxic language.

The study evaluates seven S/LLMs on their ability to detect toxic content in this culturally sensitive, multilingual setting. The results show that although these models typically score acceptably in terms of accuracy, they have low agreement with human judges when judging the toxicity of a prompt holistically, and they struggle to discern harm in context-dependent scenarios, particularly subtle-yet-harmful content such as microaggressions and bias. The paper also discusses the challenges of evaluating S/LLMs in multilingual settings, including the difficulty of detecting harmful content in low-resource languages and the need for additional fine-tuning data to mitigate biases. The study highlights the importance of culturally sensitive and linguistically diverse datasets for improving the safety and reliability of S/LLMs in multilingual scenarios. The authors release RTP-LX to help further reduce harmful uses of these models and improve their safe deployment.
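To illustrate the gap the paper reports between raw accuracy and agreement with human judges, here is a minimal sketch of how model toxicity labels might be compared against human annotations. The label scale, data, and choice of Cohen's kappa (via scikit-learn) are assumptions for illustration only, not the paper's exact evaluation protocol.

```python
# Illustrative sketch: accuracy vs. chance-corrected agreement between
# human toxicity annotations and labels produced by an S/LLM judge.
# The 1-5 severity scale and the sample labels below are hypothetical.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human_labels = [1, 3, 5, 2, 1, 4, 2, 5]   # human-annotated toxicity severity
model_labels = [1, 2, 5, 1, 1, 5, 3, 5]   # severity assigned by the model

# Raw accuracy can look acceptable even when agreement with humans is weak,
# mirroring the paper's observation about holistic toxicity judgments.
print("Accuracy:", accuracy_score(human_labels, model_labels))
print("Cohen's kappa (quadratic weights):",
      cohen_kappa_score(human_labels, model_labels, weights="quadratic"))
```

In this toy example the two raters match on only half of the items, so the chance-corrected agreement is much less flattering than accuracy alone would suggest; the paper's finding is the same pattern at corpus scale.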