25 Sep 2020 | Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
The paper introduces REALTOXICITYPROMPTS, a dataset of 100,000 naturally occurring prompts drawn from English web text, each paired with toxicity scores from a widely used toxicity classifier. The study investigates how pretrained language models (LMs) can produce toxic text even from seemingly innocuous prompts, and evaluates how well controllable text generation methods prevent such toxic degeneration.

The results show that pretrained LMs can generate highly toxic text even when given non-toxic prompts. Data- or compute-intensive methods (e.g., adaptive pretraining on non-toxic data) steer generations away from toxicity more effectively than simpler solutions (e.g., banning "bad" words), but no current method, whether data-based or decoding-based, is completely effective: detoxification reduces toxic behavior without eliminating it.

The study also analyzes two web text corpora used to pretrain several LMs and finds non-negligible amounts of offensive, factually unreliable, and otherwise toxic content, concluding that the choice of pretraining data is crucial for avoiding toxic degeneration. The findings highlight the difficulty of avoiding toxicity in natural language generation and the need for better data selection processes for pretraining. The paper calls for more research into toxicity detection and control, and for more transparent and ethical practices in the collection and use of pretraining data.
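To make the "banning bad words" baseline concrete, below is a minimal sketch of decoding-based word filtering using Hugging Face `transformers` with GPT-2. This is not the authors' exact implementation; the banned-word list and the prompt are placeholders, and the paper's setup uses a standard profanity lexicon and nucleus sampling.

```python
# Sketch of a decoding-based "word banning" detoxification baseline (assumed setup,
# not the paper's exact code): suppress banned tokens at every decoding step.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

banned_words = ["damn", "idiot"]  # placeholder list; swap in a full profanity lexicon
# Encode each banned word with and without a leading space,
# since GPT-2's BPE tokenization is whitespace-sensitive.
bad_words_ids = [
    tokenizer(variant, add_special_tokens=False).input_ids
    for word in banned_words
    for variant in (word, " " + word)
]

prompt = "The crowd started yelling because"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling with the banned token sequences blocked during generation.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    max_new_tokens=20,
    bad_words_ids=bad_words_ids,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the paper notes, this kind of surface-level filtering only blocks an explicit list of strings; the model can still produce toxic content through words outside the list, which is why the authors find it less effective than data-intensive approaches such as adaptive pretraining on non-toxic text.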