Parameter-Efficient Detoxification with Contrastive Decoding


13 Jan 2024 | Tong Niu, Caiming Xiong, Semih Yavuz*, Yingbo Zhou*
This paper introduces DETOXIGEN, a parameter-efficient detoxification framework for text generation. DETOXIGEN pairs a pre-trained language model (the generator) with a detoxifier that is trained on toxic data to generate text in that style. During inference, the detoxifier's next-token distribution is contrasted against the generator's, steering the generator away from tokens that the detoxifier considers highly probable. Nucleus sampling first restricts the candidate vocabulary so that only plausible tokens are considered, and the contrast then reweights the generator's output distribution within that set.
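As a rough illustration of this decoding scheme, the sketch below combines the generator's and detoxifier's next-token distributions and selects only from the generator's nucleus. It is a minimal sketch under stated assumptions, not the authors' implementation: the GPT-2 backbone, the weighting factor `alpha`, the greedy selection, and the exact combination rule are illustrative choices, and `detoxifier` stands for a prompt-tuned model such as the one sketched further below.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # assumed backbone
generator = AutoModelForCausalLM.from_pretrained("gpt2")
generator.eval()

def nucleus_mask(probs: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    """Boolean vocabulary mask keeping the smallest set of tokens whose
    cumulative probability reaches top_p (nucleus sampling)."""
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep_sorted = cumulative - sorted_probs < top_p      # always keeps the top token
    mask = torch.zeros_like(probs, dtype=torch.bool)
    mask[sorted_idx[keep_sorted]] = True
    return mask

@torch.no_grad()
def contrastive_step(input_ids, detoxifier, alpha: float = 0.5, top_p: float = 0.9):
    """One decoding step: pick the next token from the generator's nucleus,
    down-weighting tokens that the detoxifier assigns high probability."""
    gen_probs = F.softmax(generator(input_ids).logits[0, -1], dim=-1)
    detox_probs = F.softmax(detoxifier(input_ids).logits[0, -1], dim=-1)
    # Contrast the two distributions inside the generator's nucleus only,
    # so implausible tokens cannot be promoted by the subtraction.
    scores = gen_probs - alpha * detox_probs
    scores[~nucleus_mask(gen_probs, top_p)] = float("-inf")
    next_id = torch.argmax(scores)                       # greedy here; sampling also works
    return torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
```

Iterating this step over a prompt produces a continuation in which tokens favored by the toxic-style model are suppressed, while the nucleus restriction keeps the output fluent.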
The detoxifier is obtained through soft prompt-tuning on toxic data, using the same backbone language model as the generator and adding only a small number of extra parameters. This makes DETOXIGEN lightweight, practical, and well suited to large language models. The framework is evaluated on the REALTOXICITYPROMPTS benchmark, where it significantly outperforms previous approaches on detoxification metrics without compromising generation quality, and ablation studies show that pairing a generator with a detoxifier sharing the same backbone yields the best performance. The approach is also transferable, since it requires only toxic data and no contrastive non-toxic data. Beyond detoxification, the framework can be used to generate text in other desired styles and can be extended to control for multiple attributes. Overall, DETOXIGEN provides a promising solution for detoxification in text generation.
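The sketch below shows one way such a detoxifier could be prompt-tuned with the HuggingFace PEFT library, assuming a GPT-2 backbone. The placeholder corpus, prompt length, learning rate, and training loop are illustrative assumptions rather than the paper's settings.

```python
import torch
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

backbone = "gpt2"                                  # same backbone as the generator
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForCausalLM.from_pretrained(backbone)

# Only the soft prompt (a small set of virtual-token embeddings) is trained;
# the backbone weights stay frozen, which keeps the added parameter count tiny.
peft_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
detoxifier = get_peft_model(model, peft_config)
detoxifier.print_trainable_parameters()            # a tiny fraction of the backbone

# Placeholder corpus: in practice this would be toxic text from a real dataset.
toxic_texts = ["<toxic training example 1>", "<toxic training example 2>"]

optimizer = torch.optim.AdamW(
    (p for p in detoxifier.parameters() if p.requires_grad), lr=3e-2
)
detoxifier.train()
for text in toxic_texts:
    batch = tokenizer(text, return_tensors="pt")
    loss = detoxifier(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At decoding time, a model tuned this way would play the role of `detoxifier` in the contrastive step above; because only the virtual-token embeddings differ from the generator, the two models share the same backbone, which the ablation studies identify as the best-performing configuration.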