Disclosure and Mitigation of Gender Bias in LLMs


17 Feb 2024 | Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
The paper "Disclosure and Mitigation of Gender Bias in LLMs" by Xiangjue Dong, Yibo Wang, Philip S. Yu, and James Caverlee explores the issue of gender bias in Large Language Models (LLMs). The authors propose an indirect probing framework based on conditional generation to induce LLMs to disclose their gender bias, even without explicit gender or stereotype mentions. They develop three strategies to probe explicit and implicit gender bias in LLMs and find that all tested models exhibit such bias, even when gender stereotypes are not present in the inputs. Additionally, larger or more aligned models tend to amplify bias. To mitigate this bias, the authors investigate three methods: Hyperparameter Tuning, Instruction Guiding, and Debias Tuning. Hyperparameter Tuning involves adjusting parameters like temperature, Top-$p$, and Top-$K$ to reduce bias. Instruction Guiding adds a prompt to continue sentences without gender mentions, while Debias Tuning uses a QLoRA-based method to optimize the distribution of gender attribute words and reduce bias. Experiments on ten LLMs across four model series (LLaMA2, Vicuna, FALCON, and OPT) show that Debias Tuning is the most effective method, significantly reducing explicit and implicit bias. The paper also includes an ablation study to analyze the effectiveness of each component of Debias Tuning. The authors conclude that their proposed framework and methods are effective in uncovering and mitigating gender bias in LLMs, even in the absence of explicit gender or stereotype mentions. They discuss limitations and future work, including the need for more sophisticated data selection strategies and the potential for adapting their framework to non-binary gender definitions.The paper "Disclosure and Mitigation of Gender Bias in LLMs" by Xiangjue Dong, Yibo Wang, Philip S. Yu, and James Caverlee explores the issue of gender bias in Large Language Models (LLMs). The authors propose an indirect probing framework based on conditional generation to induce LLMs to disclose their gender bias, even without explicit gender or stereotype mentions. They develop three strategies to probe explicit and implicit gender bias in LLMs and find that all tested models exhibit such bias, even when gender stereotypes are not present in the inputs. Additionally, larger or more aligned models tend to amplify bias. To mitigate this bias, the authors investigate three methods: Hyperparameter Tuning, Instruction Guiding, and Debias Tuning. Hyperparameter Tuning involves adjusting parameters like temperature, Top-$p$, and Top-$K$ to reduce bias. Instruction Guiding adds a prompt to continue sentences without gender mentions, while Debias Tuning uses a QLoRA-based method to optimize the distribution of gender attribute words and reduce bias. Experiments on ten LLMs across four model series (LLaMA2, Vicuna, FALCON, and OPT) show that Debias Tuning is the most effective method, significantly reducing explicit and implicit bias. The paper also includes an ablation study to analyze the effectiveness of each component of Debias Tuning. The authors conclude that their proposed framework and methods are effective in uncovering and mitigating gender bias in LLMs, even in the absence of explicit gender or stereotype mentions. They discuss limitations and future work, including the need for more sophisticated data selection strategies and the potential for adapting their framework to non-binary gender definitions.