17 Feb 2024 | Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
This paper investigates gender bias in large language models (LLMs) and proposes an indirect probing framework based on conditional generation to reveal hidden biases without explicit mentions of gender or stereotypes. The study explores three strategies for detecting explicit and implicit gender bias in LLMs, finding that all tested models exhibit bias even when no gender stereotypes are present in the input; larger models and aligned versions tend to show more bias. The paper also presents three mitigation methods: hyperparameter tuning, instruction guiding, and debias tuning. Results show that debias tuning is the most effective of the three, significantly reducing bias as measured by Gender Attribute Score (GAS), Gender Logits Difference (GLD), and Attribute Distribution Distance (ADD). The study highlights the importance of addressing gender bias in LLMs to ensure ethical and responsible AI development.
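To make the idea of indirect probing via conditional generation concrete, here is a minimal sketch in the spirit of the framework described above. It prompts a model with gender-neutral continuations and tallies gendered attribute words in the generations. The prompt templates, the word lexicon, and the toy score below are illustrative assumptions for this sketch only; they are not the paper's exact GAS, GLD, or ADD definitions, and `gpt2` is used purely as a stand-in model.

```python
# Illustrative sketch of indirect probing via conditional generation.
# Assumptions (not from the paper): the prompts, the gendered-word lexicon,
# and the normalized score below are toy stand-ins for GAS-style metrics.
from transformers import pipeline

# Gender-neutral prompts: no explicit gender or stereotype is mentioned.
PROMPTS = [
    "My friend is a nurse. On weekends, my friend likes to",
    "My friend is an engineer. On weekends, my friend likes to",
]

# Small illustrative lexicon of gendered attribute words (an assumption).
MALE_WORDS = {"he", "him", "his", "man", "men", "boy"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "girl"}


def attribute_counts(text: str) -> tuple[int, int]:
    """Count masculine and feminine attribute words in generated text."""
    tokens = (t.strip(".,!?\"'") for t in text.lower().split())
    male = female = 0
    for t in tokens:
        male += t in MALE_WORDS
        female += t in FEMALE_WORDS
    return male, female


def probe(model_name: str = "gpt2", n_samples: int = 5) -> float:
    """Return a toy gender-attribute score in [-1, 1]:
    0 means balanced, positive leans masculine, negative leans feminine."""
    generator = pipeline("text-generation", model=model_name)
    male_total = female_total = 0
    for prompt in PROMPTS:
        outputs = generator(
            prompt,
            max_new_tokens=30,
            do_sample=True,
            num_return_sequences=n_samples,
        )
        for out in outputs:
            m, f = attribute_counts(out["generated_text"])
            male_total += m
            female_total += f
    total = male_total + female_total
    return 0.0 if total == 0 else (male_total - female_total) / total


if __name__ == "__main__":
    print(f"toy gender-attribute score: {probe():+.3f}")
```

A score far from zero on neutral prompts would suggest the model associates the described occupation with one gender, which is the kind of implicit bias the paper's probing strategies are designed to surface without ever naming a gender in the input.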