Gender Bias in Large Language Models across Multiple Languages

1 Mar 2024 | Jinman Zhao, Yitian Ding, Chen Jia, Yining Wang, Zifan Qian
This paper investigates gender bias in large language models (LLMs) across multiple languages, focusing on three key measurements: 1) gender bias in selecting descriptive words given gender-related contexts, 2) gender bias in selecting gendered pronouns (she/he) given descriptive words, and 3) gender bias in the topics of LLM-generated dialogues. The study uses GPT-series LLMs in multiple languages (French, Spanish, Chinese, Japanese, and Korean) and employs a disparity impact (DI) score to evaluate the bias. The findings reveal significant gender biases in all languages examined, with notable differences in the extent and nature of these biases across languages. Specifically, the study finds that:

1. **Descriptive Word Selection**: LLMs assign certain adjectives more frequently to males than to females, particularly for "standout" and "personal quality" descriptions.
2. **Gendered Role Selection**: LLMs are more likely to predict male pronouns for "standout" and "personal quality" descriptions, while "outlook" descriptions are more likely to be assigned to females.
3. **Dialogue Topics**: Different gender pairs show distinct patterns in dialogue topics, with female-female dialogues focusing more on casual greetings and female-male dialogues focusing more on complaints and conflicts.

The study highlights the importance of addressing gender bias in LLMs to ensure equitable and culturally sensitive applications, particularly in socially influential domains such as education and technology. The findings also suggest that future research should explore gender bias in broader social contexts and address other forms of social disparity.
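To make the evaluation concrete, below is a minimal sketch of a disparate-impact-style ratio in Python. The paper's exact DI formulation is not reproduced in this summary, so the function assumes the standard fairness definition (the ratio of selection rates between the two gender groups); the adjective and the counts are hypothetical and only illustrate how such a score could be computed from model outputs.

```python
# Illustrative DI-score sketch (assumed definition, not the authors' code):
# the ratio of how often a descriptive word is assigned to one gender
# versus the other. A value near 1.0 suggests parity; values far from
# 1.0 indicate a skew toward one gender.

from collections import Counter

def di_score(assignments: dict[str, int]) -> float:
    """Ratio of the female assignment rate to the male assignment rate.

    `assignments` maps a gender label ("female"/"male") to the number of
    times the model assigned a given descriptive word to that gender.
    """
    total = sum(assignments.values())
    if total == 0:
        raise ValueError("no assignments to score")
    female_rate = assignments.get("female", 0) / total
    male_rate = assignments.get("male", 0) / total
    if male_rate == 0:
        return float("inf")
    return female_rate / male_rate

# Hypothetical counts for one adjective (e.g. "ambitious") across prompts:
counts = Counter({"male": 70, "female": 30})
print(f"DI score: {di_score(counts):.2f}")  # 0.43 -> skewed toward male
```

In practice, such a score would be computed per descriptive word (or per category such as "standout", "personal quality", or "outlook") and per language, allowing the extent of bias to be compared across languages as the paper does.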