Gender Bias in Large Language Models across Multiple Languages

1 Mar 2024 | Jinman Zhao, Yitian Ding, Chen Jia, Yining Wang, Zifan Qian
This paper investigates gender bias in large language models (LLMs) across multiple languages. The authors propose three quantitative measurements for evaluating gender bias in LLM-generated output: (1) bias in selecting descriptive words given a gender-related context, (2) bias in selecting gendered pronouns (she/he) given descriptive words, and (3) bias in the topics of LLM-generated dialogues. They apply these measurements to the GPT series of LLMs in six languages: English, French, Spanish, Chinese, Japanese, and Korean, and find significant gender bias in every language examined.

The first measurement, bias in descriptive word selection, evaluates the conditional generation probability of certain lexicons given the gender of the person being described. The second, bias in gendered role selection, evaluates the conditional generation probability of pronouns given descriptive words. The third, bias in dialogue topics, evaluates the sentiment tendency reflected by the topics of LLM-generated dialogues given the gender pair of the speakers.
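The first two measurements both reduce to comparing conditional generation probabilities. The sketch below illustrates the idea with GPT-2 via the Hugging Face transformers library, which exposes token log-probabilities directly; the paper evaluates the GPT series through its API instead, and the prompt templates and word list here are illustrative placeholders, not the authors' actual stimuli.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# GPT-2 is a stand-in for the GPT-series models evaluated in the paper.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probs of `continuation` given `prompt`. Assumes the
    continuation starts with a space, so BPE keeps the prompt's tokens as a
    prefix of the tokenized prompt + continuation."""
    prompt_ids = tokenizer.encode(prompt)
    full_ids = tokenizer.encode(prompt + continuation)
    with torch.no_grad():
        logits = model(torch.tensor([full_ids])).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Logits at position i - 1 predict the token at position i.
    return sum(log_probs[0, i - 1, full_ids[i]].item()
               for i in range(len(prompt_ids), len(full_ids)))

def descriptive_word_bias(word: str) -> float:
    """Measurement 1: positive -> `word` is more probable in a male context."""
    return (continuation_logprob("He is very", " " + word)
            - continuation_logprob("She is very", " " + word))

def pronoun_bias(description: str) -> float:
    """Measurement 2: positive -> the model prefers 'He' after `description`."""
    return (continuation_logprob(description, " He")
            - continuation_logprob(description, " She"))

for w in ["logical", "beautiful", "ambitious", "gentle"]:
    print(w, round(descriptive_word_bias(w), 3))
print(round(pronoun_bias("This person takes care of the children."), 3))
```

A score near zero indicates no gender preference for that word or pronoun; aggregating such scores over a large lexicon, separately per language, is what enables the cross-language comparison the paper reports.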
The results show that gender bias is present in the co-occurrence probability between certain descriptive words and genders, in the prediction of gender roles given a certain type of personal description, and in the divergence of the sentiment tendencies reflected by dialogue topics across gender pairs. These findings expose gender bias in LLM generations from several angles and underscore the need for future work to de-bias LLM-generated text containing gender information.

The study also highlights the importance of considering the diverse language backgrounds of LLM users, as well as the strong multilingual reasoning capabilities of the LLMs themselves. The authors emphasize that language features and cultural influences shape how gender bias surfaces in each language: different languages may exhibit different degrees of gender bias in LLM generations, and understanding this is essential for acknowledging and mitigating these biases, making LLMs more equitable and culturally aware across their wide range of applications.
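For the third measurement, a minimal sketch of the sentiment side is shown below, assuming LLM-generated dialogues have already been grouped by speaker gender pair. It uses the default Hugging Face sentiment-analysis pipeline as a stand-in scorer; the paper's actual topic-extraction and sentiment methodology is not reproduced here, and the example dialogues are toy placeholders.

```python
from transformers import pipeline

# Off-the-shelf binary sentiment scorer (a stand-in for the paper's pipeline).
sentiment = pipeline("sentiment-analysis")

def mean_sentiment(dialogues: list[str]) -> float:
    """Average signed sentiment: +score for POSITIVE, -score for NEGATIVE."""
    scores = []
    for text in dialogues:
        result = sentiment(text[:512])[0]  # crude truncation to fit the model
        sign = 1.0 if result["label"] == "POSITIVE" else -1.0
        scores.append(sign * result["score"])
    return sum(scores) / len(scores)

# Toy placeholders: real inputs would be LLM-generated dialogues keyed by
# the speakers' gender pair.
dialogues_by_pair = {
    "female-female": ["A: Did you see the new exhibit? B: Yes, it was lovely!"],
    "male-male": ["A: The deadline slipped again. B: Typical."],
}
for pair, dialogues in dialogues_by_pair.items():
    print(pair, round(mean_sentiment(dialogues), 3))
```

The divergence the paper measures would then correspond to the gap between these per-pair averages: a consistent gap suggests the model steers dialogue topics toward systematically different sentiment depending on the speakers' genders.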