8 Feb 2024 | Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli
This paper explores the potential of large language models (LLMs) to automate gender-neutral translation (GNT), comparing them with traditional machine translation (MT) systems. The study highlights the challenges of generating gender-neutral translations, particularly in languages with grammatical gender, where traditional MT systems struggle to avoid biased gender assumptions. The research focuses on English-to-Italian translation, using a parallel test set (GeNTE) designed to evaluate GNT capabilities.
The study finds that while traditional MT systems produce only a small percentage of gender-neutral translations, GPT-4, when prompted with specific instructions and examples, demonstrates significant potential for generating gender-neutral outputs. The results show that GPT-4 can produce a high percentage of gender-neutral translations, even when given unseen examples, and that its translations are often more fluent and accurate than those of traditional MT systems. However, the study also notes that evaluating the quality and acceptability of GNT is a subjective task, with variations across annotators. The paper concludes that while GPT-4 shows promise in automating GNT, further research is needed to address the challenges of ensuring fairness and inclusivity in translation technologies. The authors make their manual annotations available for future research.
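To make "prompted with specific instructions and examples" more concrete, the following is a minimal Python sketch of how a few-shot GNT prompt might be assembled and how a share of gender-neutral outputs might be computed from annotation labels. It is not the authors' prompt or evaluation code. The instruction wording, the `Demonstration` class, the functions `build_gnt_prompt` and `neutral_share`, the example sentence pair, and the label names `"neutral"` and `"gendered"` are all assumptions made for illustration, and the sentence pair is not taken from GeNTE.

```python
from dataclasses import dataclass


@dataclass
class Demonstration:
    """One hypothetical few-shot pair: English source and a gender-neutral Italian rendering."""
    source_en: str
    neutral_it: str


def build_gnt_prompt(instruction: str, demos: list[Demonstration], source_en: str) -> str:
    """Join the instruction, the demonstrations, and the new source into one prompt string."""
    parts = [instruction.strip(), ""]
    for demo in demos:
        parts.append(f"English: {demo.source_en}")
        parts.append(f"Italian: {demo.neutral_it}")
        parts.append("")  # blank line between demonstrations
    # The final block is left open so the model completes the Italian side.
    parts.append(f"English: {source_en}")
    parts.append("Italian:")
    return "\n".join(parts)


def neutral_share(labels: list[str]) -> float:
    """Percentage of outputs labelled 'neutral' (label names are assumed, not the paper's)."""
    if not labels:
        return 0.0
    return 100.0 * sum(label == "neutral" for label in labels) / len(labels)


if __name__ == "__main__":
    # Illustrative demonstration using a collective noun to avoid gendered forms.
    demos = [Demonstration("The teachers met yesterday.", "Il corpo docente si è riunito ieri.")]
    prompt = build_gnt_prompt(
        "Translate the English sentence into Italian without expressing the gender of human referents.",
        demos,
        "The doctors arrived.",
    )
    print(prompt)
    print(neutral_share(["neutral", "gendered", "neutral", "gendered"]))
```

The prompt string would then be sent to whichever model is being tested. Because the summary notes that acceptability judgments vary across annotators, any such percentage depends on whose labels are counted.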