Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods


18 Apr 2018 | Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang
The paper introduces WinoBias, a new benchmark for evaluating gender bias in coreference resolution. The dataset contains Winograd-schema style sentences whose entities are people referred to by their occupation. The authors demonstrate that coreference systems spanning rule-based, feature-rich, and neural approaches link gendered pronouns to pro-stereotypical entities more accurately than to anti-stereotypical ones, with an average F1 difference of 21.1. They propose a data-augmentation approach that, combined with word-embedding debiasing techniques, removes this bias without significantly affecting performance on existing benchmarks.

The WinoBias dataset consists of 3,160 sentences, split equally between development and test and written by researchers familiar with the project. It includes two types of test cases, each requiring a gendered pronoun to be linked to either a male- or a female-stereotypical occupation.

Analyzing OntoNotes 5.0, the training corpus used by these coreference systems, the authors find that female entities are significantly underrepresented. To reduce this bias, they generate an auxiliary training dataset in which all male entities are replaced by female entities and vice versa (gender swapping; see the sketch below). Combined with debiasing the word embeddings, this approach eliminates bias on WinoBias without significantly affecting overall coreference accuracy.

The authors evaluate three representative systems: a rule-based system, a feature-rich system, and an end-to-end neural system. All three exhibit gender bias, with the rule-based system being the most biased. The proposed methods reduce this bias, allowing the systems to pass the WinoBias test without significantly impacting performance on OntoNotes. The authors also show that systems can ignore gender cues when given sufficient alternative evidence.

The paper identifies two sources of gender bias in coreference systems, the training data and auxiliary resources such as word embeddings, and proposes a mitigation strategy for each: gender swapping and word-embedding debiasing. The authors conclude that bias in NLP systems can amplify societal stereotypes, and they provide methods for detecting and reducing gender bias in coreference resolution.
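To make the augmentation step concrete, the sketch below shows a rule-based gender swap applied to tokenized text. The swap list and the gender_swap function are illustrative assumptions rather than the paper's actual lexicon or code; a full implementation would also need part-of-speech information (e.g. to map "her" to "his" versus "him") and the named-entity anonymization described in the paper.

```python
# Illustrative swap list; the paper's lexicon is much larger and the procedure
# also anonymizes named entities before swapping.
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "him",   # ambiguous: possessive "her" should become "his" (needs POS tags)
    "man": "woman", "woman": "man",
    "himself": "herself", "herself": "himself",
}

def gender_swap(tokens):
    """Return a copy of `tokens` with gendered terms swapped, preserving capitalization."""
    swapped = []
    for tok in tokens:
        new = SWAP_PAIRS.get(tok.lower(), tok)
        if tok[:1].isupper():
            new = new.capitalize()
        swapped.append(new)
    return swapped

print(" ".join(gender_swap("He asked the secretary to call his manager".split())))
# -> "She asked the secretary to call her manager"
```

Training on the union of the original and swapped corpora is what balances the gender statistics the model sees; the swap itself is deliberately simple so that sentence structure and coreference annotations carry over unchanged.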
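For the auxiliary-resource side, the paper relies on existing word-embedding debiasing. A minimal sketch of the core operation, removing a word vector's component along an estimated gender direction in the spirit of Bolukbasi et al. (2016), is shown below; the embeddings dictionary and the single definitional pair used here are assumptions for illustration, not the exact procedure used in the paper.

```python
import numpy as np

def gender_direction(embeddings):
    """Estimate a gender direction from one definitional pair (illustrative;
    the original debiasing work averages over several such pairs)."""
    d = embeddings["he"] - embeddings["she"]
    return d / np.linalg.norm(d)

def neutralize(vec, direction):
    """Remove the component of `vec` that lies along the gender direction."""
    return vec - np.dot(vec, direction) * direction

# Hypothetical usage: make occupation vectors gender-neutral.
# embeddings = load_glove_vectors(...)          # assumed helper, not from the paper
# g = gender_direction(embeddings)
# embeddings["nurse"] = neutralize(embeddings["nurse"], g)
```

The intuition is that after projection, occupation words no longer lean toward "he" or "she" in embedding space, so the coreference model cannot exploit that lexical cue when resolving pronouns.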