The paper introduces a new benchmark, WinoBias, designed to evaluate gender bias in coreference resolution systems. The benchmark consists of Winograd-schema style sentences with entities referred to by their occupations, focusing on gender stereotypes. The authors demonstrate that rule-based, feature-rich, and neural coreference systems all exhibit significant bias, linking gendered pronouns to pro-stereotypical entities more accurately than to anti-stereotypical ones, with an average F1 score difference of 21.1. They propose a data-augmentation approach that, combined with existing word-embedding debiasing techniques, removes this bias without significantly affecting performance on existing coreference datasets. The dataset and code are available at http://winobias.org.
The paper also analyzes the training corpus, OntoNotes 5.0, and finds that female entities are significantly underrepresented in it, which contributes to the bias. The authors generate an auxiliary training dataset by swapping male and female entities; combined with debiasing techniques, this effectively eliminates bias on WinoBias while maintaining coreference accuracy.
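The core of the gender-swapping augmentation can be sketched as a rule-based substitution over a dictionary of gendered terms. The word list and function below are illustrative assumptions, not the authors' actual implementation (which also handles named entities and a far larger vocabulary):

```python
import re

# Small illustrative map of gendered terms; a real augmentation pipeline
# uses a much larger curated dictionary.
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her",  # note: "her" is ambiguous (possessive vs. object) without POS info
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
}

# One regex over all keys; \b prevents matches inside longer words
# (e.g. "man" inside "woman").
_PATTERN = re.compile(r"\b(" + "|".join(SWAP) + r")\b", re.IGNORECASE)


def gender_swap(sentence: str) -> str:
    """Replace each gendered token with its counterpart, preserving
    sentence-initial capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, sentence)
```

For example, `gender_swap("He paid him.")` yields `"She paid her."`. As the comment notes, a dictionary lookup alone cannot disambiguate possessive "her" from object "her", which is one reason a production version would rely on part-of-speech information.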