August 1-6, 2021 | Moin Nadeem, Anna Bethke, Siva Reddy
StereoSet is a large-scale natural English dataset designed to measure stereotypical biases in four domains: gender, profession, race, and religion. The dataset includes 321 target terms and 16,995 test instances (triplets). The authors propose Context Association Tests (CATs) to evaluate both stereotypical bias and language modeling ability of pretrained language models. CATs involve providing a target term with a natural context and three possible associative contexts, and measuring which association is more likely based on the model's predictions. The authors contrast the stereotypical bias and language modeling ability of popular models like BERT, GPT2, RoBERTA, and XLNET. They show that these models exhibit strong stereotypical biases. The dataset and code are available at https://stereoset.mit.edu. The study highlights the importance of measuring stereotypical biases in pretrained language models, as they can reflect real-world stereotypes. The authors also discuss the limitations of previous methods for evaluating bias in pretrained models and propose a new approach that considers both language modeling ability and stereotypical bias. The results show that language modeling ability is correlated with stereotypical bias, and that models with higher language modeling ability also exhibit higher stereotypical bias. The study concludes that achieving unbiased language models requires breaking this correlation. The authors also note that achieving ideal performance on StereoSet does not guarantee that a model is unbiased, as bias can manifest in many ways. The study provides a comprehensive evaluation of pretrained language models in terms of their stereotypical biases and language modeling ability.StereoSet is a large-scale natural English dataset designed to measure stereotypical biases in four domains: gender, profession, race, and religion. The dataset includes 321 target terms and 16,995 test instances (triplets). The authors propose Context Association Tests (CATs) to evaluate both stereotypical bias and language modeling ability of pretrained language models. CATs involve providing a target term with a natural context and three possible associative contexts, and measuring which association is more likely based on the model's predictions. The authors contrast the stereotypical bias and language modeling ability of popular models like BERT, GPT2, RoBERTA, and XLNET. They show that these models exhibit strong stereotypical biases. The dataset and code are available at https://stereoset.mit.edu. The study highlights the importance of measuring stereotypical biases in pretrained language models, as they can reflect real-world stereotypes. The authors also discuss the limitations of previous methods for evaluating bias in pretrained models and propose a new approach that considers both language modeling ability and stereotypical bias. The results show that language modeling ability is correlated with stereotypical bias, and that models with higher language modeling ability also exhibit higher stereotypical bias. The study concludes that achieving unbiased language models requires breaking this correlation. The authors also note that achieving ideal performance on StereoSet does not guarantee that a model is unbiased, as bias can manifest in many ways. The study provides a comprehensive evaluation of pretrained language models in terms of their stereotypical biases and language modeling ability.