Semantics derived automatically from language corpora necessarily contain human biases

2017 | Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan
This paper demonstrates that machine learning models, specifically GloVe word embeddings, inherit human-like biases from ordinary language corpora. The authors replicate a range of well-documented human biases, such as those exposed by the Implicit Association Test (IAT), using a widely used statistical machine-learning model trained on web text. They find that language itself contains recoverable and accurate imprints of historical biases, ranging from morally neutral preferences for flowers over insects, to problematic biases around race and gender, to veridical associations that reflect the status quo, such as the actual distribution of gender across occupations. The study introduces new methods for measuring bias in text: the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). These findings have significant implications for AI, machine learning, psychology, sociology, and ethics, suggesting that exposure to everyday language alone can account for the biases observed in these fields. The authors argue that prejudice must be addressed as a component of any intelligent system that learns from human culture: it cannot be entirely eliminated and must instead be compensated for.
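To make the WEAT concrete, the sketch below computes its effect-size statistic: the differential association of two target word sets (e.g., flower vs. insect names) with two attribute word sets (e.g., pleasant vs. unpleasant words), measured via cosine similarity between word vectors. This is a minimal illustration, not the authors' released code; the function and variable names (targets_x, targets_y, attrs_a, attrs_b) are illustrative, vectors are assumed to be supplied as NumPy arrays (e.g., looked up from pretrained GloVe embeddings), and the permutation test the paper uses for significance is omitted.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, attrs_a, attrs_b):
    # s(w, A, B): mean similarity of word w to attribute set A
    # minus its mean similarity to attribute set B.
    return (np.mean([cosine(w, a) for a in attrs_a])
            - np.mean([cosine(w, b) for b in attrs_b]))

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    # Effect size: difference in mean association between the two
    # target sets, normalized by the standard deviation of the
    # association scores over all target words.
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y)
```

A positive effect size here would indicate that, in the embedding space, the first target set is more strongly associated with the first attribute set than the second target set is, mirroring the direction of the corresponding IAT finding.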
[slides and audio] Semantics derived automatically from language corpora contain human-like biases