13 Jun 2024 | Eve Fleisig*, Genevieve Smith*, Madeline Bossi*, Ishita Rustagi*, Xavier Yin*, Dan Klein
The study examines the linguistic bias exhibited by ChatGPT, specifically GPT-3.5 Turbo and GPT-4, when generating responses to ten varieties of English, including two standard varieties (Standard American English and Standard British English) and eight non-standard varieties from around the world. The research finds that both models default to "standard" varieties of English, with responses to non-standard varieties consistently exhibiting issues such as lack of comprehension, stereotyping, demeaning content, and condescending responses. GPT-4 shows improvements in comprehension, warmth, and friendliness compared to GPT-3.5, but it also exhibits a marked increase in stereotyping (+17%). The study suggests that these models can perpetuate linguistic discrimination, particularly against speakers of non-standard varieties, and highlights the need for better handling of non-standard language in language models to avoid exacerbating existing global inequities.