Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

23 May 2024 | Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, Thomas L. Griffiths
This paper introduces two new methods for measuring implicit bias in large language models (LLMs): LLM Implicit Bias and LLM Decision Bias. These methods are inspired by psychological research on implicit bias and are designed to detect subtle, often unconscious biases that may not be captured by traditional bias measures. LLM Implicit Bias is based on the Implicit Association Test (IAT), a psychological method for measuring automatic associations between concepts. LLM Decision Bias is based on the idea that relative evaluations between two candidates are more diagnostic of implicit biases than absolute evaluations of each candidate independently. The authors tested these methods on eight value-aligned LLMs across 21 stereotypes in four social categories (race, gender, religion, health). They found pervasive stereotype biases in these models, mirroring societal biases. For example, GPT-4 was more likely to recommend candidates with African, Asian, Hispanic, and Arabic names for clerical work and candidates with Caucasian names for supervisor positions. It also suggested women study humanities while men study science and invited Jewish friends to religious service but Christian friends to a party. The results show that implicit biases can be indicators of discriminatory behaviors, even in models that appear unbiased according to standard benchmarks. The authors argue that these new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on observable behavior. They highlight the importance of measuring implicit bias in LLMs, as it can reveal nuanced biases that may not be apparent through traditional methods. The study also compares the effectiveness of LLM Implicit Bias and embedding-based bias measures in predicting downstream behaviors. It finds that LLM Implicit Bias is more strongly correlated with decision bias than embedding-based bias. The authors conclude that implicit bias is related to but distinct from embedding-based bias, with the former being more correlated with decision bias. Relative, not absolute, decisions are the most diagnostic of these implicit biases. The study underscores the importance of measuring implicit bias in LLMs to understand and address potential discriminatory behaviors.This paper introduces two new methods for measuring implicit bias in large language models (LLMs): LLM Implicit Bias and LLM Decision Bias. These methods are inspired by psychological research on implicit bias and are designed to detect subtle, often unconscious biases that may not be captured by traditional bias measures. LLM Implicit Bias is based on the Implicit Association Test (IAT), a psychological method for measuring automatic associations between concepts. LLM Decision Bias is based on the idea that relative evaluations between two candidates are more diagnostic of implicit biases than absolute evaluations of each candidate independently. The authors tested these methods on eight value-aligned LLMs across 21 stereotypes in four social categories (race, gender, religion, health). They found pervasive stereotype biases in these models, mirroring societal biases. For example, GPT-4 was more likely to recommend candidates with African, Asian, Hispanic, and Arabic names for clerical work and candidates with Caucasian names for supervisor positions. It also suggested women study humanities while men study science and invited Jewish friends to religious service but Christian friends to a party. The results show that implicit biases can be indicators of discriminatory behaviors, even in models that appear unbiased according to standard benchmarks. The authors argue that these new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on observable behavior. They highlight the importance of measuring implicit bias in LLMs, as it can reveal nuanced biases that may not be apparent through traditional methods. The study also compares the effectiveness of LLM Implicit Bias and embedding-based bias measures in predicting downstream behaviors. It finds that LLM Implicit Bias is more strongly correlated with decision bias than embedding-based bias. The authors conclude that implicit bias is related to but distinct from embedding-based bias, with the former being more correlated with decision bias. Relative, not absolute, decisions are the most diagnostic of these implicit biases. The study underscores the importance of measuring implicit bias in LLMs to understand and address potential discriminatory behaviors.
Reach us at info@study.space
Understanding Measuring Implicit Bias in Explicitly Unbiased Large Language Models