Social Bias Evaluation for Large Language Models Requires Prompt Variations

Social Bias Evaluation for Large Language Models Requires Prompt Variations

3 Jul 2024 | Rem Hida, Masahiro Kaneko, Naoaki Okazaki
This paper investigates the sensitivity of Large Language Models (LLMs) to prompt variations in evaluating task performance and social bias. The authors use the BBQ dataset, which includes multiple-choice questions (MCQs) to assess LLMs' performance and bias. They analyze three prompt variation factors: task instruction and prompt, few-shot examples, and debias-prompt. The study reveals that LLMs are highly sensitive to prompts, leading to fluctuations in their rankings and performance scores. Additionally, the authors find trade-offs between task performance and social bias, with less bias from prompt settings potentially resulting in reduced performance. The ambiguity of instances is identified as a key factor contributing to this sensitivity. The paper recommends using diverse prompts to better understand the effects of prompts on social bias in LLMs. The findings highlight the importance of prompt variation in bias evaluation and suggest that evaluating multiple perspectives simultaneously is crucial for a comprehensive understanding of LLMs' abilities.This paper investigates the sensitivity of Large Language Models (LLMs) to prompt variations in evaluating task performance and social bias. The authors use the BBQ dataset, which includes multiple-choice questions (MCQs) to assess LLMs' performance and bias. They analyze three prompt variation factors: task instruction and prompt, few-shot examples, and debias-prompt. The study reveals that LLMs are highly sensitive to prompts, leading to fluctuations in their rankings and performance scores. Additionally, the authors find trade-offs between task performance and social bias, with less bias from prompt settings potentially resulting in reduced performance. The ambiguity of instances is identified as a key factor contributing to this sensitivity. The paper recommends using diverse prompts to better understand the effects of prompts on social bias in LLMs. The findings highlight the importance of prompt variation in bias evaluation and suggest that evaluating multiple perspectives simultaneously is crucial for a comprehensive understanding of LLMs' abilities.
Reach us at info@study.space