3 Jul 2024 | Jared Moore, Tanvi Deshpande, Diyi Yang
The paper investigates whether large language models (LLMs) express values consistently across questions, paraphrases, use cases, and translations. The authors define value consistency as the similarity of a model's answers across different forms of a question and across related topics. They introduce a novel dataset, VALUECONSISTENCY, comprising over 8,000 questions spanning 300 topics in four languages (English, Chinese, German, and Japanese). Comparing base models with fine-tuned models, the study finds that large models are relatively consistent, performing similarly to or better than human participants on topic and paraphrase consistency. Models are nonetheless more inconsistent on controversial topics, such as euthanasia, than on uncontroversial ones, such as women's rights. Base models are more consistent than fine-tuned models, and fine-tuned models vary in consistency across topics. The study also examines the steerability of models toward specific values, finding that models are not easily steered to align with Schwartz's values. The findings suggest that while LLMs are generally consistent, their consistency varies, and further research is needed to understand the underlying mechanisms and potential biases.
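To make the consistency notion concrete, here is a minimal sketch of one plausible scoring scheme: treat the model's answer to each paraphrase of a question as a probability distribution over a fixed set of options, and score consistency as one minus the average pairwise Jensen-Shannon distance between those distributions. The function name `paraphrase_consistency` and the choice of metric are illustrative assumptions, not the paper's verbatim method.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import jensenshannon


def paraphrase_consistency(answer_dists: list[np.ndarray]) -> float:
    """Score in [0, 1]; 1.0 means identical answer distributions.

    answer_dists: one probability vector per paraphrase, each over the
    same fixed answer options (e.g., "support" / "oppose").
    """
    pairs = list(combinations(answer_dists, 2))
    if not pairs:
        return 1.0  # a single paraphrase is trivially consistent
    # With base=2, the Jensen-Shannon distance is bounded in [0, 1].
    dists = [jensenshannon(p, q, base=2) for p, q in pairs]
    return 1.0 - float(np.mean(dists))


# Hypothetical example: three paraphrases of one value-laden question,
# with binary answer options; the third paraphrase flips the model's lean.
dists = [np.array([0.9, 0.1]), np.array([0.85, 0.15]), np.array([0.4, 0.6])]
print(f"consistency = {paraphrase_consistency(dists):.3f}")
```

The same scheme would extend to topic or translation consistency by pooling answer distributions across related questions or across languages rather than across paraphrases.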