Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework

13 Mar 2024 | Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, Yang Liu
This paper addresses social bias in large language models (LLMs) by proposing a causality-guided debiasing framework. The authors focus on the association between demographic information and LLM outputs, with the goal of mitigating bias in consequential decision-making settings such as hiring and healthcare. The framework draws on a causal understanding of both the data-generating process and the LLM's internal reasoning process to design prompts that steer the model toward appropriate internal representations and knowledge. It unifies existing prompting techniques and introduces new strategies that encourage bias-free reasoning. Empirical results on real-world datasets show that the framework reduces bias even with limited access to the LLM's internal structure.
The paper's contributions include detailed causal modeling, a principled debiasing framework, and strong empirical performance across various social biases.
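To make "the association between demographic information and LLM outputs" concrete, here is a minimal sketch of one common way to quantify it in a binary decision task such as hiring: the demographic parity gap, i.e. the largest difference in positive-decision rates between groups. This metric is an assumption chosen for illustration, not necessarily the measure used in the paper, and the group labels and decisions below are made-up placeholder data, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def positive_rates(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the fraction of positive decisions (1) for each demographic group."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(records: Iterable[Tuple[str, int]]) -> float:
    """Largest difference in positive-decision rate between any two groups (0 = parity)."""
    rates = positive_rates(records)
    return max(rates.values()) - min(rates.values())


# Hypothetical (group, decision) pairs, e.g. LLM hiring decisions before and
# after a debiasing prompt is applied. Placeholder data for illustration only.
baseline = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
            ("B", 0), ("B", 1), ("B", 0), ("B", 0)]
debiased = [("A", 1), ("A", 0), ("B", 1), ("B", 0)]

print(parity_gap(baseline))  # 0.5
print(parity_gap(debiased))  # 0.0
```

A gap of zero only means the outcome rates match across groups; it does not by itself show that the model's reasoning ignores demographic information, which is the causal question the framework targets.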