29 Jun 2024 | Ruizhe Chen, Yichen Li, Yang Feng, Zuozhu Liu
This paper addresses bias in Large Language Models (LLMs) by proposing a novel debiasing method called Fairness Stamp (FAST). Existing debiasing methods often fail to preserve individual facts and knowledge, leading to unreasonable predictions. To overcome this, the authors establish a new benchmark, BiasKE, which evaluates debiasing performance with complementary metrics for fairness, specificity, and generalization. FAST enables fine-grained calibration of individual pieces of biased knowledge by identifying and modifying the specific layers responsible for biased predictions. Comprehensive experiments on datasets such as StereoSet and CrowS-Pairs demonstrate that FAST outperforms state-of-the-art baselines in bias mitigation while maintaining model capability and preserving knowledge. The method also scales to larger models such as GPT-Neo and Llama, showing consistent effectiveness in real-world settings. The paper highlights the potential of fine-grained debiasing strategies for editable fairness in LLMs.
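As a rough illustration of the layer-localization idea behind FAST, the sketch below ablates each transformer block in a small GPT-2 model and measures how much the log-probability gap between a stereotypical and an anti-stereotypical continuation shrinks. This is a minimal sketch of the general technique (activation ablation for bias localization), not the authors' actual implementation; the prompt, the " he"/" she" token pair, and the GPT-2 model are illustrative assumptions.

```python
# Hypothetical sketch of layer-wise bias localization, loosely inspired by the
# paper's description (NOT the authors' FAST implementation).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The nurse said that"          # illustrative prompt
he_id = tok.encode(" he")[0]            # illustrative continuation pair
she_id = tok.encode(" she")[0]

def bias_gap(logits):
    # Log-probability gap between the two demographic continuations.
    logp = torch.log_softmax(logits[0, -1], dim=-1)
    return (logp[he_id] - logp[she_id]).item()

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    base_gap = bias_gap(model(**inputs).logits)

# Skip each transformer block in turn (pass its input through unchanged) and
# record how much the gap shrinks; blocks with the largest effect are the
# candidates for fine-grained calibration.
effects = []
for i, block in enumerate(model.transformer.h):
    def ablate(module, inp, out):
        # GPT-2 blocks return a tuple; replace the hidden states with the
        # block's input, i.e. remove this block's contribution.
        return (inp[0],) + out[1:]
    handle = block.register_forward_hook(ablate)
    with torch.no_grad():
        gap = bias_gap(model(**inputs).logits)
    handle.remove()
    effects.append((i, base_gap - gap))

for i, delta in sorted(effects, key=lambda x: -abs(x[1]))[:3]:
    print(f"layer {i}: gap reduction {delta:+.4f}")
```

In this toy setup, the layers whose removal most reduces the gap would then be the targets for a small learned modification, mirroring the paper's idea of editing only the components responsible for biased predictions while leaving the rest of the model's knowledge intact.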