29 Jun 2024 | Ruizhe Chen, Yichen Li, Yang Feng, Zuozhu Liu
This paper introduces Fairness Stamp (FAST), a bias-mitigation approach for large language models (LLMs) that performs fine-grained calibration of individual pieces of biased knowledge rather than broad, group-level bias reduction. The authors also propose BiasKE, a benchmark that systematically evaluates debiasing along three axes: fairness (whether the targeted bias is removed), specificity (whether unrelated knowledge is preserved), and generalization (whether the correction extends to related expressions of the same bias). FAST works by first localizing the layers in an LLM most responsible for a biased prediction and then inserting a lightweight modular network at those layers to adjust the model's output while leaving the rest of its knowledge intact.

Experiments on StereoSet and CrowS-Pairs show that FAST outperforms existing debiasing methods without compromising overall model capability, and the approach scales to larger models such as GPT-Neo and Llama. By measuring knowledge retention alongside bias reduction, the paper frames debiasing as editable fairness and shows that fine-grained correction can mitigate bias while preserving the model's relevant knowledge. Its main contributions are the BiasKE benchmark and the FAST method for fine-grained bias correction.
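To make the mechanism concrete, the sketch below shows one way such a calibration could look in PyTorch: a small bottleneck network is attached to the output of a frozen layer and trained with a loss that pulls a counterfactual pair (e.g. a stereotyped and an anti-stereotyped input) toward the same representation while penalizing drift on unrelated inputs. This is only an illustrative approximation of the idea described above; the class names (`FairnessStamp`, `PatchedLayer`), the bottleneck size, and the loss weighting are assumptions, not the paper's actual implementation.

```python
# Conceptual sketch (not the authors' code): a lightweight calibration module
# inserted after a frozen "critical" layer. Names, sizes, and the loss
# weighting are illustrative assumptions.
import torch
import torch.nn as nn


class FairnessStamp(nn.Module):
    """Small bottleneck network added to a critical layer's hidden states."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual update: leave the original hidden state mostly intact.
        return h + self.up(torch.relu(self.down(h)))


class PatchedLayer(nn.Module):
    """Wraps a frozen base layer and applies the stamp to its output."""

    def __init__(self, base_layer: nn.Module, hidden_dim: int):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad = False  # the base model stays frozen
        self.stamp = FairnessStamp(hidden_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.stamp(self.base(h))


def calibration_loss(out_a, out_b, out_unrelated, ref_unrelated, alpha=1.0):
    """Debiasing term pulls the counterfactual pair together; the preservation
    term keeps unrelated inputs close to the frozen model's reference output."""
    debias = (out_a - out_b).pow(2).mean()
    preserve = (out_unrelated - ref_unrelated).pow(2).mean()
    return debias + alpha * preserve


if __name__ == "__main__":
    torch.manual_seed(0)
    hidden = 32
    base_layer = nn.Linear(hidden, hidden)  # stand-in for a transformer layer
    patched = PatchedLayer(base_layer, hidden)
    opt = torch.optim.Adam(patched.stamp.parameters(), lr=1e-3)

    # Toy inputs: a biased counterfactual pair and an unrelated probe.
    x_a, x_b, x_u = (torch.randn(4, hidden) for _ in range(3))
    with torch.no_grad():
        ref_u = base_layer(x_u)  # frozen reference output for preservation

    for step in range(200):
        loss = calibration_loss(patched(x_a), patched(x_b), patched(x_u), ref_u)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.4f}")
```

In this framing, the preservation term plays the role of the specificity metric above: knowledge unrelated to the targeted bias should remain unchanged, while the debiasing term drives the fine-grained correction itself.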