BiasAlert is a plug-and-play tool designed to detect social bias in open-text generation by LLMs. It integrates external human knowledge with the inherent reasoning capabilities of LLMs to reliably identify bias. Extensive experiments show that BiasAlert significantly outperforms existing methods like GPT4-as-A-Judge in detecting bias. Application studies demonstrate its utility in bias evaluation and mitigation across various scenarios. The tool is publicly released.
Large Language Models (LLMs) have shown social bias due to their training data, which affects fairness and reliability. Existing bias evaluation methods rely on fixed-form inputs and outputs, which are not suitable for flexible open-text generation. BiasAlert addresses this by using a social bias retrieval database and instruction-following paradigm to enhance reasoning. It evaluates bias by analyzing generated text and provides judgments with explanations.
BiasAlert uses a social bias retrieval database built from human-annotated data and an instruction-following dataset to improve internal reasoning. It evaluates performance on datasets like RedditBias and Crows-pairs, outperforming existing tools and LLMs in bias detection. Ablation studies show that retrieval and step-by-step instructions are crucial for effective bias detection.
BiasAlert is applied to evaluate bias in LLMs on text completion and question-answering tasks. It detects bias in responses from 9 LLMs, showing that some models like OPT-6.7b and GPT-3.5 have minimal bias, while others like Llama-2-13b-chat show higher bias. Human validation consistency is over 92%, confirming BiasAlert's effectiveness.
BiasAlert is also used for bias mitigation during LLM deployment. It audits text generation and terminates when bias is detected. Results show that deploying BiasAlert reduces biased generation, proving its effectiveness in bias mitigation. It takes an average of 1.4 seconds per generation, making it feasible for real-world deployment.
The paper concludes that BiasAlert is a reliable tool for bias detection in open-text generation, emphasizing the need for external knowledge. It highlights the importance of bias evaluation and mitigation in LLM deployment. Limitations include the use of simulated datasets and outdated retrieval databases. Future work includes integrating BiasAlert with new datasets and improving retrieval methods.
Potential risks are minimal, as all annotators were informed about the study's purpose and potential offensive content. Anonymity and ethical training were ensured. BiasAlert promotes fairness in AI by detecting and mitigating bias in LLMs.BiasAlert is a plug-and-play tool designed to detect social bias in open-text generation by LLMs. It integrates external human knowledge with the inherent reasoning capabilities of LLMs to reliably identify bias. Extensive experiments show that BiasAlert significantly outperforms existing methods like GPT4-as-A-Judge in detecting bias. Application studies demonstrate its utility in bias evaluation and mitigation across various scenarios. The tool is publicly released.
Large Language Models (LLMs) have shown social bias due to their training data, which affects fairness and reliability. Existing bias evaluation methods rely on fixed-form inputs and outputs, which are not suitable for flexible open-text generation. BiasAlert addresses this by using a social bias retrieval database and instruction-following paradigm to enhance reasoning. It evaluates bias by analyzing generated text and provides judgments with explanations.
BiasAlert uses a social bias retrieval database built from human-annotated data and an instruction-following dataset to improve internal reasoning. It evaluates performance on datasets like RedditBias and Crows-pairs, outperforming existing tools and LLMs in bias detection. Ablation studies show that retrieval and step-by-step instructions are crucial for effective bias detection.
BiasAlert is applied to evaluate bias in LLMs on text completion and question-answering tasks. It detects bias in responses from 9 LLMs, showing that some models like OPT-6.7b and GPT-3.5 have minimal bias, while others like Llama-2-13b-chat show higher bias. Human validation consistency is over 92%, confirming BiasAlert's effectiveness.
BiasAlert is also used for bias mitigation during LLM deployment. It audits text generation and terminates when bias is detected. Results show that deploying BiasAlert reduces biased generation, proving its effectiveness in bias mitigation. It takes an average of 1.4 seconds per generation, making it feasible for real-world deployment.
The paper concludes that BiasAlert is a reliable tool for bias detection in open-text generation, emphasizing the need for external knowledge. It highlights the importance of bias evaluation and mitigation in LLM deployment. Limitations include the use of simulated datasets and outdated retrieval databases. Future work includes integrating BiasAlert with new datasets and improving retrieval methods.
Potential risks are minimal, as all annotators were informed about the study's purpose and potential offensive content. Anonymity and ethical training were ensured. BiasAlert promotes fairness in AI by detecting and mitigating bias in LLMs.