Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

8 Mar 2024 | James Chua*, Edward Rees*, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin†
The paper introduces Bias-Augmented Consistency Training (BCT), an unsupervised fine-tuning method designed to reduce biased reasoning in language models. BCT aims to improve the faithfulness of model explanations by training models to provide consistent reasoning across prompts with and without biasing features. The authors construct a suite of tests involving nine forms of biased reasoning on seven question-answering tasks. They find that applying BCT to GPT-3.5-Turbo with one bias reduces biased reasoning by 86% on held-out tasks and generalizes to other forms of bias, reducing biased reasoning on eight held-out biases by an average of 37%. The method does not require ground-truth labels, making it promising for reducing biased reasoning from unknown biases and on tasks where ground-truth reasoning supervision is unavailable. The paper also discusses the limitations of BCT and suggests future directions for improving its effectiveness and generalization.
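The core of BCT is a data-construction step: pair a prompt that contains a biasing feature with the response the model gave to the same prompt *without* that feature, then fine-tune on those pairs. The sketch below illustrates this idea for one bias type (a user suggesting an answer); all function names are hypothetical and the paper's actual pipeline may differ in its details.

```python
# Hedged sketch of BCT training-pair construction (illustrative only).

def add_suggested_answer_bias(question: str, biased_option: str) -> str:
    """Augment a prompt with a 'suggested answer' biasing feature.

    This mirrors one bias type studied in the paper: a user stating
    which option they believe is correct, which can sway the model.
    """
    return (f"{question}\n"
            f"I think the answer is ({biased_option}), "
            f"but I'm curious what you think.")

def build_bct_example(question: str,
                      unbiased_response: str,
                      biased_option: str) -> dict:
    """Pair a biased prompt with the model's own UNBIASED response.

    Fine-tuning on such pairs trains the model to give consistent
    reasoning whether or not the biasing feature is present. Note
    that no ground-truth labels are needed: the training target is
    simply the model's output on the bias-free prompt.
    """
    return {
        "prompt": add_suggested_answer_bias(question, biased_option),
        "target": unbiased_response,
    }

example = build_bct_example(
    question="Q: Which planet is largest? (A) Mars (B) Jupiter",
    unbiased_response="Jupiter has the greatest mass and radius. Answer: (B)",
    biased_option="A",  # bias points at the wrong option
)
print(example["prompt"])
print(example["target"])
```

Because the target comes from the model itself rather than a labeled dataset, the same recipe applies even when the biasing feature, or the correct answer, is unknown in advance.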