Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

8 Mar 2024 | James Chua*, Edward Rees*, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin+
Bias-Augmented Consistency Training (BCT) reduces biased reasoning in chain-of-thought (CoT) prompting by training models to produce consistent reasoning across prompts with and without biasing features. The method first generates unbiased CoT reasoning for a question, then constructs a biased version of the prompt by adding a bias toward a randomly chosen answer option. Supervised fine-tuning is then performed on this dataset of biased prompts paired with unbiased CoT reasoning, reducing the model's susceptibility to unverbalized biasing features and thereby reducing biased reasoning.

BCT is evaluated on nine forms of biased reasoning across seven question-answering tasks and yields significant reductions in biased reasoning, particularly on held-out tasks. It also generalizes across biases, reducing biased reasoning on held-out biases by an average of 37%. Because the method requires no gold labels, it can reduce biased reasoning even for biases that are unknown at training time. BCT additionally reduces coherent biased reasoning, in which models produce internally consistent reasoning that supports a biased final answer; analysis shows that BCT improves model performance and reduces the incidence of such reasoning without labels.

Overall, BCT is effective at reducing biased reasoning across diverse tasks and biases, and its unsupervised nature makes it a promising approach for improving the faithfulness of model reasoning in AI systems.
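The data-construction step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `generate_cot` callback, the suggested-answer bias template, and the prompt/completion field names are all assumptions chosen for clarity.

```python
import random

# Hypothetical bias template: a "suggested answer" prepended to the question
# acts as the unverbalized biasing feature. Wording is illustrative only.
BIAS_TEMPLATE = "I think the answer is {choice}, but I'm curious what you think.\n\n"

def make_bct_example(question: str, choices: list[str], generate_cot) -> dict:
    """Pair a biased prompt with CoT reasoning produced on the unbiased prompt."""
    # 1. Format the unbiased prompt and sample the model's CoT reasoning on it.
    unbiased_prompt = question + "\nAnswer choices: " + ", ".join(choices)
    unbiased_cot = generate_cot(unbiased_prompt)

    # 2. Build a biased prompt by adding a bias toward a random answer choice.
    suggested = random.choice(choices)
    biased_prompt = BIAS_TEMPLATE.format(choice=suggested) + unbiased_prompt

    # 3. The fine-tuning target is the *unbiased* reasoning, so the model
    #    learns to reason consistently whether or not the bias is present.
    return {"prompt": biased_prompt, "completion": unbiased_cot}
```

Supervised fine-tuning then proceeds on the resulting prompt/completion pairs; note that no gold answer label is ever consulted, which is what makes the procedure unsupervised with respect to task correctness.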