18 Jul 2024 | Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings
This paper investigates the faithfulness of chain-of-thought (CoT) reasoning in large language models (LLMs) and proposes FRODO, a framework for improving the reliability of both the reasoning steps and the final answers that LLMs generate. The authors perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps influence the final outcome, and find that LLMs do not reliably use their intermediate reasoning steps when producing answers.

To address this, FRODO combines two components: an inference module that learns to generate correct reasoning steps using an implicit causal reward function, and a reasoning module that learns to faithfully reason over these intermediate inferences using a counterfactual and causal preference objective.

Evaluated on four reasoning tasks, FRODO significantly outperforms four competitive baselines and improves the robustness and generalization of the reasoning LM, yielding higher accuracy on out-of-distribution test sets. Its rationales are also more faithful to its final answer predictions than those produced by standard supervised fine-tuning. The paper further analyzes the causal effect of CoT on the final answer across tasks and models, highlighting the role of model size and training method in achieving faithful reasoning. Overall, the study characterizes the challenges of ensuring faithful reasoning in LLMs and offers a new approach to improving the reliability of both reasoning steps and final answers.
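As a rough illustration of the kind of intervention a causal mediation analysis relies on, the sketch below compares the model's confidence in an answer under its own chain of thought against its confidence after the chain of thought has been corrupted. The model name, prompt format, and the specific perturbation are illustrative assumptions, not the paper's exact experimental protocol.

```python
# Sketch: probing whether a model's answer causally depends on its chain of thought.
# Illustrative only -- "gpt2", the prompt template, and the corruption are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def answer_logprob(question: str, cot: str, answer: str) -> float:
    """Log-probability of `answer` given the question and a (possibly edited) rationale."""
    prompt = f"Question: {question}\nReasoning: {cot}\nAnswer:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(" " + answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Score only the answer tokens (each predicted from the preceding position).
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1 : -1], dim=-1)
    token_scores = log_probs.gather(1, answer_ids[0].unsqueeze(1)).squeeze(1)
    return token_scores.sum().item()

question = "Tom has 3 apples and buys 2 more. How many apples does he have?"
cot = "Tom starts with 3 apples and gains 2, so he has 3 + 2 = 5 apples."
corrupted_cot = "Tom starts with 3 apples and gains 2, so he has 3 + 2 = 9 apples."

effect = answer_logprob(question, cot, "5") - answer_logprob(question, corrupted_cot, "5")
print(f"Drop in log P(answer) when the rationale is corrupted: {effect:.3f}")
```

A large drop suggests the answer causally depends on the stated reasoning; a near-zero drop suggests the rationale is not faithfully used, which is the failure mode the paper reports.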
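The counterfactual and causal preference objective is described here only at a high level. As a minimal stand-in, the sketch below shows a generic DPO-style pairwise preference loss, assuming the reasoning module is trained to prefer answers grounded in the given rationale over answers produced under a counterfactually perturbed rationale; the function name, arguments, and margin form are assumptions rather than FRODO's exact objective.

```python
# Sketch: a generic DPO-style preference loss over (faithful, counterfactual) continuations.
# NOT FRODO's exact objective -- a minimal stand-in under the assumptions stated above.
import torch
import torch.nn.functional as F

def preference_loss(policy_logp_preferred: torch.Tensor,
                    policy_logp_rejected: torch.Tensor,
                    ref_logp_preferred: torch.Tensor,
                    ref_logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """Push the policy to rank the rationale-consistent answer above the
    counterfactual one, relative to a frozen reference model."""
    preferred_ratio = policy_logp_preferred - ref_logp_preferred
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (preferred_ratio - rejected_ratio)).mean()

# Toy usage with per-example sequence log-probabilities (batch of 3):
loss = preference_loss(
    policy_logp_preferred=torch.tensor([-12.0, -9.5, -11.2]),
    policy_logp_rejected=torch.tensor([-11.0, -10.0, -10.8]),
    ref_logp_preferred=torch.tensor([-12.5, -10.0, -11.5]),
    ref_logp_rejected=torch.tensor([-11.2, -9.8, -11.0]),
)
print(loss)
```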