6 Jun 2024 | Zhaoyi Li1,2, Gangwei Jiang1,2, Hong Xie1, Linqi Song2,3*, Defu Lian1*, Ying Wei4*
This paper explores the compositional reasoning capabilities of Large Language Models (LLMs) and identifies the root causes of their failures in handling complex tasks. The authors find that these failures often stem from the improper generation or utilization of implicit reasoning results. Through empirical analysis and intervention experiments, they discover that implicit reasoning results are present in middle layers of LLMs and play a crucial role in shaping explicit reasoning outcomes. Specifically, Multi-Head Self-Attention (MHSA) modules in these layers are identified as key components responsible for generating and leveraging implicit reasoning results. Based on these findings, the authors propose CREME (Correcting Compositional REasoning via Model Editing), a lightweight method to correct compositional reasoning errors by editing MHSA modules. Empirical results demonstrate that CREME effectively improves LLMs' compositional reasoning capabilities, both on the query used for editing and on paraphrased and related queries, while maintaining low impact on irrelevant queries. The paper also discusses the limitations and ethical considerations of the proposed approach.
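The intervention experiments mentioned above follow the general recipe of activation patching: capture a hidden state from one forward pass and inject it at the same layer of another pass, then observe how the output changes. The following is a minimal sketch of that idea on a toy layer stack; the network, function names, and layer choice are illustrative assumptions, not the authors' actual implementation or model.

```python
import numpy as np

def run_layers(x, weights, patch=None):
    """Run a toy stack of 'layers' (tanh of a linear map).
    If patch=(i, h) is given, overwrite the hidden state right
    after layer i with h, simulating an activation-patching
    intervention at a middle layer."""
    h = x
    for i, W in enumerate(weights):
        h = np.tanh(W @ h)
        if patch is not None and patch[0] == i:
            h = patch[1]  # inject the hidden state from another run
    return h

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)) for _ in range(6)]
x_correct = rng.normal(size=4)  # stand-in for a prompt the model gets right
x_failing = rng.normal(size=4)  # stand-in for a failing compositional query

# Capture the hidden state of the "correct" run after middle layer 2.
h_mid = x_correct
for W in weights[:3]:
    h_mid = np.tanh(W @ h_mid)

# Patch it into the "failing" run at the same layer.
patched = run_layers(x_failing, weights, patch=(2, h_mid))
clean = run_layers(x_correct, weights)

# Because layers after the patch point are identical, the patched
# run now reproduces the correct run's output exactly.
print(np.allclose(patched, clean))
```

In a real LLM the same capture-and-inject step is typically done with forward hooks on the MHSA modules of the chosen middle layers rather than by rewriting the forward loop, but the causal logic is the same: if injecting the implicit reasoning result flips the failing query's answer, that layer's representation is causally relevant.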