2024 | Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell
This paper investigates the "reversal curse" in auto-regressive large language models (LLMs): the phenomenon where LLMs trained on "A → B" fail to infer the reversed statement "B → A", even though the two are semantically equivalent. The authors analyze this issue through the training dynamics of (stochastic) gradient descent for two auto-regressive models: a bilinear model and one-layer transformers. Their analysis reveals that the reversal curse arises from asymmetry in the model weights: increasing the weights from token A to token B does not necessarily increase the weights from B to A. This asymmetry is caused by the training dynamics under the cross-entropy (CE) loss and by the structure of the model's parameter space.
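The weight-asymmetry argument can be illustrated with a minimal sketch (not the paper's actual construction): a bilinear next-token model with one-hot embeddings, where the logit for "token j follows token i" reduces to a single matrix entry W[i, j]. Gradient descent on the CE loss for the pair (A, B) only touches row W[A, :], so the reverse direction P(A | B) is never learned:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy vocabulary {A=0, B=1, C=2} with one-hot embeddings, so the
# bilinear logit for "token j follows token i" is simply W[i, j].
V = 3
A, B = 0, 1
W = np.zeros((V, V))

lr = 1.0
for _ in range(100):
    # Cross-entropy gradient for the training pair (A -> B):
    # dL/dW[A, :] = softmax(W[A, :]) - onehot(B).
    grad = softmax(W[A]) - np.eye(V)[B]
    W[A] -= lr * grad

p_forward = softmax(W[A])[B]  # P(B | A): driven toward 1 by training
p_reverse = softmax(W[B])[A]  # P(A | B): row W[B, :] was never updated
print(p_forward, p_reverse)
```

After training, `p_forward` is close to 1 while `p_reverse` stays at the uniform value 1/3: the gradient of the CE loss on "A → B" has no component that increases the B-to-A weight, which is the asymmetry the paper formalizes.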
The study also shows that the analysis extends to other logical reasoning settings, in particular chain-of-thought (CoT) reasoning, where the model generates intermediate reasoning steps. The authors demonstrate that without CoT, models trained on "A → B" and "B → C" struggle to directly infer "A → C", even though the conclusion is logically valid. This highlights the importance of CoT in enabling multi-step logical reasoning.
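The same one-hot bilinear toy (an illustrative sketch, not the paper's construction) makes the CoT claim concrete: training on the pairs (A, B) and (B, C) never increases the direct A-to-C weight, so one-shot inference of "A → C" fails, while chaining two greedy single-step predictions succeeds:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy vocabulary {A=0, B=1, C=2}; with one-hot embeddings the bilinear
# next-token logit for "token j follows token i" is W[i, j].
V = 3
A, B, C = 0, 1, 2
W = np.zeros((V, V))

lr = 1.0
for _ in range(100):
    # CE gradient updates for the two training facts A -> B and B -> C.
    W[A] -= lr * (softmax(W[A]) - np.eye(V)[B])
    W[B] -= lr * (softmax(W[B]) - np.eye(V)[C])

direct = softmax(W[A])[C]           # P(C | A): W[A, C] is only pushed down
step1 = softmax(W[A]).argmax()      # greedy intermediate step: A -> B
step2 = softmax(W[step1]).argmax()  # greedy second step: B -> C
print(direct, step2 == C)
```

Direct inference P(C | A) ends up below the uniform baseline, because CE training on "A → B" actively suppresses the logit for C after A, while the two-step (CoT-style) rollout reaches C through the intermediate token B.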
The paper also validates these findings through experiments on multi-layer transformers, showing that the reversal curse persists even in these models. The results suggest that the asymmetry in model weights, caused by the CE loss, limits the ability of LLMs to automatically deduce indirect conclusions. This underscores the importance of in-context learning (ICL), data augmentation, and planning for LLMs to effectively solve complex reasoning tasks. The study provides a new theoretical perspective on the reversal curse and its implications for the design and training of LLMs.