Reverse Training to Nurse the Reversal Curse

7 May 2024 | Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar
This paper introduces reverse training, a method to mitigate the reversal curse in large language models (LLMs). The reversal curse is the phenomenon where a model trained on "A has a feature B" fails to generalize to "B is a feature of A". Reverse training trains the model on both the forward and a reversed version of the data, so that it can learn relationships in both directions. The paper proposes four types of reversal: token reversal, word reversal, entity-preserving reversal, and random segment reversal. These methods are evaluated on several tasks, including symbolic reverse tasks and reversing biographies; a sketch of the reversal transformations follows below.

Results show that reverse training significantly improves performance on reversal tasks, particularly when the data is seen predominantly in one direction. Reverse training can be applied at both the pre-training and fine-tuning stages, with pre-training yielding the largest improvements. It helps LLMs learn the equivalence of relations such as "A is the capital of B" and "B's capital is A", which is crucial for human-like reasoning. The paper concludes that reverse training is a valuable approach for improving LLMs' ability to handle reversal tasks and is particularly effective in data-bound scenarios.
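
The reversal transformations can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and a hypothetical pre-computed list of entity strings for entity-preserving reversal (the paper relies on entity detection), and the maximum segment length for random segment reversal is an arbitrary choice here. Token reversal works the same way as word reversal, only applied to subword tokens instead of words.

import random

def word_reverse(text: str) -> str:
    # Word reversal: reverse the order of words, leaving each word intact.
    return " ".join(reversed(text.split()))

def entity_preserving_reverse(text: str, entities: list[str]) -> str:
    # Entity-preserving reversal: reverse word order, but keep each entity
    # span (from a hypothetical pre-computed list) in its original order.
    words = text.split()
    units, i = [], 0
    while i < len(words):
        matched = None
        for ent in entities:
            ent_words = ent.split()
            if words[i:i + len(ent_words)] == ent_words:
                matched = ent_words
                break
        if matched:
            units.append(" ".join(matched))
            i += len(matched)
        else:
            units.append(words[i])
            i += 1
    return " ".join(reversed(units))

def random_segment_reverse(tokens: list[str], max_len: int = 5) -> list[str]:
    # Random segment reversal: split the sequence into segments of random
    # length (1..max_len), then reverse the order of the segments while
    # preserving token order inside each segment.
    segments, i = [], 0
    while i < len(tokens):
        k = random.randint(1, max_len)
        segments.append(tokens[i:i + k])
        i += k
    return [tok for seg in reversed(segments) for tok in seg]

if __name__ == "__main__":
    s = "The Eiffel Tower is located in Paris"
    print(word_reverse(s))
    print(entity_preserving_reverse(s, ["Eiffel Tower", "Paris"]))
    print(random_segment_reverse(s.split()))

The intent behind the entity-preserving and segment-level variants is to keep names and short spans readable in the reversed text, so the model still sees well-formed local context while learning the backward direction.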