RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval


10 May 2024 | Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
RNNs and Transformers differ significantly in representation power, especially on in-context retrieval tasks. This paper investigates whether RNNs, valued for their memory efficiency, can match Transformers' performance, particularly when augmented with Chain-of-Thought (CoT) prompting. The theoretical analysis shows that although CoT improves RNNs, it is not sufficient to close the gap with Transformers. The key bottleneck is RNNs' inability to perfectly retrieve information from the context, even with CoT: on tasks such as associative recall and deciding whether a graph is a tree (IsTree), RNNs are not expressive enough, whereas Transformers solve them easily.

Enhancing RNNs with in-context retrieval closes this gap. The paper identifies imperfect in-context retrieval as the root cause and proposes two forms of In-context Retrieval-Augmented Generation (RAG) to address it; appending a single Transformer layer also suffices. With either enhancement, RNNs with CoT can solve all polynomial-time solvable problems. In short, while Transformers are more expressive out of the box, RNNs augmented with retrieval capabilities can match their performance on algorithmic problems.
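To make the two benchmark tasks concrete, here is a minimal sketch of standard formulations of associative recall and IsTree. The function names and prompt format are illustrative assumptions, not the paper's exact constructions; the point is only that both tasks hinge on retrieving specific items (a key-value pair, an edge's endpoints) from earlier in the context.

```python
import random

def associative_recall_instance(num_pairs=8, vocab=list("abcdefghijklmnop")):
    """Toy associative-recall prompt: key-value pairs followed by a query key;
    the target is the value originally paired with that key."""
    keys = random.sample(vocab, num_pairs)
    values = [random.choice(vocab) for _ in keys]
    query = random.choice(keys)
    prompt = " ".join(f"{k} {v}" for k, v in zip(keys, values)) + f" ? {query}"
    answer = values[keys.index(query)]
    return prompt, answer

def is_tree(n, edges):
    """Decide whether an undirected graph on n nodes is a tree:
    exactly n - 1 edges and no cycles (checked with union-find)."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:       # joining two nodes already connected creates a cycle
            return False
        parent[ru] = rv
    return True

if __name__ == "__main__":
    prompt, answer = associative_recall_instance()
    print(prompt, "->", answer)
    print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))  # True: a path is a tree
    print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: contains a cycle
```

Both checks are trivial for an algorithm with random access to the input, which is the sense in which Transformers (and retrieval-augmented RNNs) handle them easily, while a fixed-size recurrent state struggles to hold every pair or edge it might later need.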