RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval


10 May 2024 | Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
RNNs and Transformers differ significantly in representation power, especially on in-context retrieval tasks. This paper investigates whether RNNs, valued for their memory efficiency, can match Transformers' performance, particularly when augmented with Chain-of-Thought (CoT) prompting. The theoretical analysis shows that although CoT improves RNNs, it is not sufficient to close the gap with Transformers. The key bottleneck is RNNs' inability to perfectly retrieve information from the context, even with CoT: on tasks such as associative recall and deciding whether a graph is a tree (IsTree), RNNs are not expressive enough, whereas Transformers solve them easily.

Enhancing RNNs with in-context retrieval closes this gap. The paper identifies imperfect in-context retrieval as the root cause and proposes two forms of In-context Retrieval-Augmented Generation (RAG) to address it; appending a single Transformer layer also suffices. With either enhancement, RNNs with CoT can solve all polynomial-time solvable problems. In short, while Transformers are more expressive out of the box, RNNs augmented with retrieval capabilities can match their performance on algorithmic problems.
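To make the two benchmark tasks concrete, here is a minimal sketch of standard formulations of associative recall and IsTree. The function names and prompt format are illustrative assumptions, not the paper's exact constructions; the point is only that both tasks hinge on retrieving specific items (a key-value pair, an edge's endpoints) from earlier in the context.

```python
import random

def associative_recall_instance(num_pairs=8, vocab=list("abcdefghijklmnop")):
    """Toy associative-recall prompt: key-value pairs followed by a query key;
    the target is the value originally paired with that key."""
    keys = random.sample(vocab, num_pairs)
    values = [random.choice(vocab) for _ in keys]
    query = random.choice(keys)
    prompt = " ".join(f"{k} {v}" for k, v in zip(keys, values)) + f" ? {query}"
    answer = values[keys.index(query)]
    return prompt, answer

def is_tree(n, edges):
    """Decide whether an undirected graph on n nodes is a tree:
    exactly n - 1 edges and no cycles (checked with union-find)."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:       # joining two nodes already connected creates a cycle
            return False
        parent[ru] = rv
    return True

if __name__ == "__main__":
    prompt, answer = associative_recall_instance()
    print(prompt, "->", answer)
    print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))  # True: a path is a tree
    print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: contains a cycle
```

Both checks are trivial for an algorithm with random access to the input, which is the sense in which Transformers (and retrieval-augmented RNNs) handle them easily, while a fixed-size recurrent state struggles to hold every pair or edge it might later need.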