Learning to (Learn at Test Time): RNNs with Expressive Hidden States

11 Aug 2024 | Yu Sun*, Xinhao Li*, Karan Dalal*, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
The paper introduces a new class of sequence modeling layers called Test-Time Training (TTT) layers, which combine linear complexity with an expressive hidden state. The key idea is to make the hidden state a machine learning model itself and to update it with a step of self-supervised learning on each input token, even at test time. Two instantiations are proposed: TTT-Linear, whose hidden state is a linear model, and TTT-MLP, whose hidden state is a two-layer MLP. In evaluations, both match or exceed strong baselines, including Transformers and Mamba, a modern RNN. TTT-Linear is faster than the Transformer at 8k context and matches Mamba in wall-clock time, while TTT-MLP shows greater potential in long context. The paper also presents practical innovations for hardware efficiency, such as mini-batch TTT and a dual form for the inner-loop operations, which make TTT-Linear a practical building block for LLMs.
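To make the mechanism concrete, here is a minimal NumPy sketch of the TTT-Linear inner loop under its simplest configuration (this is illustrative, not the authors' code): the hidden state is a weight matrix `W`, the self-supervised loss is a squared reconstruction error between learned key and value projections of the token, and each token triggers one gradient step on `W` before the output is produced. The projection matrices, learning rate, and dimensions below are stand-in assumptions; in the paper, the projections and other parameters are trained in the outer loop.

```python
import numpy as np

def ttt_linear(tokens, d, lr=0.1, seed=0):
    """Illustrative TTT-Linear inner loop: the hidden state is a linear
    model W, updated by one gradient step per token, even at test time."""
    rng = np.random.default_rng(seed)
    # Stand-ins for the learned projections (trained in the outer loop).
    theta_K = rng.normal(size=(d, d)) / np.sqrt(d)
    theta_V = rng.normal(size=(d, d)) / np.sqrt(d)
    theta_Q = rng.normal(size=(d, d)) / np.sqrt(d)

    W = np.zeros((d, d))  # hidden state = weights of a linear model
    outputs = []
    for x in tokens:
        k, v, q = theta_K @ x, theta_V @ x, theta_Q @ x
        # Self-supervised loss l(W) = 0.5 * ||W k - v||^2,
        # whose gradient w.r.t. W is (W k - v) k^T.
        grad = np.outer(W @ k - v, k)
        W = W - lr * grad        # the "learning" step at test time
        outputs.append(W @ q)    # output rule: apply the updated model
    return np.stack(outputs)

# Usage: 16 tokens of dimension 8
zs = ttt_linear([np.random.randn(8) for _ in range(16)], d=8)
print(zs.shape)  # (16, 8)
```

Mini-batch TTT, one of the paper's efficiency innovations, would instead take the gradients of b consecutive tokens at the same `W` and apply them in a single update, parallelizing the inner loop across the mini-batch; the dual form then rewrites these updates as matrix-matrix products that map well onto modern accelerators.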