Learning to (Learn at Test Time): RNNs with Expressive Hidden States

2024-08-11 | Yu Sun*, Xinhao Li*, Karan Dalal*, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
The paper introduces a new class of sequence modeling layers called Test-Time Training (TTT) layers, which have linear complexity and expressive hidden states. The key idea is to make the hidden state a machine learning model itself and update it using self-supervised learning during test time. Two instantiations, TTT-Linear and TTT-MLP, are proposed, where the hidden state is a linear model and a two-layer MLP, respectively. Evaluations show that both TTT-Linear and TTT-MLP match or exceed the performance of strong baselines like Transformers and Mamba, a modern RNN. TTT-Linear is faster than Transformer at 8k context and matches Mamba in wall-clock time, while TTT-MLP shows larger potential in long context. The paper also discusses practical innovations to improve hardware efficiency, such as mini-batch TTT and a dual form for operations, making TTT-Linear a practical building block for LLMs.
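The key idea above can be sketched in a few lines: the hidden state is the weight matrix of a small model, and each incoming token triggers one gradient step on a self-supervised loss before producing the output. The sketch below is a minimal, illustrative version of the TTT-Linear idea; the projection matrices (`theta_K`, `theta_V`, `theta_Q`), the squared-reconstruction loss, and the learning rate `eta` are simplifying assumptions, not the paper's exact parameterization, and the mini-batch/dual-form optimizations are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4      # token dimension (illustrative)
eta = 0.1  # inner-loop (test-time) learning rate (illustrative)

# Outer-loop parameters: three learned "views" of each token (assumed form).
theta_K = rng.normal(size=(d, d)) * 0.1  # training view
theta_V = rng.normal(size=(d, d)) * 0.1  # label view
theta_Q = rng.normal(size=(d, d)) * 0.1  # test view

# Hidden state: the weights of a linear model f(x) = W x.
W = np.zeros((d, d))

def ttt_linear_step(W, x):
    """One token step: take a gradient step on a self-supervised
    reconstruction loss ||W k - v||^2, then apply the updated model."""
    k, v, q = theta_K @ x, theta_V @ x, theta_Q @ x
    grad = 2.0 * np.outer(W @ k - v, k)  # d/dW of ||W k - v||^2
    W = W - eta * grad                   # "learning" at test time
    return W, W @ q                      # updated state, output token

tokens = rng.normal(size=(8, d))
outputs = []
for x in tokens:
    W, z = ttt_linear_step(W, x)
    outputs.append(z)
```

Because the state is a fixed-size weight matrix updated once per token, cost grows linearly with sequence length, in contrast to a Transformer's growing key-value cache.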