Linearizing Large Language Models

10 May 2024 | Jean Mercat*, Kushal Arora, Achal Dave, Igor Vasiljevic*, Adrien Gaidon, Sedrick Keh*, Thomas Kollar
The paper introduces SUPRA (Scalable UPtraining for Recurrent Attention), a method for converting large pre-trained transformers into Recurrent Neural Networks (RNNs) with minimal additional training. The approach leverages the strong pre-training data and performance of existing transformer models while cutting the compute needed to obtain a competitive recurrent model by up to 95% relative to training an RNN from scratch. SUPRA replaces softmax attention with a linear kernel and a normalization strategy, allowing the model to be trained in parallel as a transformer and run recurrently at inference time. The method is evaluated on standard benchmarks and long-context tasks, showing competitive performance but also highlighting limitations in in-context learning and long-context modeling. The authors argue that more sophisticated recurrent state update rules are needed and propose future directions for improving linear models. The code and models are available at <https://github.com/TRI-ML/linear_open_lm>.
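
To make the core idea concrete, the sketch below shows generic linear attention in its two equivalent forms: the parallel form used during (up)training and the recurrent form used at inference. This is a minimal illustration, not the paper's implementation: the `elu + 1` feature map and the sum-based normalizer are placeholder assumptions, whereas SUPRA uses a learned kernel and a normalization layer in place of the softmax denominator.

```python
# Minimal sketch of linear attention (not SUPRA's exact kernel/normalization).
import torch
import torch.nn.functional as F

def phi(x):
    # Simple positive feature map; a stand-in for the paper's learned kernel.
    return F.elu(x) + 1.0

def linear_attention_parallel(q, k, v):
    # q, k, v: (seq_len, d). Causal parallel form, computed like a transformer.
    qp, kp = phi(q), phi(k)
    scores = qp @ kp.T                          # (seq, seq) kernel similarities
    scores = scores * torch.tril(torch.ones_like(scores))  # causal mask
    denom = scores.sum(dim=-1, keepdim=True) + 1e-6
    return (scores / denom) @ v

def linear_attention_recurrent(q, k, v):
    # Same computation, but as an RNN with a constant-size state.
    d = q.shape[-1]
    S = torch.zeros(d, d)   # running sum of outer(phi(k_t), v_t)
    z = torch.zeros(d)      # running sum of phi(k_t)
    outs = []
    for t in range(q.shape[0]):
        kt, vt, qt = phi(k[t]), v[t], phi(q[t])
        S = S + torch.outer(kt, vt)
        z = z + kt
        outs.append((qt @ S) / (qt @ z + 1e-6))
    return torch.stack(outs)

q, k, v = (torch.randn(8, 16) for _ in range(3))
assert torch.allclose(linear_attention_parallel(q, k, v),
                      linear_attention_recurrent(q, k, v), atol=1e-4)
```

The constant-size state `(S, z)` is what removes the need for a growing key-value cache: inference cost per token stays fixed regardless of context length, which is the property SUPRA aims to transfer onto existing pre-trained transformers.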