DeciMamba: Exploring the Length Extrapolation Potential of Mamba

20 Jun 2024 | Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes
DeciMamba is a context-extension method designed to enhance the length-extrapolation capabilities of Mamba, a state-space model that matches strong Transformer performance with fewer computational resources. Mamba's effective receptive field (ERF) is limited by the sequence length used during training, which restricts its ability to process longer sequences. DeciMamba addresses this with a dynamic, data-dependent pooling mechanism that discards less important tokens before each S6 layer, greatly increasing Mamba's effective context length and enabling extrapolation to sequences 25 times longer than those seen during training, without additional computational resources. Experiments on real-world long-range NLP tasks, including document retrieval and multi-document question answering, show that DeciMamba extrapolates substantially better than Mamba and remains effective on these much longer sequences. The results underscore the importance of addressing Mamba's ERF limitation and show that DeciMamba markedly improves the model's ability to capture long-range dependencies. The paper also discusses the limitations of Mamba and directions for further improving length extrapolation.
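To make the pooling idea concrete, here is a minimal sketch of token decimation before an S6 layer. This is not the authors' implementation: the function name `decimate_tokens`, the `keep_ratio` knob, and the placeholder `s6_layer` and `importance_scores` are illustrative assumptions; the paper derives per-token importance from the layer's own data-dependent parameters.

```python
import torch

def decimate_tokens(hidden_states: torch.Tensor,
                    importance: torch.Tensor,
                    keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only the highest-importance tokens before an S6 layer.

    hidden_states: (batch, seq_len, dim) token representations.
    importance:    (batch, seq_len) per-token scores; DeciMamba derives
                   these from the layer's data-dependent parameters.
    keep_ratio:    fraction of tokens retained (hypothetical knob).
    """
    batch, seq_len, dim = hidden_states.shape
    num_keep = max(1, int(seq_len * keep_ratio))

    # Select the top-scoring tokens in each sequence...
    _, top_idx = importance.topk(num_keep, dim=1)
    # ...but keep them in their original order so the sequence structure
    # seen by the S6 recurrence is preserved.
    top_idx, _ = top_idx.sort(dim=1)

    gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, dim)
    return hidden_states.gather(dim=1, index=gather_idx)

# Illustrative usage: decimate, then run the (hypothetical) S6 layer on the
# shorter sequence, shrinking the gap between the input length and the
# model's effective receptive field.
# pooled = decimate_tokens(x, importance_scores, keep_ratio=0.5)
# y = s6_layer(pooled)
```

Because decimation shortens the sequence fed to each S6 layer, a model trained at a fixed context length can, under this scheme, process much longer inputs without the state dynamics drifting outside the regime they were trained on.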