This paper evaluates the performance of Mamba, a state space model, on document ranking, a classical information retrieval (IR) task. Mamba has achieved transformer-level performance in multiple sequence modeling tasks. The study compares Mamba models with transformer-based models on their ability to understand long contextual inputs and to capture interactions between query and document tokens (a minimal scoring sketch follows the findings below). Key findings include:
1. **Performance**: Mamba models achieve performance competitive with transformer-based models trained with the same recipe.
2. **Training Throughput**: Mamba models have lower training throughput than efficient transformer implementations such as Flash Attention.
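To make the comparison concrete, below is a minimal, hypothetical sketch of a cross-encoder-style reranker: the query and document are concatenated into one sequence so a single backbone (transformer or Mamba) can model interactions between query and document tokens, and a linear head maps a pooled representation to a relevance score. The `CrossEncoderReranker` class, the toy backbone, the last-token pooling choice, and the dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class CrossEncoderReranker(nn.Module):
    """Scores a (query, document) pair with a single sequence-model backbone.

    The backbone can be any module mapping token embeddings to contextual
    hidden states (a transformer encoder, a Mamba stack, etc.). Names and
    sizes here are assumptions for illustration only.
    """

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone
        self.score_head = nn.Linear(hidden_size, 1)  # pooled state -> relevance score

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size) for "[query] [SEP] [document]"
        hidden = self.backbone(token_embeddings)   # contextualized token states
        pooled = hidden[:, -1, :]                  # last-token pooling (one common choice)
        return self.score_head(pooled).squeeze(-1)  # (batch,) relevance scores


# Hypothetical stand-in backbone so the sketch runs end to end.
hidden_size = 64
toy_backbone = nn.Sequential(
    nn.Linear(hidden_size, hidden_size), nn.GELU(),
    nn.Linear(hidden_size, hidden_size),
)
reranker = CrossEncoderReranker(toy_backbone, hidden_size)

fake_pairs = torch.randn(2, 128, hidden_size)  # 2 query-document pairs, 128 tokens each
print(reranker(fake_pairs).shape)              # torch.Size([2])
```

With a Mamba backbone, the concatenated sequence would be processed by a selective state space scan rather than attention, but query-document token interactions are still mediated through the shared recurrent state.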
The paper also reviews the background of Mamba, including the design choices and optimization techniques used to improve its computational efficiency. The experimental setup is described in detail, covering the choice of backbone language models, training objectives, datasets, and hyperparameters. The results show that Mamba models can reach strong ranking performance, but their lower training throughput remains a limitation.
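Since this summary does not spell out the training objective, here is a hedged sketch of one widely used reranker fine-tuning recipe: for each query, score one relevant document against several sampled negatives and minimize cross-entropy over the group. The group size and sampling scheme are assumptions and may differ from the paper's exact setup.

```python
import torch
import torch.nn.functional as F


def reranking_loss(scores: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over a group of candidate scores.

    scores: (batch, group) where column 0 is the relevant document and the
    remaining columns are sampled negatives. This mirrors a common reranker
    fine-tuning objective; the paper's exact recipe may differ.
    """
    targets = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores, targets)


# Example: 4 queries, each with 1 positive + 7 sampled negatives.
fake_scores = torch.randn(4, 8, requires_grad=True)
loss = reranking_loss(fake_scores)
loss.backward()
print(loss.item())
```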
The study concludes by highlighting the potential for further exploration of Mamba models in other IR tasks and the need for improvements in training efficiency. The code implementation and trained checkpoints are made public to facilitate reproducibility.