3 Jul 2024 | Opher Lieber*, Barak Lenz*, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meir, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avshalom Manevich, Nir Ratner, Noam Rozen, Erez Schwartz, Mor Zusman, Yoav Shoham
Jamba is a novel hybrid language model combining Transformer and Mamba layers with a mixture-of-experts (MoE) module. It interleaves Transformer and Mamba layers to leverage the strengths of both architectures, enabling efficient training and handling of long contexts. The model is designed to fit in a single 80GB GPU, achieving high throughput and low memory usage while maintaining state-of-the-art performance on standard benchmarks and long-context evaluations. Jamba supports context lengths of up to 256K tokens, a significant improvement over existing models. The architecture allows for flexible configurations that balance memory usage, training efficiency, and long-context capabilities. Jamba also incorporates MoE layers to increase model capacity without significantly increasing the number of active parameters. The model was evaluated on various benchmarks, showing performance comparable to other large models with better throughput. Jamba's hybrid architecture provides improved performance and efficiency, making it suitable for large-scale applications. The model is publicly available under an Apache 2.0 license, encouraging further research and exploration of the architecture.
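To make the interleaving concrete, below is a minimal PyTorch sketch of a Jamba-style block, assuming a hypothetical layout in which one attention mixer appears per `attn_every` layers, Mamba mixers fill the remaining positions, and a toy top-k MoE replaces the dense MLP on every other layer. The class names, ratios, and the `MambaLayer` placeholder are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: layer names, the attention/Mamba ratio, and the
# placeholder MambaLayer below are assumptions, not the released Jamba code.

class MambaLayer(nn.Module):
    """Placeholder standing in for a Mamba (selective state-space) mixer."""
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.proj(x)  # stand-in: a real Mamba layer runs a selective SSM scan

class AttentionLayer(nn.Module):
    """Standard multi-head self-attention mixer."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class MoEMLP(nn.Module):
    """Toy top-k mixture-of-experts MLP: each token activates only top_k experts."""
    def __init__(self, d_model, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        weights, idx = self.router(x).topk(self.top_k, dim=-1)  # (B, T, top_k)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)  # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

def dense_mlp(d_model):
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))

class JambaStyleBlock(nn.Module):
    """Interleaves attention and Mamba mixers; MoE replaces the MLP on some layers."""
    def __init__(self, d_model, n_layers=8, attn_every=8, moe_every=2):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            mixer = AttentionLayer(d_model) if i % attn_every == 0 else MambaLayer(d_model)
            mlp = MoEMLP(d_model) if i % moe_every == 1 else dense_mlp(d_model)
            self.layers.append(nn.ModuleList(
                [nn.LayerNorm(d_model), mixer, nn.LayerNorm(d_model), mlp]))

    def forward(self, x):
        for norm1, mixer, norm2, mlp in self.layers:
            x = x + mixer(norm1(x))  # token mixing: attention or Mamba
            x = x + mlp(norm2(x))    # channel mixing: dense MLP or sparse MoE
        return x

# Usage: one block over a toy batch of embeddings.
x = torch.randn(2, 16, 64)                   # (batch, tokens, d_model)
print(JambaStyleBlock(d_model=64)(x).shape)  # torch.Size([2, 16, 64])
```

In this sketch, the sparse routing in `MoEMLP` is what grows total parameter count while keeping per-token compute close to that of a single dense MLP, mirroring the way MoE increases Jamba's capacity without significantly increasing active parameters.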