Zamba: A Compact 7B SSM Hybrid Model

26 May 2024 | Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Adam Ibrahim, Beren Millidge
Zamba is a 7B-parameter state-space model (SSM) hybrid that achieves performance competitive with leading open-source models. It was trained on 1T tokens from open datasets and is the best non-transformer model at this scale. Zamba combines a Mamba backbone with a single shared attention module, gaining the benefits of attention at minimal parameter cost, and it is significantly faster at inference and requires less memory for long-sequence generation than comparable transformer models. Zamba was pretrained in two phases: first on web datasets, then on high-quality instruct and synthetic data with a rapid learning-rate decay. The model is open-sourced, including all checkpoints from both phases.

Zamba's architecture is inspired by the brain's cortex and hippocampus, where different layers share a common memory store, allowing efficient performance with minimal memory usage. Combining Mamba blocks with a single global shared self-attention layer merges the in-context-learning strengths of transformers with Mamba's inference efficiency, and the shared layer adds only a small, constant parameter cost. Zamba was trained on a relatively small budget of roughly $200k by a team of 7 researchers over about a month, yet reaches performance comparable to leading models.

Zamba outperforms models such as Llama 2 and Pythia, and performs well on general language-modeling and reasoning benchmarks despite being trained on fewer, and potentially lower-quality, tokens. Its inference and generation efficiency is significantly better than that of comparable models, with faster forward passes and reduced memory usage for KV caching. The two-phase training approach, a pretraining phase followed by an annealing phase on high-quality data, significantly improves model performance. Performance on benchmarks such as MMLU and ARC is competitive with leading models, although Zamba lags slightly on reasoning tasks. The open checkpoints from both phases allow further study of learning dynamics and architectural benefits, and the model's design and training approach offer insights into the potential of SSMs and hybrid models for efficient language modeling.
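To make the shared-attention idea concrete, here is a minimal PyTorch sketch of a Mamba-style backbone that reuses a single attention module every few blocks. The class names, block counts, and the stubbed Mamba mixer are illustrative assumptions, not the authors' implementation; causal masking and the exact placement of the shared block are omitted for brevity.

```python
# Minimal sketch of a Mamba-backbone-plus-shared-attention hybrid.
# All names and dimensions are hypothetical placeholders, not Zamba's actual code.
import torch
import torch.nn as nn


class MambaBlockStub(nn.Module):
    """Placeholder standing in for a real Mamba (SSM) block."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the SSM mixer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mixer(self.norm(x))


class SharedAttentionHybrid(nn.Module):
    """Stack of Mamba blocks with ONE attention module reused every k blocks.

    Because the attention parameters are shared, adding attention costs a
    small, constant number of parameters regardless of depth.
    """

    def __init__(self, d_model: int = 512, n_blocks: int = 12,
                 share_every: int = 6, n_heads: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(MambaBlockStub(d_model) for _ in range(n_blocks))
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.share_every = share_every

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            if i % self.share_every == 0:
                # Same attention weights are applied at every shared call site.
                h = self.attn_norm(x)
                attn_out, _ = self.shared_attn(h, h, h, need_weights=False)
                x = x + attn_out
            x = block(x)
        return x


if __name__ == "__main__":
    model = SharedAttentionHybrid()
    tokens = torch.randn(2, 16, 512)  # (batch, sequence, d_model)
    print(model(tokens).shape)        # torch.Size([2, 16, 512])
```

The point of the sketch is the parameter accounting: deepening the Mamba stack adds blocks, but the attention module is instantiated once, so its cost stays constant however often it is invoked.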
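The two-phase recipe, ordinary pretraining on web data followed by a short annealing phase on high-quality instruct and synthetic data with a rapid learning-rate decay, can be pictured with a schedule like the one below. The step counts, decay shapes, and learning rates are made-up placeholders for illustration, not the paper's hyperparameters.

```python
# Illustrative two-phase schedule: a standard cosine decay over the web-data
# phase, then a rapid decay during the high-quality "annealing" phase.
# All numbers are hypothetical placeholders, not the paper's actual values.
import math


def two_phase_lr(step: int,
                 peak_lr: float = 3e-4,
                 phase1_steps: int = 900_000,
                 phase2_steps: int = 100_000,
                 final_lr: float = 1e-5) -> float:
    if step < phase1_steps:
        # Phase 1: cosine decay from peak_lr down to 10% of peak.
        phase1_end_lr = peak_lr * 0.1
        progress = step / phase1_steps
        cos = 0.5 * (1 + math.cos(math.pi * progress))
        return phase1_end_lr + (peak_lr - phase1_end_lr) * cos
    # Phase 2 (annealing): restart at the phase-1 end LR and decay
    # exponentially to final_lr over a much shorter horizon.
    start_lr = peak_lr * 0.1
    progress = min((step - phase1_steps) / phase2_steps, 1.0)
    return start_lr * (final_lr / start_lr) ** progress


if __name__ == "__main__":
    for s in (0, 450_000, 899_999, 900_000, 950_000, 1_000_000):
        print(f"step {s:>9}: lr = {two_phase_lr(s):.2e}")
```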
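The memory claim follows from simple arithmetic: a transformer must cache keys and values for every layer at every generated token, whereas an SSM carries a fixed-size recurrent state, and a hybrid like Zamba needs a cache only for its single shared attention layer. The sketch below compares the two extremes using generic 7B-class dimensions, which are illustrative assumptions rather than Zamba's actual configuration.

```python
# Back-of-the-envelope decoding-memory comparison: a transformer's KV cache
# grows linearly with sequence length, while an SSM keeps a fixed-size state.
# Dimensions are generic 7B-class guesses, not Zamba's configuration.

def transformer_kv_cache_bytes(seq_len: int,
                               n_layers: int = 32,
                               n_heads: int = 32,
                               head_dim: int = 128,
                               bytes_per_elem: int = 2) -> int:
    # Keys and values: 2 tensors per layer, each of shape (seq_len, n_heads, head_dim).
    return 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_elem


def ssm_state_bytes(n_layers: int = 32,
                    d_model: int = 4096,
                    state_dim: int = 16,
                    bytes_per_elem: int = 2) -> int:
    # One fixed-size recurrent state per layer, independent of sequence length.
    return n_layers * d_model * state_dim * bytes_per_elem


if __name__ == "__main__":
    for seq_len in (2_048, 16_384, 131_072):
        kv = transformer_kv_cache_bytes(seq_len) / 2**30
        ssm = ssm_state_bytes() / 2**30
        print(f"seq_len={seq_len:>7}: transformer KV cache ~ {kv:6.2f} GiB, "
              f"SSM state ~ {ssm:.4f} GiB")
```

At 2k tokens the full-transformer cache is already about 1 GiB in fp16 and grows to tens of GiB at long contexts, while the SSM state stays constant; caching keys and values for only one shared layer scales that transformer figure down by roughly the layer count.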