MambaByte: Token-free Selective State Space Model

2024 | Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte is a token-free language model that operates directly on raw bytes, without subword tokenization, making it robust to input noise while remaining efficient to decode. It is based on Mamba, a selective state space model (SSM) that maintains a fixed-size memory state and supports efficient decoding. MambaByte outperforms state-of-the-art subword Transformers on language modeling tasks while retaining the benefits of token-free modeling, and an adaptation of speculative decoding with tokenized drafting and byte-level verification yields a 2.6× inference speedup, allowing MambaByte to match the decoding efficiency of subword Mamba models.

The Mamba architecture uses selective state space models, in which the hidden state evolves according to a first-order differential equation. Because the size of this hidden state is independent of context length, the model can process long byte sequences efficiently: MambaByte keeps a large, fixed-size memory state, making it well suited to direct byte-level modeling of long sequences without length-compression trade-offs.
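For reference, the dynamics the summary alludes to can be sketched as follows. This is the standard linear SSM formulation that Mamba builds on, written in generic notation rather than the paper's own:

```latex
% Continuous-time state space model: the hidden state h(t) follows a
% first-order linear ODE driven by the input x(t).
\[
  h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)
\]
% Discretizing with step size \Delta (zero-order hold) gives a recurrence
% whose state h_t has a fixed size, independent of sequence length:
\[
  h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,
  \qquad \bar{A} = \exp(\Delta A), \qquad
  \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B .
\]
```

In Mamba, the step size and projection parameters are computed from the current input, which is what makes the state space model selective while keeping the per-step state fixed in size.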
Experiments show that MambaByte outperforms MegaByte and other byte-level models on language modeling benchmarks while using significantly less compute and training data. In noise experiments, it is notably more robust to corrupted input than subword models. MambaByte is also competitive with subword models and more efficient at generation, with decoding speed similar to that of subword Mamba models. Its ability to extrapolate to sequences far longer than those handled by other byte-level models suggests that its recurrent hidden state remains effective over long contexts, and its recurrent nature enables faster text generation than Transformers. Speculative decoding with subword drafting and byte-level verification further improves generation efficiency, making byte-level models practical for language modeling. Together, these findings establish the viability of SSMs for token-free language modeling.
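The subword-drafting scheme can be illustrated with a rough sketch. The `draft_model` and `byte_model` interfaces and the greedy acceptance rule below are hypothetical, chosen for exposition; they are not the paper's actual implementation:

```python
# A minimal sketch of speculative decoding with subword drafting and
# byte-level verification. The model interfaces and the greedy acceptance
# rule are illustrative assumptions, not MambaByte's exact code.

def speculative_decode(draft_model, byte_model, prompt: bytes,
                       max_new_bytes: int = 256, draft_len: int = 8) -> bytes:
    output = bytearray(prompt)
    while len(output) - len(prompt) < max_new_bytes:
        # 1) Draft: a small subword model proposes a few tokens cheaply;
        #    they are expanded back into raw bytes for verification.
        drafted = draft_model.draft_bytes(bytes(output), draft_len)  # hypothetical API

        # 2) Verify: one forward pass of the byte-level model over the context
        #    plus the whole draft yields its own prediction at every drafted
        #    position (a single scan for a recurrent SSM).
        predicted = byte_model.predict_each_position(bytes(output), drafted)  # hypothetical API

        # 3) Accept the longest prefix where the draft agrees with the
        #    verifier; at the first disagreement, take the verifier's byte.
        n_accept = 0
        while n_accept < len(drafted) and predicted[n_accept] == drafted[n_accept]:
            n_accept += 1
        output.extend(drafted[:n_accept])
        if n_accept < len(drafted):
            output.append(predicted[n_accept])
    return bytes(output)
```

Because each drafted chunk is checked in a single verification pass, the byte-level model advances several bytes per step rather than one, which is where the reported speedup comes from.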