MambaByte: Token-free Selective State Space Model

2024 | Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte is a token-free language model that operates directly on raw bytes, without subword tokenization, making it robust to input noise while remaining efficient to decode. It is based on Mamba, a selective state space model (SSM) that maintains a fixed-size memory state and supports efficient decoding. MambaByte outperforms state-of-the-art subword Transformers on language modeling tasks while retaining the benefits of token-free modeling, and an adaptation of speculative decoding with tokenized drafting and byte-level verification yields a 2.6× inference speedup, allowing MambaByte to match the decoding efficiency of subword Mamba models.

The Mamba architecture uses selective state space models, in which the hidden state evolves according to a first-order differential equation. Because the size of this hidden state is independent of context length, the model can process long byte sequences efficiently: MambaByte keeps a large, fixed-size memory state, making it well suited to direct byte-level modeling of long sequences without length-compression trade-offs.
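For reference, the dynamics the summary alludes to can be sketched as follows. This is the standard linear SSM formulation that Mamba builds on, written in generic notation rather than the paper's own:

```latex
% Continuous-time state space model: the hidden state h(t) follows a
% first-order linear ODE driven by the input x(t).
\[
  h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)
\]
% Discretizing with step size \Delta (zero-order hold) gives a recurrence
% whose state h_t has a fixed size, independent of sequence length:
\[
  h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,
  \qquad \bar{A} = \exp(\Delta A), \qquad
  \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B .
\]
```

In Mamba, the step size and projection parameters are computed from the current input, which is what makes the state space model selective while keeping the per-step state fixed in size.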
Experiments show that MambaByte outperforms MegaByte and other byte-level models on language modeling benchmarks while using significantly less compute and training data. In noise experiments, it is notably more robust to corrupted input than subword models. MambaByte is also competitive with subword models and more efficient at generation, with decoding speed similar to that of subword Mamba models. Its ability to extrapolate to sequences far longer than those handled by other byte-level models suggests that its recurrent hidden state remains effective over long contexts, and its recurrent nature enables faster text generation than Transformers. Speculative decoding with subword drafting and byte-level verification further improves generation efficiency, making byte-level models practical for language modeling. Together, these findings establish the viability of SSMs for token-free language modeling.
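The subword-drafting scheme can be illustrated with a rough sketch. The `draft_model` and `byte_model` interfaces and the greedy acceptance rule below are hypothetical, chosen for exposition; they are not the paper's actual implementation:

```python
# A minimal sketch of speculative decoding with subword drafting and
# byte-level verification. The model interfaces and the greedy acceptance
# rule are illustrative assumptions, not MambaByte's exact code.

def speculative_decode(draft_model, byte_model, prompt: bytes,
                       max_new_bytes: int = 256, draft_len: int = 8) -> bytes:
    output = bytearray(prompt)
    while len(output) - len(prompt) < max_new_bytes:
        # 1) Draft: a small subword model proposes a few tokens cheaply;
        #    they are expanded back into raw bytes for verification.
        drafted = draft_model.draft_bytes(bytes(output), draft_len)  # hypothetical API

        # 2) Verify: one forward pass of the byte-level model over the context
        #    plus the whole draft yields its own prediction at every drafted
        #    position (a single scan for a recurrent SSM).
        predicted = byte_model.predict_each_position(bytes(output), drafted)  # hypothetical API

        # 3) Accept the longest prefix where the draft agrees with the
        #    verifier; at the first disagreement, take the verifier's byte.
        n_accept = 0
        while n_accept < len(drafted) and predicted[n_accept] == drafted[n_accept]:
            n_accept += 1
        output.extend(drafted[:n_accept])
        if n_accept < len(drafted):
            output.append(predicted[n_accept])
    return bytes(output)
```

Because each drafted chunk is checked in a single verification pass, the byte-level model advances several bytes per step rather than one, which is where the reported speedup comes from.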