Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
**Abstract:**
Transformers have become the standard architecture for sequence modeling, but their self-attention layers scale quadratically with sequence length. Mamba, a state space model (SSM), achieves performance comparable to Transformers while scaling linearly. This work presents Mamba-ND, which extends Mamba to arbitrary multi-dimensional data. Mamba-ND alternately processes the input data along its different dimensions, following row-major scan orderings. Extensive experiments show that Mamba-ND outperforms Transformers on various multi-dimensional benchmarks, including ImageNet-1K, HMDB-51, UCF-101, ERA5, and BTCV, with significantly fewer parameters and subquadratic complexity.
**Keywords:**
State Space Models · Multi-Dimensional Modeling
- **Introduction:**
- **Background on SSMs:** SSMs model input sequences through a linear ordinary differential equation (ODE) over a hidden state and have shown strong performance on long sequences (the standard formulation and its discretization are recalled below).
- **Mamba Layer:** Mamba layers consist of a 1D convolution, an SSM kernel, and a residual connection.
- **Methodology:** Various approaches to adapting Mamba to multi-dimensional data are explored, including layer-level and block-level designs.
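For context, the SSM formulation that Mamba builds on is the standard one from the S4 line of work rather than anything specific to Mamba-ND: a continuous-time linear ODE over a hidden state $h(t)$, discretized with a zero-order hold of step size $\Delta$:

$$
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t),
$$

$$
\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B, \qquad h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t.
$$

In Mamba, $\Delta$, $B$, and $C$ are additionally made input-dependent ("selective"), which lets the recurrence filter content along the scan.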
- **Scan Orderings:**
- **Definition:** Scan orderings are permutations of the axes of the input data flattened into a 1D sequence.
- **Examples:** For 2D data there are four possible orderings: $(HW)+$, $(HW)-$, $(WH)+$, and $(WH)-$, where the letters give the axis order of the row-major flattening and the sign gives the scan direction.
- **3D Data:** For 3D data there are 12 possible orderings ($3!$ axis permutations, each in two directions), such as $(HWT)+$ and $(WHT)-$. A minimal flattening sketch appears below.
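The sketch below shows how the four 2D orderings can be realized by permuting the spatial axes and optionally reversing the row-major flattening; `scan_order_2d` is a hypothetical helper for illustration, not the paper's implementation:

```python
import torch

def scan_order_2d(x, order="HW", direction=+1):
    """Flatten a (H, W, C) tensor into a 1D sequence under a scan ordering.

    order:     "HW" flattens row-major with W varying fastest; "WH" swaps the
               spatial axes first, so H varies fastest.
    direction: +1 keeps the forward scan, -1 reverses the flattened sequence.
    """
    h, w, c = x.shape
    if order == "WH":
        x = x.transpose(0, 1)            # (W, H, C): scan columns first
    seq = x.reshape(-1, c)               # row-major flattening to (H*W, C)
    if direction == -1:
        seq = seq.flip(0)                # reverse scan direction
    return seq

x = torch.randn(4, 6, 16)                # toy (H, W, C) feature map
orderings = [("HW", +1), ("HW", -1), ("WH", +1), ("WH", -1)]
seqs = [scan_order_2d(x, o, d) for o, d in orderings]
print([tuple(s.shape) for s in seqs])    # four sequences of shape (24, 16)
```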
- **Adapting the Mamba Layer:**
- **Bi-SSM Layer:** Passes the output of the convolution layer to two independent SSM kernels, one over the forward sequence and one over the reversed sequence (see the sketch below).
- **ND-SSM Layer:** Extends Bi-SSM by incorporating additional SSMs for different orderings.
- **Multi-head SSM Layer:** Splits the input sequence into multiple heads, each processed by a separate SSM kernel.
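As a rough illustration of the Bi-SSM idea, the sketch below feeds the 1D-conv output to two independent sequence kernels, one over the forward sequence and one over the reversed sequence, then fuses them with a residual connection. The `nn.GRU` modules are hypothetical stand-ins for Mamba's selective SSM kernel, used only so the example runs end to end:

```python
import torch
import torch.nn as nn

class BiSSMLayer(nn.Module):
    """Sketch of a Bi-SSM-style layer: conv output -> two directional kernels."""

    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.ssm_fwd = nn.GRU(dim, dim, batch_first=True)  # stand-in for forward SSM
        self.ssm_bwd = nn.GRU(dim, dim, batch_first=True)  # stand-in for backward SSM
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):                                   # x: (B, L, D)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)    # depthwise 1D conv over L
        y_fwd, _ = self.ssm_fwd(h)                          # scan in forward order
        y_bwd, _ = self.ssm_bwd(h.flip(1))                  # scan in reversed order
        y = torch.cat([y_fwd, y_bwd.flip(1)], dim=-1)       # realign and concatenate
        return x + self.proj(y)                             # residual connection

layer = BiSSMLayer(dim=32)
print(layer(torch.randn(2, 100, 32)).shape)                 # torch.Size([2, 100, 32])
```

The ND-SSM variant follows the same pattern with one kernel per scan ordering, while the multi-head variant instead splits the channels across separate kernels.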
- **Arranging Mamba Layers:**
- **Alternating-Directional:** Alternates the scan direction of the SSM from one layer to the next (see the sketch below).
- **Bi-Directional:** Processes the input in opposite directions in each layer.
- **Quad-Directional:** Further groups the scan directions, four per block, to improve performance.
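A sketch of the alternating-directional arrangement for 2D inputs, cycling through the four scan orderings layer by layer. `make_layer` and the `nn.Identity` placeholder are assumptions standing in for a real Mamba layer:

```python
import torch
import torch.nn as nn

class AlternatingMambaND(nn.Module):
    """Sketch: each layer runs one 1D scan; consecutive layers cycle orderings."""

    def __init__(self, dim, depth, make_layer):
        super().__init__()
        self.layers = nn.ModuleList([make_layer(dim) for _ in range(depth)])
        self.orderings = [("HW", +1), ("HW", -1), ("WH", +1), ("WH", -1)]

    def forward(self, x):                                    # x: (B, H, W, C)
        b, h, w, c = x.shape
        for i, layer in enumerate(self.layers):
            order, direction = self.orderings[i % len(self.orderings)]
            t = x.transpose(1, 2) if order == "WH" else x    # pick axis order
            d1, d2 = t.shape[1], t.shape[2]
            seq = t.reshape(b, -1, c)                        # flatten to a 1D sequence
            if direction == -1:
                seq = seq.flip(1)                            # reverse the scan
            seq = layer(seq)                                 # 1D sequence layer
            if direction == -1:
                seq = seq.flip(1)                            # undo the reversal
            t = seq.reshape(b, d1, d2, c)
            x = t.transpose(1, 2) if order == "WH" else t    # restore (B, H, W, C)
        return x

model = AlternatingMambaND(dim=16, depth=4, make_layer=lambda d: nn.Identity())
print(model(torch.randn(2, 8, 8, 16)).shape)                 # torch.Size([2, 8, 8, 16])
```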
- **Scan Factorization:**
- **Purpose:** To keep individual scans short and efficient, various ways of factorizing the single scan over the fully flattened sequence into smaller scans along individual axes are explored (a sketch appears below).
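The sketch below shows what such a factorization can look like for 2D data: one batched scan of length $W$ over every row, followed by one batched scan of length $H$ over every column. `scan_1d` is a hypothetical callable standing in for a 1D SSM scan over a `(batch, length, channels)` tensor; the identity is used here so the example runs:

```python
import torch

def factorized_scan_2d(x, scan_1d):
    """Replace one scan over H*W tokens with two shorter axis-wise scans."""
    b, h, w, c = x.shape
    # Scan of length W over each of the H rows (rows batched together).
    x = scan_1d(x.reshape(b * h, w, c)).reshape(b, h, w, c)
    # Scan of length H over each of the W columns (columns batched together).
    x = x.transpose(1, 2).reshape(b * w, h, c)
    x = scan_1d(x).reshape(b, w, h, c).transpose(1, 2)
    return x

out = factorized_scan_2d(torch.randn(2, 8, 8, 16), scan_1d=lambda s: s)
print(out.shape)                             # torch.Size([2, 8, 8, 16])
```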
- **Experiments:**
- **Datasets and Setups:** ImageNet-1K, HMDB-51, UCF-101, ERA5, and BTCV are used for evaluation.