Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification

July 16, 2024 | Weilian Zhou, Sei-ichiro Kamata, Haipeng Wang, Man Sing Wong, Huiying (Cynthia) Hou
This paper introduces the Mamba-in-Mamba (MiM) architecture for hyperspectral image (HSI) classification, addressing the limitations of existing RNN- and Transformer-based models. The MiM model combines three components: a centralized Mamba-Cross-Scan (MCS) mechanism that transforms HSI patches into efficient paired sequences; a Tokenized Mamba (T-Mamba) encoder equipped with a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation; and a Weighted MCS Fusion (WMF) module with a Multi-Scale Loss Design to improve training efficiency.

The T-Mamba encoder processes sequences bi-directionally while focusing on the central pixel for patch-wise classification, and the MCS mechanism scans the patch continuously in multiple directions to capture spatial correlations effectively. On four HSI datasets, the MiM model achieves state-of-the-art or highly competitive performance, demonstrating its feasibility and efficiency for HSI classification and highlighting Mamba-based models as a lightweight, efficient alternative to traditional methods.
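To make two of the central ideas concrete, the sketch below illustrates, under simplifying assumptions, (a) a Gaussian decay mask that down-weights pixels far from the patch centre and (b) a set of directional scans that serialize a patch into pixel sequences for a bi-directional sequence encoder. This is a minimal illustration, not the paper's exact formulation: the function names, the 7x7 patch size, the sigma value, and the particular scan orderings are assumptions chosen for clarity.

```python
import numpy as np

def gaussian_decay_mask(patch_size: int, sigma: float = 2.0) -> np.ndarray:
    """Illustrative Gaussian Decay Mask (GDM): weights fall off with spatial
    distance from the central pixel, emphasising the pixel to be classified.
    (sigma = 2.0 is an arbitrary choice for this sketch.)"""
    c = patch_size // 2
    ys, xs = np.mgrid[0:patch_size, 0:patch_size]
    return np.exp(-((ys - c) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))

def cross_scan_sequences(patch: np.ndarray) -> list[np.ndarray]:
    """Illustrative stand-in for the Mamba-Cross-Scan (MCS): serialize the
    patch into directional pixel sequences (row-wise and column-wise, forward
    and reverse) that a bi-directional Mamba-style encoder could consume."""
    h, w, c = patch.shape
    row_major = patch.reshape(h * w, c)                    # left-to-right, top-to-bottom
    col_major = patch.transpose(1, 0, 2).reshape(h * w, c) # top-to-bottom, left-to-right
    return [row_major, row_major[::-1], col_major, col_major[::-1]]

# Example: a synthetic 7x7 HSI patch with 30 spectral bands.
patch = np.random.rand(7, 7, 30)
weighted = patch * gaussian_decay_mask(7)[..., None]  # emphasise pixels near the centre
seqs = cross_scan_sequences(weighted)
print([s.shape for s in seqs])  # four sequences of shape (49, 30)
```

The intent mirrors what the summary describes: the decay mask and the centred, multi-directional scanning both keep the central, to-be-classified pixel dominant in the sequences fed to the encoder.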