8 May 2024 | Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin
This paper proposes a Frequency-Assisted Mamba (FMSR) framework for remote sensing image (RSI) super-resolution (SR). FMSR pairs the Vision State Space Module (VSSM), whose linear complexity makes it efficient for large-scale RSI, with frequency analysis to enable spatial-frequency fusion. The framework adopts a multi-level fusion architecture built from a Frequency Selection Module (FSM), the VSSM, and a Hybrid Gate Module (HGM) to capture both global and local dependencies: the FSM adaptively selects informative frequency cues, while the HGM strengthens spatially-varying representations. Learnable scaling adaptors further recalibrate the multi-level features for accurate fusion. Experiments on the AID, DOTA, and DIOR benchmarks show that FMSR outperforms state-of-the-art Transformer-based methods such as HAT-L by 0.11 dB PSNR on average, while consuming only 28.05% of its memory and 19.08% of its computational resources. FMSR delivers superior results in both quantitative and qualitative evaluations, demonstrating its effectiveness in capturing and reconstructing fine details in RSI; its efficiency and accuracy make it a promising solution for large-scale RSI SR tasks.
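To make the frequency-selection idea concrete, below is a minimal PyTorch sketch of an FSM-style block: features are mapped to the frequency domain with an FFT, a learnable gate weights the frequency bins, and the result is mapped back to the spatial domain. The class name, gating design, and hyperparameters here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class FrequencySelectionModule(nn.Module):
    """Hypothetical sketch of a frequency-selection block (not the official FMSR code)."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable gate over frequency magnitudes (assumed design).
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 2-D FFT over the spatial dimensions; keep the complex spectrum.
        freq = torch.fft.rfft2(x, norm="ortho")
        # Gate computed from the magnitude spectrum selects informative frequencies.
        mask = self.gate(freq.abs())
        freq = freq * mask
        # Back to the spatial domain at the original spatial size.
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")


if __name__ == "__main__":
    fsm = FrequencySelectionModule(channels=64)
    feats = torch.randn(1, 64, 48, 48)
    print(fsm(feats).shape)  # torch.Size([1, 64, 48, 48])
```

In the full FMSR design, the output of such a frequency branch would be fused with the VSSM's spatial features and modulated by the HGM, with learnable scaling adaptors balancing the contributions from different levels.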