18 Mar 2024 | Yuan Shi, Bin Xia, Xiaoyu Jin, Xing Wang, Tianyu Zhao, Xin Xia, Xuefeng Xiao, Wenming Yang
VmambaIR is an image restoration model that uses State Space Models (SSMs) to address the limitations of existing approaches based on CNNs, GANs, transformers, and diffusion models. It adopts a U-Net architecture built from Omni Selective Scan (OSS) blocks, each consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). The OSS module models image information flow in all six directions, overcoming the unidirectional modeling limitation of SSMs. The model is evaluated on several image restoration tasks: image deraining, single image super-resolution, and real-world image super-resolution. The reported experiments show that VmambaIR achieves state-of-the-art performance with significantly fewer computational resources and parameters. The work highlights the potential of SSMs as alternatives to transformer and CNN architectures in low-level vision tasks.
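To make the idea of scanning from six directions concrete, here is a minimal sketch in plain Python. It is not the VmambaIR implementation. It assumes the six directions are forward and backward passes along the width, height, and channel axes of a (C, H, W) feature map, which this summary does not state, and it replaces the selective SSM with a toy linear recurrence. The names `scan_orders`, `causal_scan`, and `omni_scan` are hypothetical.

```python
from itertools import product


def scan_orders(C, H, W):
    """Return six index orderings over a (C, H, W) grid (assumed directions)."""
    # Each base ordering makes a different axis vary fastest.
    width_major = [(c, h, w) for c, h, w in product(range(C), range(H), range(W))]
    height_major = [(c, h, w) for c, w, h in product(range(C), range(W), range(H))]
    channel_major = [(c, h, w) for h, w, c in product(range(H), range(W), range(C))]
    orders = {}
    for name, order in [("width", width_major),
                        ("height", height_major),
                        ("channel", channel_major)]:
        orders[name + "_fwd"] = order
        orders[name + "_bwd"] = order[::-1]  # same path, traversed in reverse
    return orders


def causal_scan(seq, decay):
    """Toy recurrence h_t = decay * h_{t-1} + x_t, a stand-in for a selective SSM.

    Output t depends only on inputs 0..t, so a single scan is unidirectional.
    """
    out, h = [], 0.0
    for v in seq:
        h = decay * h + v
        out.append(h)
    return out


def omni_scan(x, decay=0.5):
    """Scan x (nested lists, shape C x H x W) along all six orderings and sum."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    y = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for order in scan_orders(C, H, W).values():
        seq = [x[c][h][w] for c, h, w in order]
        # Scatter each scanned value back to its original grid position.
        for (c, h, w), v in zip(order, causal_scan(seq, decay)):
            y[c][h][w] += v
    return y


# A single impulse in the middle of a 2 x 3 x 3 feature map.
x = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
x[1][1][1] = 1.0
y = omni_scan(x, decay=0.5)
```

With a single forward scan, positions visited before the impulse receive nothing from it. After the six scans are summed, every position in `y` carries some contribution from the impulse, which is the limitation the multi-directional design is meant to remove. Summation is only one possible way to merge the directions and is chosen here for simplicity.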