28 May 2024 | Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei Wang
MambaVC is a visual compression network based on selective state spaces, designed to achieve superior rate-distortion performance with lower computational and memory overhead. The model introduces a Visual State Space (VSS) block with a 2D selective scanning (2DSS) module, which strengthens global context modeling and improves compression efficiency. On benchmark datasets, MambaVC outperforms its CNN and Transformer variants, with the advantage most pronounced on high-resolution images. On the Kodak dataset, it outperforms the CNN and Transformer variants by 9.3% and 15.6%, respectively, while reducing computation by 42% and 24% and saving 12% and 71% of memory. MambaVC also demonstrates strong performance on video compression, where a MambaVC-SSF variant shows competitive results. Its efficiency and scalability make it suitable for real-world applications such as high-definition medical imaging and satellite imagery. The paper provides extensive experiments and comparisons, highlighting MambaVC's advantages in redundancy elimination and effective receptive field, as well as its lower quantization loss. The results show that MambaVC outperforms traditional coding methods and other compression models, demonstrating its potential as a new direction in visual compression.
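The summary does not spell out how the 2DSS module traverses a feature map, so the following is only an illustrative sketch of the general idea behind 2D scanning in visual state space models: a 2D grid is unfolded into several 1D sequences along different directions, each sequence would be processed by a sequence model, and the results are merged back onto the grid. The four directions, the function names (`scan_orders`, `unfold`, `merge`), and merging by averaging are all assumptions made for illustration, not the paper's confirmed design, and the selective state space step itself is left out.

```python
def scan_orders(height, width):
    """Return four hypothetical 1D traversal orders over a height x width grid."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def unfold(grid):
    """Flatten a 2D grid (list of rows) into one 1D sequence per scan direction."""
    height, width = len(grid), len(grid[0])
    orders = scan_orders(height, width)
    return {name: [grid[r][c] for r, c in order] for name, order in orders.items()}


def merge(sequences, height, width):
    """Scatter each sequence back to its grid positions and average over directions.

    Averaging is an assumption; a real model would first pass every sequence
    through a selective state space layer before merging.
    """
    orders = scan_orders(height, width)
    total = [[0.0] * width for _ in range(height)]
    for name, seq in sequences.items():
        for (r, c), value in zip(orders[name], seq):
            total[r][c] += value
    return [[value / len(sequences) for value in row] for row in total]


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6]]
    sequences = unfold(grid)
    for name, seq in sequences.items():
        print(name, seq)
    # With no sequence model in between, merging recovers the original grid.
    print(merge(sequences, 2, 3))
```

Because every position appears exactly once in each traversal, each output position can draw on context from all four directions once a sequence model is applied between `unfold` and `merge`; this is the sense in which such scanning supports global context modeling.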