25 Jul 2024 | Yuhuan Yang*1, Chaofan Ma*1, Jiangchao Yao1, Zhun Zhong⊗2, Ya Zhang1, and Yanfeng Wang⊗1
ReMamber is a novel architecture for Referring Image Segmentation (RIS) that combines Mamba, a state space model, with a multi-modal Mamba Twister block. The Mamba Twister block explicitly models image-text interaction and fuses textual and visual features through a channel and spatial twisting mechanism. This approach addresses the quadratic computation cost that transformer-based models incur in RIS tasks. The paper reports competitive results on three challenging benchmarks, provides thorough analyses of ReMamber, and discusses other fusion designs using Mamba. The key contributions are the pioneering exploration of Mamba in RIS, the design of the ReMamber architecture, and the analysis of its performance and variants. The code for ReMamber is available at <https://github.com/yyh-rain-song/ReMamber>.
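To make the cost argument concrete, the sketch below contrasts a generic scalar linear state space recurrence with the pairwise score table that self-attention builds. It is an illustration only, under stated assumptions: it is neither Mamba's selective scan nor the Mamba Twister block, and the names `ssm_scan` and `pairwise_scores` and the parameters `a`, `b`, `c` are hypothetical and do not come from the ReMamber code.

```python
def ssm_scan(xs, a, b, c):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The loop visits each token once, so the cost grows linearly with len(xs).
    Assumption: fixed scalar parameters, unlike Mamba's input-dependent ones.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # update the hidden state with the current token
        ys.append(c * h)    # read the output from the hidden state
    return ys


def pairwise_scores(xs):
    """Build the full N x N table of token-pair scores.

    Self-attention needs one score per token pair, so the work grows
    quadratically with len(xs).
    """
    return [[xi * xj for xj in xs] for xi in xs]


tokens = [1.0, 2.0, 3.0, 4.0]
outputs = ssm_scan(tokens, a=0.5, b=1.0, c=2.0)
scores = pairwise_scores(tokens)

# Four recurrence steps versus sixteen pairwise scores for four tokens.
print(len(outputs), sum(len(row) for row in scores))
```

For an image flattened into N patch tokens, the recurrence performs N state updates, while the pairwise table holds N² entries. That gap is what motivates replacing attention with a state space model at high resolution.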