28 Apr 2024 | Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang (✉), and Bo Jiang
The paper "Mamba-FETrack: Frame-Event Tracking via State Space Model" introduces Mamba-FETrack, an RGB-Event tracking framework built on the State Space Model (SSM) that aims for high tracking performance at lower computational cost and memory usage. The authors target a limitation of existing Transformer-based trackers, which are computationally expensive and require significant GPU memory. Two modality-specific Mamba backbone networks extract features from the RGB frames and the Event streams, and a FusionMamba block strengthens the interaction between the two modalities. The fused features are then fed into a tracking head for target localization. Extensive experiments on the FELT and FE108 datasets show that Mamba-FETrack achieves comparable or better performance than ViT-S-based trackers while significantly reducing FLOPs and parameters. The method is also more efficient in training and inference, making it a promising approach for RGB-Event tracking.
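
To make the efficiency argument concrete, the sketch below runs a plain discrete-time linear state space recurrence over a token sequence. It is illustrative only and is not the authors' implementation: the function name `ssm_scan` and the scalar parameters `a`, `b`, `c` are assumptions introduced here, and Mamba itself uses a selective, input-dependent SSM with learned matrices rather than fixed scalars. The point it shows is that each token is visited once while a fixed-size state is carried forward, so the work grows linearly with sequence length, whereas self-attention compares every pair of tokens.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar, time-invariant linear SSM over a sequence.

    Recurrence (assumed form, for illustration):
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    """
    h = 0.0  # hidden state; its size does not depend on sequence length
    ys: List[float] = []
    for x_t in x:  # a single pass over the tokens
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse at the first token decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

With `a=0.5`, the influence of the first token halves at every step, which is how a fixed-size state summarizes earlier tokens without the model revisiting them.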