18 Jun 2024 | Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxiang Zhang, Lei Zhang
Voxel Mamba is a group-free state space model (SSM) designed for 3D object detection from point clouds. Serialization-based methods lose spatial proximity when they split voxels into multiple sequences; Voxel Mamba avoids this by serializing all voxels into a single sequence with no grouping. Because SSMs have linear complexity in sequence length, the entire voxel space can be processed efficiently as one sequence, which preserves spatial relationships. To further enhance spatial proximity, Voxel Mamba introduces the Dual-scale SSM Block (DSB) and Implicit Window Partition (IWP). The DSB uses a hierarchical structure in which a forward branch processes high-resolution voxel features and a backward branch processes low-resolution voxel features. The IWP encodes voxel positions to maintain spatial context without explicit window partitioning.
Experiments on the Waymo Open and nuScenes datasets show that Voxel Mamba outperforms state-of-the-art methods in both accuracy and computational efficiency. Ablation studies validate the method, demonstrating the importance of space-filling curves and the benefit of each component in the Voxel Mamba framework.
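
The group-free serialization idea can be illustrated with a minimal sketch: every voxel coordinate is mapped to a key along a space-filling curve, and sorting by that key yields one sequence for the whole scene. The summary does not state which curve is used, so the Z-order (Morton) curve below is an assumption chosen for brevity, and the names `morton_key` and `serialize_voxels` are hypothetical, not taken from the authors' code.

```python
from typing import List, Tuple

Voxel = Tuple[int, int, int]  # integer (x, y, z) voxel grid coordinates


def _spread_bits(v: int, bits: int) -> int:
    """Place bit i of v at position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of x, y, z into a single Z-order curve index.

    Assumption: a Z-order curve stands in for whichever space-filling
    curve the method actually uses.
    """
    return (
        _spread_bits(x, bits)
        | (_spread_bits(y, bits) << 1)
        | (_spread_bits(z, bits) << 2)
    )


def serialize_voxels(coords: List[Voxel], bits: int = 10) -> List[int]:
    """Return voxel indices ordered along the curve as ONE sequence.

    No grouping or windowing is applied: all voxels share a single ordering.
    """
    return sorted(
        range(len(coords)),
        key=lambda i: morton_key(*coords[i], bits=bits),
    )


if __name__ == "__main__":
    voxels = [(3, 3, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]
    order = serialize_voxels(voxels)
    print([voxels[i] for i in order])
```

A sequence model with linear complexity can then consume the features in this order directly. A model with quadratic cost would more likely need the sequence cut into groups, which is the step the group-free design avoids.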