LION: Linear Group RNN for 3D Object Detection in Point Clouds

25 Jul 2024 | Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai
The paper introduces LION, a window-based framework that uses Linear Group RNN (LGRN) for accurate 3D object detection in point clouds. Transformers are limited in modeling long-range relationships by their computational cost; LION addresses this by using LGRN, which has low computational complexity.

The key contributions of LION are:

1. **Window-Based Framework**: LION groups features by window for long-range interaction, which allows interaction within much larger groups than transformer-based methods use.
2. **3D Spatial Feature Descriptor**: LION incorporates a 3D spatial feature descriptor that captures local 3D spatial relationships, compensating for the limitations of linear RNNs in spatial modeling.
3. **Voxel Generation Strategy**: A new voxel generation strategy densifies foreground features in highly sparse point clouds, improving feature representation.

**Methods and Implementation**:

- **3D Sparse Window Partition**: LION converts point clouds into voxels and partitions them into non-overlapping windows for feature interaction.
- **LION Block**: The core component of LION, consisting of LION layers for long-range interaction, 3D spatial feature descriptors, voxel merging, and voxel expanding.
- **Voxel Generation**: A method that generates new voxel features using the auto-regressive property of LGRN, enhancing foreground features.

**Experiments**:

- **Datasets**: Waymo Open Dataset, nuScenes, Argoverse V2, and ONCE.
- **Evaluation Metrics**: mAP, mAPH, NDS.
- **Results**: LION achieves state-of-the-art performance on multiple datasets, outperforming transformer-based methods, and it does so with several linear RNN operators (Mamba, RWKV, RetNet).

**Ablation Study**:

- **Large Group Size**: Enhances performance by allowing long-range interaction.
- **3D Spatial Feature Descriptor**: Improves performance by capturing local spatial information.
- **Voxel Generation**: Enhances feature representation in sparse point clouds.

**Conclusion**: LION demonstrates the effectiveness of LGRN in 3D object detection, achieving superior performance on challenging datasets. Future work could focus on optimizing running speed while maintaining high detection accuracy.
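The 3D sparse window partition described under Methods can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `voxelize` and `window_partition`, the voxel size, and the window shape are all assumptions chosen for illustration. It only shows the general idea of mapping points to occupied voxels and assigning each voxel to exactly one non-overlapping window.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Return the unique integer voxel coordinates occupied by the points."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)


def window_partition(voxel_coords, window_shape):
    """Group occupied voxels into non-overlapping windows.

    Returns the window indices and, per window, the row indices of its voxels.
    """
    window_shape = np.asarray(window_shape, dtype=np.int64)
    # Floor division maps every voxel to exactly one window.
    window_idx = voxel_coords // window_shape
    windows, inverse = np.unique(window_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the inverse shape differs across NumPy versions
    groups = [np.flatnonzero(inverse == w) for w in range(len(windows))]
    return windows, groups


# Toy point cloud (x, y, z); the values are illustrative only.
points = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.1, 0.1],  # falls into the same voxel as the first point
    [1.5, 0.2, 0.1],
    [5.0, 5.0, 0.5],
])
voxels = voxelize(points, voxel_size=0.5)
windows, groups = window_partition(voxels, window_shape=(4, 4, 2))
for w, g in zip(windows, groups):
    print("window", w.tolist(), "-> voxels", voxels[g].tolist())
```

Only occupied voxels are stored, so empty space costs nothing; each window's voxel list is what a sequence operator would then process as one group.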
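To make the "low computational complexity" point concrete, here is a minimal sketch of a generic linear recurrence scanned over one group of voxel features. It is an assumption-laden toy, not Mamba, RWKV, RetNet, or the operator used in the paper: the function name `linear_recurrence` and the scalar parameters `decay` and `w_in` are hypothetical. The loop visits each feature once, so the cost grows linearly with group size, whereas self-attention compares every pair of features in a group.

```python
import numpy as np


def linear_recurrence(x, decay, w_in):
    """Scan h_t = decay * h_{t-1} + w_in * x_t over a sequence of features.

    x has shape (sequence_length, channels); one pass, linear in length.
    """
    h = np.zeros(x.shape[1], dtype=x.dtype)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + w_in * x[t]  # state carries information from all earlier steps
        out[t] = h
    return out


# A group of 4 voxel features with 2 channels each (illustrative values).
group = np.ones((4, 2))
states = linear_recurrence(group, decay=0.5, w_in=1.0)
print(states[:, 0])
```

Because each state depends only on earlier inputs, the scan is auto-regressive in the same sense the summary refers to when describing voxel generation; the order in which voxels are serialized therefore matters.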