Understanding LION: Linear Group RNN for 3D Object Detection in Point Clouds

**LION: Linear Group RNN for 3D Object Detection in Point Clouds**

The paper introduces LION, a window-based framework that uses a Linear Group RNN (LGRN) for accurate 3D object detection in point clouds. Transformers struggle to model long-range relationships because the cost of attention grows quadratically with the number of tokens; LION instead applies LGRN, whose computational cost grows linearly with sequence length, to grouped features.

The key contributions of LION include:

1. **Window-Based Framework**: LION groups features by window for long-range interaction, which allows far larger groups than transformer-based methods can afford.
2. **3D Spatial Feature Descriptor**: LION adds a 3D spatial feature descriptor that captures local 3D spatial relationships, compensating for the weakness of linear RNNs in spatial modeling.
3. **Voxel Generation Strategy**: A new voxel generation strategy densifies foreground features in highly sparse point clouds, improving feature representation.

**Methods and Implementation**:

- **3D Sparse Window Partition**: LION converts point clouds into voxels and partitions them into non-overlapping windows for feature interaction.
- **LION Block**: The core component of LION, consisting of LION layers for long-range interaction, 3D spatial feature descriptors, voxel merging, and voxel expanding.
- **Voxel Generation**: New voxel features are generated using the auto-regressive property of LGRN, which enhances foreground features.

**Experiments**:

- **Datasets**: Waymo Open Dataset, nuScenes, Argoverse V2, and ONCE.
- **Evaluation Metrics**: mAP, mAPH, NDS.
- **Results**: LION achieves state-of-the-art performance on multiple datasets, outperforming transformer-based methods, and it is evaluated with several linear RNN operators (Mamba, RWKV, RetNet).

**Ablation Study**:

- **Large Group Size**: Improves performance by enabling long-range interaction.
- **3D Spatial Feature Descriptor**: Improves performance by capturing local spatial information.
- **Voxel Generation**: Improves feature representation in sparse point clouds.

**Conclusion**: LION demonstrates the effectiveness of LGRN for 3D object detection, achieving superior performance on challenging datasets. Future work could focus on optimizing running speed while maintaining high detection accuracy.
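
As a rough illustration of the window partition and grouping steps listed under Methods and Implementation, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`voxelize`, `group_by_window`, `linear_scan`), the mean-pooled voxel features, the sort order, and the fixed `decay` value are all assumptions made for this example. In particular, `linear_scan` is a toy linear recurrence standing in for a real linear RNN operator such as Mamba, RWKV, or RetNet; it only shows why the cost per group is linear in the group length.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Map (N, 3) points to unique integer voxel coordinates.

    Each voxel's feature is the mean of the points inside it
    (an assumption for this sketch; real detectors use learned encoders).
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    feats = np.zeros((len(uniq), points.shape[1]))
    np.add.at(feats, inverse, points)  # sum the points falling in each voxel
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, feats / counts[:, None]


def group_by_window(coords, window_shape, group_size):
    """Sort voxels by non-overlapping window, then cut into equal-size groups.

    Returns a list of index arrays; the last group may be shorter.
    """
    window_shape = np.asarray(window_shape)
    win = coords // window_shape    # which window each voxel belongs to
    local = coords % window_shape   # position of the voxel inside its window
    # np.lexsort treats the LAST key as the primary sort key.
    order = np.lexsort((local[:, 2], local[:, 1], local[:, 0],
                        win[:, 2], win[:, 1], win[:, 0]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


def linear_scan(x, decay=0.9):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t over a (T, C) sequence.

    One pass over the sequence, so the cost is O(T) rather than the
    O(T^2) of full self-attention.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x, dtype=float)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out


def process_groups(feats, groups, decay=0.9):
    """Run the toy recurrence independently inside every group."""
    out = np.zeros_like(feats, dtype=float)
    for idx in groups:
        out[idx] = linear_scan(feats[idx], decay)
    return out
```

A possible usage, again under the same assumptions: call `voxelize(points, 0.5)` on an `(N, 3)` array, pass the returned coordinates to `group_by_window(coords, (4, 4, 2), group_size=3)`, and then call `process_groups(feats, groups)`. Because each group is scanned once, enlarging `group_size` lengthens the range of interaction without the quadratic blow-up that limits group size in attention-based designs.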

LION: Linear Group RNN for 3D Object Detection in Point Clouds

25 Jul 2024 | Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai