25 Jul 2024 | Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai
**LION: Linear Group RNN for 3D Object Detection in Point Clouds**
The paper introduces LION, a window-based framework that uses a Linear Group RNN (LGRN) for accurate 3D object detection in point clouds. Transformer-based detectors struggle to model long-range relationships because the cost of attention grows quadratically with group size, which forces them to use small groups. LGRN has linear computational complexity, so LION can group features far more widely at lower cost. The key contributions of LION include:
1. **Window-Based Framework**: LION groups voxel features within windows for long-range interaction, using much larger groups than transformer-based methods can afford.
2. **3D Spatial Feature Descriptor**: A linear RNN processes voxels as a flattened 1D sequence, so spatially adjacent voxels can end up far apart. To compensate for this weakness in spatial modeling, LION adds a 3D spatial feature descriptor that captures local 3D spatial relationships.
3. **Voxel Generation Strategy**: LION proposes a voxel generation strategy that densifies foreground features in highly sparse point clouds, improving the feature representation.
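The linear-complexity argument can be made concrete with a minimal sketch of a group-wise linear recurrence over voxel features. This is a generic exponential-decay RNN, assumed here only for illustration. It is not LGRN as implemented in LION, and it is not Mamba, RWKV, or RetNet. The names `linear_group_rnn` and `run_groups` and the scalar `decay` parameter are hypothetical.

```python
from typing import List, Sequence


def linear_group_rnn(features: Sequence[Sequence[float]], decay: float = 0.9) -> List[List[float]]:
    """Scan one group of voxel features with a simple linear recurrence.

    h_t = decay * h_{t-1} + (1 - decay) * x_t   (element-wise, h_0 = 0)

    Each step touches only the C channels of one voxel, so a group of N voxels
    costs O(N * C) instead of the O(N^2 * C) of full self-attention.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    outputs: List[List[float]] = []
    if not features:
        return outputs
    state = [0.0] * len(features[0])
    for x in features:
        # The new state mixes the previous state with the current voxel feature.
        state = [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]
        outputs.append(state)
    return outputs


def run_groups(features: Sequence[Sequence[float]], group_size: int, decay: float = 0.9) -> List[List[float]]:
    """Split a flattened voxel sequence into fixed-size groups and scan each independently."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    outputs: List[List[float]] = []
    for start in range(0, len(features), group_size):
        # The hidden state resets at every group boundary.
        outputs.extend(linear_group_rnn(features[start:start + group_size], decay))
    return outputs
```

Consider a hypothetical group of N = 4096 voxels. Full self-attention scores 4096 × 4096 = 16,777,216 pairs, while the recurrence performs only 4096 sequential steps. This gap is what lets a linear RNN work with much larger groups.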
**Methods and Implementation**:
- **3D Sparse Window Partition**: LION converts point clouds into voxels and partitions them into non-overlapping windows for feature interaction.
- **LION Block**: The core component of LION, consisting of LION layers for long-range interaction, 3D spatial feature descriptors, voxel merging, and voxel expanding.
- **Voxel Generation**: A method to generate new voxel features using the auto-regressive property of LGRN, enhancing foreground features.
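The 3D sparse window partition step can be sketched as follows. The sketch makes three assumptions: voxels are given as integer grid coordinates, windows are axis-aligned boxes of a fixed `window_size`, and voxels inside a window are ordered lexicographically by (x, y, z). The exact ordering LION uses is described in the paper and is not reproduced here. The helper names `partition_windows` and `flatten_windows` are hypothetical illustrations, not LION's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]


def partition_windows(coords: Sequence[Coord], window_size: Coord) -> Dict[Coord, List[int]]:
    """Assign each non-empty voxel to a non-overlapping 3D window.

    Returns a map from window index to the indices of the voxels it contains.
    Only occupied windows are stored, so the cost scales with the number of
    voxels rather than with the size of the full grid.
    """
    wx, wy, wz = window_size
    windows: Dict[Coord, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        windows[(x // wx, y // wy, z // wz)].append(idx)
    for members in windows.values():
        # Order voxels inside a window by (x, y, z) so they form a 1D sequence.
        members.sort(key=lambda i: coords[i])
    return dict(windows)


def flatten_windows(windows: Dict[Coord, List[int]]) -> List[int]:
    """Concatenate the per-window sequences, visiting windows in sorted order."""
    order: List[int] = []
    for key in sorted(windows):
        order.extend(windows[key])
    return order
```

Storing only occupied windows matches the sparsity of LiDAR voxels. The flattened index order is the 1D sequence that a linear RNN would then scan.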
**Experiments**:
- **Datasets**: Waymo Open Dataset, nuScenes, Argoverse V2, and ONCE.
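- **Evaluation Metrics**: mAP, mAPH (mAP weighted by heading accuracy, used on Waymo), and NDS (nuScenes Detection Score).
- **Results**: LION achieves state-of-the-art performance on multiple datasets and outperforms transformer-based methods. It also works with several linear RNN operators (Mamba, RWKV, and RetNet) as its LGRN.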
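Two of the metrics above have compact definitions that are worth spelling out. Waymo's APH weights each true positive by its heading accuracy. nuScenes' NDS combines mAP with five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The sketch below follows these public benchmark definitions as we understand them. It is not evaluation code from LION or from the official toolkits, and the function names are hypothetical.

```python
import math
from typing import Sequence


def heading_accuracy(pred_yaw: float, gt_yaw: float) -> float:
    """Heading weight used by Waymo's APH: 1 for a perfect yaw, 0 when off by pi."""
    diff = abs(pred_yaw - gt_yaw) % (2.0 * math.pi)
    # Take the smaller of the two angular distances around the circle.
    return 1.0 - min(diff, 2.0 * math.pi - diff) / math.pi


def nuscenes_detection_score(mean_ap: float, tp_errors: Sequence[float]) -> float:
    """NDS = (5 * mAP + sum over the 5 TP errors of (1 - min(1, err))) / 10.

    tp_errors holds mATE, mASE, mAOE, mAVE and mAAE, in any order.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected the five nuScenes TP error metrics")
    # Errors above 1 are clipped so each TP term contributes between 0 and 1.
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors)
    return (5.0 * mean_ap + tp_score) / 10.0
```

With this weighting, a detection whose yaw is off by 90 degrees counts as half a true positive in APH.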
**Ablation Study**:
- **Large Group Size**: Enhances performance by allowing long-range interaction.
- **3D Spatial Feature Descriptor**: Improves performance by capturing local spatial information.
- **Voxel Generation**: Enhances feature representation in sparse point clouds.
**Conclusion**:
LION demonstrates the effectiveness of LGRN in 3D object detection, achieving superior performance on challenging datasets. Future work could focus on optimizing running speed while maintaining high detection accuracy.