22 Aug 2019 | Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas
This paper introduces VoteNet, an end-to-end 3D object detection network that combines deep point set networks with Hough voting to process raw point cloud data directly. Unlike earlier methods that rely on 2D detectors or convert point clouds to regular grids, VoteNet aims to be a generic and efficient solution for 3D object detection. The key challenge is that point clouds are sparse and cover only object surfaces, so an object's center can lie far from any observed point, which makes it difficult to regress centers accurately. To address this, VoteNet uses a voting mechanism inspired by the Hough transform: seed points generate votes for object centers, and the votes are grouped and aggregated to form box proposals. This gathers context around each object center even when the center is far from any surface point.
The model is evaluated on two large datasets, ScanNet and SUN RGB-D, and achieves state-of-the-art performance using only geometric information, outperforming previous methods that use both RGB and geometric cues. The paper also analyzes the importance of voting and the effects of different vote aggregation approaches, and reports on the robustness and efficiency of VoteNet.
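The voting idea can be illustrated with a small NumPy sketch. It is not the paper's implementation: in VoteNet the offsets come from a learned network, whereas here they are simulated as the true offset plus noise. The function names, the use of farthest point sampling to pick cluster centers, the fixed grouping radius, and the plain mean used for aggregation are all assumptions chosen to keep the example short.

```python
import numpy as np


def farthest_point_sampling(points, k):
    """Return indices of k points that are spread out over `points` (N, 3)."""
    chosen = [0]  # start from the first point (arbitrary choice)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))  # point farthest from everything chosen so far
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)


def aggregate_votes(seeds, offsets, num_clusters, radius):
    """Turn per-seed offsets into votes, then average the votes in each cluster."""
    votes = seeds + offsets  # each seed votes for an object center
    center_idx = farthest_point_sampling(votes, num_clusters)
    proposals = []
    for c in votes[center_idx]:
        mask = np.linalg.norm(votes - c, axis=1) <= radius  # ball grouping
        proposals.append(votes[mask].mean(axis=0))  # stand-in for learned aggregation
    return votes, np.stack(proposals)


# Synthetic scene: two objects whose surface points lie on unit spheres,
# so every observed point is exactly 1 unit away from its object's center.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
labels = np.repeat([0, 1], 100)
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
seeds = centers[labels] + directions

# Hypothetical "network output": the true offset to the center plus small noise.
offsets = centers[labels] - seeds + rng.normal(scale=0.05, size=(200, 3))

votes, proposals = aggregate_votes(seeds, offsets, num_clusters=2, radius=0.3)
print(proposals)
```

In this toy scene no seed lies near an object center, yet the votes concentrate there, which is the property the voting step is meant to provide.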