22 Aug 2019 | Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas
VoteNet is a deep-learning method for 3D object detection that operates directly on point clouds, without relying on 2D detectors. It combines deep point set networks with a Hough-voting mechanism to predict 3D bounding boxes and semantic classes from raw geometry. The key idea is to generate votes from seed points sampled from the point cloud; the votes are then clustered and aggregated into object proposals. Because a 3D object's center typically lies in empty space, far from any surface point, voting lets the network accumulate context around object centers more effectively than regressing boxes directly from surface points.

The architecture has three parts: a backbone network for point feature learning, a voting module that regresses votes from seeds, and a proposal module that aggregates votes into object proposals. The whole network is trained end to end, is efficient, and has a compact model size. Despite using purely geometric input with no color images, VoteNet achieves state-of-the-art results on two challenging benchmarks, SUN RGB-D and ScanNet, significantly outperforming prior methods that use RGB plus geometry or multi-view RGB images, and it remains robust to sparse data and effective in cluttered scenes.
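The vote-and-cluster idea can be illustrated with a minimal sketch. This is not the paper's learned aggregation: the offsets are hand-crafted here (VoteNet regresses them with an MLP), and `cluster_votes` is a simple greedy radius grouping standing in for the learned vote aggregation.

```python
import numpy as np

def cluster_votes(votes, radius=0.3):
    """Greedy radius-based grouping of 3D votes; each cluster's mean
    becomes one object-center proposal. A simplified stand-in for the
    learned vote aggregation in the paper."""
    remaining = list(range(len(votes)))
    proposals = []
    while remaining:
        anchor = votes[remaining[0]]
        members = [i for i in remaining
                   if np.linalg.norm(votes[i] - anchor) < radius]
        proposals.append(votes[members].mean(axis=0))
        remaining = [i for i in remaining if i not in members]
    return np.array(proposals)

# Seeds sampled from the surfaces of two objects centered at x=0.1 and
# x=3.1; each seed "votes" by adding an offset toward its object center
# (hand-crafted here, learned in VoteNet).
seeds = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0],
                  [3.0, 0.0, 0.0], [3.2, 0.0, 0.0]])
offsets = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0],
                    [0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]])
votes = seeds + offsets
proposals = cluster_votes(votes)
print(len(proposals))  # two clusters -> two object-center proposals
```

Note how no seed lies at an object center, yet the clustered votes land exactly there, which is the motivation for voting when centers are far from the object surface.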
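The three-stage data flow (backbone, voting module, proposal module) can be sketched at the shape level. Every function below is a hypothetical mock with random weights, chosen only to show the tensor shapes flowing between stages; the real modules are learned PointNet++ layers and MLP heads.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, D = 1024, 128, 2, 16  # input points, seeds, proposals, feature dim

def backbone(cloud):
    """Feature learning: subsample M seed points, each with a D-dim feature
    (mocked; the paper uses a PointNet++ backbone)."""
    idx = rng.choice(N, size=M, replace=False)
    return cloud[idx], rng.standard_normal((M, D))

def voting_module(seeds, feats):
    """Each seed votes for an object center via an offset regressed from
    its feature (mocked with a random linear map instead of an MLP)."""
    W = rng.standard_normal((D, 3)) * 0.01
    return seeds + feats @ W

def proposal_module(votes):
    """Aggregate votes into K proposals; only centers are produced here,
    whereas the real head also predicts box size and semantic class."""
    idx = rng.choice(M, size=K, replace=False)
    return votes[idx]

cloud = rng.standard_normal((N, 3))
seeds, feats = backbone(cloud)
votes = voting_module(seeds, feats)
proposals = proposal_module(votes)
print(votes.shape, proposals.shape)  # (128, 3) (2, 3)
```

The sketch makes the end-to-end structure concrete: every stage is a differentiable map from points to proposals, which is what allows the real network to be trained end to end.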