PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation

31 Dec 2018 | Sida Peng*, Yuan Liu*, Qixing Huang, Hujun Bao, Xiaowei Zhou
This paper introduces PVNet, a Pixel-wise Voting Network for 6DoF pose estimation from a single RGB image under severe occlusion or truncation. Traditional methods rely on hand-crafted features, while deep learning methods train end-to-end neural networks; both struggle when objects are occluded or truncated. PVNet addresses this by predicting, for each pixel, unit vectors pointing toward the object keypoints and using RANSAC-based voting to localize them. This representation is flexible enough to localize keypoints that are occluded or truncated out of the image.

PVNet follows a two-stage pipeline: a CNN first detects 2D object keypoints, and a PnP algorithm then computes the 6D pose parameters. Detecting the keypoints in a RANSAC-like fashion is the key innovation: it robustly handles occluded and truncated objects, and the voting yields a spatial probability distribution for each keypoint whose uncertainty can be leveraged by an uncertainty-driven PnP solver.

Experiments show that PVNet outperforms the state of the art on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets while remaining efficient, running at 25 fps on a GTX 1080 Ti GPU for real-time pose estimation. The authors also create a Truncation LINEMOD dataset to validate the robustness of the approach against truncation.
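To make the voting step concrete, below is a minimal NumPy sketch of a RANSAC-style keypoint localization over a pixel-wise unit-vector field, as described above. The function name, hypothesis count, and inlier threshold are illustrative assumptions, not the paper's exact settings; the weighted mean and covariance over hypotheses correspond to the spatial distribution that feeds the PnP stage.

import numpy as np

def ransac_vote(mask, vectors, n_hyp=128, inlier_thresh=0.99):
    """Localize one keypoint from a pixel-wise unit-vector field.

    mask:    (H, W) boolean object mask
    vectors: (H, W, 2) unit vectors, each pointing from a pixel
             toward the keypoint
    Returns the weighted mean and covariance of the keypoint hypotheses.
    """
    ys, xs = np.nonzero(mask)
    pix = np.stack([xs, ys], axis=1).astype(np.float64)   # (N, 2) pixel coords
    vec = vectors[ys, xs]                                  # (N, 2) directions

    # 1. Generate hypotheses: intersect the rays of random pixel pairs.
    idx = np.random.randint(0, len(pix), size=(n_hyp, 2))
    p1, v1 = pix[idx[:, 0]], vec[idx[:, 0]]
    p2, v2 = pix[idx[:, 1]], vec[idx[:, 1]]
    # Solve p1 + t*v1 = p2 + s*v2 for t via 2D cross products.
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    d = p2 - p1
    t = (d[:, 0] * v2[:, 1] - d[:, 1] * v2[:, 0]) / (cross + 1e-9)
    hyp = p1 + t[:, None] * v1                             # (n_hyp, 2)

    # 2. Score hypotheses: a pixel votes for one if its predicted
    #    direction agrees with the direction toward the hypothesis.
    dirs = hyp[:, None, :] - pix[None, :, :]               # (n_hyp, N, 2)
    dirs /= np.linalg.norm(dirs, axis=2, keepdims=True) + 1e-9
    votes = np.einsum('hnc,nc->hn', dirs, vec) > inlier_thresh
    weights = votes.sum(axis=1).astype(np.float64)         # (n_hyp,)

    # 3. Vote-weighted mean and covariance give the keypoint estimate
    #    and its spatial uncertainty.
    w = weights / weights.sum()
    mean = (w[:, None] * hyp).sum(axis=0)
    diff = hyp - mean
    cov = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0)
    return mean, cov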
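The per-keypoint covariances from voting can then drive the PnP step. The sketch below, assuming SciPy, shows one plausible formulation: the covariance-weighted (Mahalanobis) reprojection error is whitened so an ordinary least-squares optimizer can refine the pose. The name uncertainty_pnp and its argument layout are hypothetical, not the authors' API.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def uncertainty_pnp(pts3d, means2d, covs2d, K, pose0):
    """Refine a 6D pose by minimizing covariance-weighted reprojection error.

    pts3d:   (M, 3) object keypoints in the model frame
    means2d: (M, 2) voted 2D keypoint locations
    covs2d:  (M, 2, 2) per-keypoint covariances from the voting stage
    K:       (3, 3) camera intrinsics
    pose0:   (6,) initial [rotation vector, translation], e.g. from plain PnP
    """
    # Whitening matrices W with W^T W = cov^{-1}: scaling residuals by W
    # turns the Mahalanobis objective into ordinary least squares.
    whiten = np.stack([np.linalg.cholesky(np.linalg.inv(c)).T for c in covs2d])

    def residuals(pose):
        R = Rotation.from_rotvec(pose[:3]).as_matrix()
        cam = pts3d @ R.T + pose[3:]          # keypoints in the camera frame
        proj = cam @ K.T
        uv = proj[:, :2] / proj[:, 2:3]       # perspective projection
        err = uv - means2d                    # (M, 2) reprojection errors
        return np.einsum('mij,mj->mi', whiten, err).ravel()

    sol = least_squares(residuals, pose0)
    return sol.x

In this formulation, keypoints with tight voting distributions pull the pose strongly, while keypoints localized under heavy occlusion, whose covariances are large, contribute less.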