PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation

31 Dec 2018 | Sida Peng*, Yuan Liu*, Qixing Huang, Hujun Bao, Xiaowei Zhou
This paper addresses the challenge of 6DoF pose estimation from a single RGB image under severe occlusion or truncation. Traditional methods often rely on hand-crafted features, which are sensitive to image variations and background clutter. Deep learning-based methods, while substantially more robust to environmental variation, still struggle to generalize and to handle occluded or truncated objects. To tackle these issues, the authors propose a Pixel-wise Voting Network (PVNet) that predicts a pixel-wise unit vector pointing toward each keypoint; these vectors are then aggregated by RANSAC-based voting to localize the keypoints. This representation can localize keypoints even when they are occluded or lie outside the image, and the voting yields spatial uncertainty estimates that an uncertainty-driven PnP solver can exploit.

Experiments on benchmark datasets (LINEMOD, Occlusion LINEMOD, and YCB-Video) show that the proposed method outperforms state-of-the-art methods by a large margin while remaining efficient enough for real-time pose estimation. The authors also create a new dataset, Truncated LINEMOD, to validate the robustness of their approach against truncation.
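The voting step described above can be sketched as follows. This is a simplified, illustrative reimplementation, not the authors' code: each hypothesis is the intersection of the rays from two randomly chosen pixels, every pixel whose predicted vector agrees with a hypothesis (dot product above a threshold) votes for it, and the vote-weighted mean and covariance of the hypotheses give the keypoint estimate and its uncertainty. The hyperparameters (`n_hyp`, `thresh`) are assumptions chosen for the sketch, not values from the paper.

```python
import numpy as np

def intersect(p1, v1, p2, v2):
    """Intersect two 2D rays p_i + t_i * v_i; returns None if near-parallel."""
    A = np.stack([v1, -v2], axis=1)
    if abs(np.linalg.det(A)) < 1e-8:
        return None
    t = np.linalg.solve(A, p2 - p1)
    return p1 + t[0] * v1

def ransac_vote(pixels, vectors, n_hyp=128, thresh=0.99, rng=None):
    """RANSAC-style keypoint voting (illustrative sketch of PVNet's scheme).

    pixels:  (N, 2) coordinates of pixels belonging to the object
    vectors: (N, 2) unit vectors predicted at those pixels
    Returns the vote-weighted mean and covariance of the hypotheses,
    which a downstream PnP solver can use as a 2D measurement with
    uncertainty.
    """
    rng = rng or np.random.default_rng(0)
    n = len(pixels)
    hyps, weights = [], []
    for _ in range(n_hyp):
        i, j = rng.choice(n, size=2, replace=False)
        h = intersect(pixels[i], vectors[i], pixels[j], vectors[j])
        if h is None:
            continue
        # A pixel votes for h if its predicted direction agrees with
        # the direction from the pixel to h.
        d = h - pixels
        d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
        w = float(((d * vectors).sum(axis=1) >= thresh).sum())
        hyps.append(h)
        weights.append(w)
    H, w = np.asarray(hyps), np.asarray(weights)
    mu = (w[:, None] * H).sum(axis=0) / w.sum()
    diff = H - mu
    cov = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / w.sum()
    return mu, cov
```

On synthetic data where most pixels carry exact vectors toward a keypoint, the weighted mean recovers the keypoint to sub-pixel accuracy even with a few corrupted predictions, since wrong hypotheses attract few votes.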