25 Apr 2017 | Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh
This paper presents multiview bootstrapping, a method for training fine-grained detectors for hand keypoints, which are prone to occlusion. A multi-camera system produces noisy keypoint labels in multiple views of the hand, and these labels are triangulated in 3D using multiview geometry. Outliers are marked, and the reprojected triangulations are used as new labeled training data to improve the detector. The process is iterated to generate more labeled data, improving the detector with each round. The method is applied to train a hand keypoint detector for single images, which runs in real time on RGB images with accuracy comparable to depth-sensor-based methods. The detector also enables 3D markerless hand motion capture with complex object interactions. Finally, the paper derives analytical results for the minimum number of views needed to achieve target true and false positive rates for a given detector.
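The triangulate, mark outliers, and reproject step described above can be sketched for a single keypoint as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`triangulate_dlt`, `reproject`, `bootstrap_labels`) and the pixel threshold `max_error_px` are hypothetical, cameras are assumed to be calibrated 3x4 projection matrices, and outliers are marked with a simple exhaustive two-view consensus test, which stands in for whatever outlier rule the paper actually uses.

```python
from itertools import combinations

import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]  # right singular vector with the smallest singular value
    return X[:3] / X[3]


def reproject(P, X):
    """Project a 3D point X into the image of a camera with 3x4 matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def bootstrap_labels(projections, detections, max_error_px=4.0):
    """Triangulate one keypoint, mark outlier views, and reproject.

    Outlier rule (an assumption for this sketch): try every pair of views,
    keep the pair whose triangulation agrees with the most views, then
    re-triangulate from all agreeing views.
    """
    n = len(projections)
    best = []
    for i, j in combinations(range(n), 2):
        X = triangulate_dlt([projections[i], projections[j]],
                            [detections[i], detections[j]])
        inliers = [
            k for k in range(n)
            if np.linalg.norm(reproject(projections[k], X) - detections[k])
            <= max_error_px
        ]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        return None, [], np.empty((0, 2))
    X = triangulate_dlt([projections[k] for k in best],
                        [detections[k] for k in best])
    # Reprojections into every view, including views marked as outliers,
    # serve as candidate training labels for the next detector iteration.
    labels = np.array([reproject(P, X) for P in projections])
    return X, best, labels
```

In a full bootstrapping loop, the detector would be run on every view, `bootstrap_labels` would be called per keypoint and per frame, and the returned `labels` would be added to the training set before retraining. The retraining step and any frame-selection criteria are omitted because the summary does not specify them.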