1 Feb 2018 | Riza Alp Güler*, Natalia Neverova, Iasonas Kokkinos
This paper introduces DensePose-COCO, a large-scale dataset of manually annotated image-to-surface correspondences for dense human pose estimation. The dataset is built by collecting dense correspondences between 50K COCO images and the SMPL model, a detailed, accurate parametric surface of the human body. The authors propose DensePose-RCNN, a region-based method that densely regresses part-specific UV coordinates within every detected human region at multiple frames per second. They compare fully convolutional and region-based architectures for the task, with the region-based variant showing superior performance. A novel annotation pipeline makes it practical to gather ground-truth correspondences for 50K images, yielding the DensePose-COCO dataset; evaluation uses a training set of 48K humans and a test set of 1.5K images containing 2.3K humans. The results show that the method achieves high accuracy in dense human pose estimation even in the presence of occlusion, scale variation, and complex backgrounds. A 'teacher' network that 'inpaints' the supervision signal over the rest of the image domain further improves performance. The method handles large amounts of occlusion, scale, and pose variation, and successfully hallucinates the human body behind clothing such as dresses or skirts. The paper concludes that dense human pose estimation is feasible and has significant potential for applications in augmented reality, graphics, and human-computer interaction. The code and data are publicly available on the project's webpage.
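To make the region-based idea concrete, the snippet below is a minimal, hypothetical sketch in PyTorch, not the authors' Detectron/Caffe2 implementation: it only illustrates a per-ROI head that classifies each pixel into background plus 24 surface parts and regresses part-specific (U, V) coordinates. The class name `DensePoseStyleHead`, channel counts, and layer depths are all assumptions.

```python
import torch
import torch.nn as nn

class DensePoseStyleHead(nn.Module):
    """Hypothetical per-ROI head: per-pixel part classification
    (background + 24 body parts) and part-specific UV regression.
    Sizes are illustrative, not the paper's exact configuration."""

    def __init__(self, in_channels=256, num_parts=24, hidden=512):
        super().__init__()
        # Small fully convolutional stack over ROI-aligned features.
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Per-pixel part classification: background + 24 surface parts.
        self.part_logits = nn.Conv2d(hidden, num_parts + 1, 1)
        # Part-specific UV regression: one (U, V) pair per part per pixel.
        self.uv = nn.Conv2d(hidden, 2 * num_parts, 1)

    def forward(self, roi_feats):
        x = self.convs(roi_feats)
        return self.part_logits(x), self.uv(x)

# Usage on ROI-aligned features (e.g. 8 ROIs of 256x14x14 from a backbone):
head = DensePoseStyleHead()
rois = torch.randn(8, 256, 14, 14)
part_scores, uv_coords = head(rois)   # shapes (8, 25, 14, 14) and (8, 48, 14, 14)
```

In the full system, such a head would sit on top of a detection backbone that proposes human regions, which is what allows dense correspondence to be predicted at multiple frames per second.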
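The 'teacher' inpainting idea can likewise be sketched as a loss term. The snippet below is an assumption about how sparse human annotations and a teacher's dense predictions could jointly supervise the student; it is not the paper's exact formulation, and the name `distilled_uv_loss` and the tensor layout are hypothetical.

```python
import torch
import torch.nn.functional as F

def distilled_uv_loss(pred_uv, gt_uv, gt_mask, teacher_uv):
    """Hypothetical loss combining sparse ground-truth supervision with
    dense 'inpainted' targets from a teacher network.

    pred_uv    : (N, 2, H, W) student UV predictions for one body part
    gt_uv      : (N, 2, H, W) ground-truth UV, valid only where gt_mask == 1
    gt_mask    : (N, 1, H, W) 1 at annotated pixels, 0 elsewhere
    teacher_uv : (N, 2, H, W) teacher predictions used at unannotated pixels
    """
    # Supervise annotated pixels with the human-collected correspondences.
    sparse = F.smooth_l1_loss(pred_uv * gt_mask, gt_uv * gt_mask, reduction="sum")
    sparse = sparse / gt_mask.sum().clamp(min=1.0)

    # Elsewhere, match the teacher's dense predictions (the 'inpainted' signal).
    dense = F.smooth_l1_loss(pred_uv * (1 - gt_mask),
                             teacher_uv * (1 - gt_mask), reduction="sum")
    dense = dense / (1 - gt_mask).sum().clamp(min=1.0)
    return sparse + dense
```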