End-to-end Recovery of Human Shape and Pose


18 Dec 2017 | Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik
This paper presents Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. Unlike most contemporary methods, which estimate only 2D or 3D joint locations, HMR produces a richer and more useful mesh representation parameterized by shape and 3D joint angles. The main training objective is to minimize the reprojection loss of keypoints, which lets the model learn from in-the-wild images carrying only 2D annotations. Because the reprojection loss alone is highly under-constrained, the authors add an adversary trained to distinguish real human body parameters from implausible ones. HMR can be trained with or without paired 2D-to-3D supervision and does not rely on intermediate 2D keypoint detection, inferring 3D pose and shape directly from image pixels. Given a bounding box containing the person, the model runs in real time; it outperforms previous optimization-based methods on 3D mesh reconstruction and is competitive on tasks such as 3D joint location estimation and part segmentation. A minimal sketch of the reprojection loss follows.
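The reprojection term penalizes the distance between annotated 2D keypoints and the projection of the predicted 3D joints under a weak-perspective camera, as in the paper. The sketch below is a minimal PyTorch rendition under assumed tensor shapes; the L1 form of the per-joint penalty and the visibility masking follow the paper, while the function names and shapes are illustrative.

```python
import torch

def reproject(joints_3d, scale, trans):
    """Weak-perspective projection x = s * pi(X) + t, where pi is an
    orthographic projection (drop the z coordinate).
    joints_3d: (B, K, 3), scale: (B,), trans: (B, 2)."""
    joints_2d = joints_3d[:, :, :2]                      # orthographic projection
    return scale[:, None, None] * joints_2d + trans[:, None, :]

def reprojection_loss(pred_3d, scale, trans, gt_2d, vis):
    """L1 loss over visible keypoints only; gt_2d is (B, K, 2) and
    vis is a (B, K) 0/1 mask of annotated joints."""
    pred_2d = reproject(pred_3d, scale, trans)
    per_joint = (pred_2d - gt_2d).abs().sum(dim=-1)      # per-joint L1 distance
    return (vis * per_joint).sum() / vis.sum().clamp(min=1)
```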
The framework builds on the SMPL model, which parameterizes the mesh by 3D joint angles and a low-dimensional linear shape space, and is trained end to end with an adversarial loss so that its outputs lie on the human body manifold. A 3D regression module iteratively infers the SMPL body and camera parameters that minimize the joint reprojection error. The inferred parameters are also fed to a discriminator network that judges whether they correspond to a real human body; this pushes the network toward outputs on the human manifold and acts as weak supervision. A sketch of the regression loop and the adversarial objectives is given below.

Training uses a large-scale dataset of 2D keypoint annotations together with a separate, unpaired dataset of 3D meshes of people in a variety of poses and shapes; the adversarial prior regularizes the model and keeps its outputs anthropometrically plausible. Evaluated on in-the-wild images, HMR outperforms previous methods in 3D joint error and runtime, shows competitive results on 3D joint location estimation and part segmentation, and remains fast enough for real-time applications. Notably, it produces reasonable 3D reconstructions even without any paired 2D-to-3D supervision, opening up the possibility of learning 3D from large amounts of 2D data. The model and code are available for research purposes at https://akanazawa.github.io/hmr/.
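The regression module follows an iterative error feedback scheme: starting from an initial parameter vector Θ, it repeatedly predicts a residual update from the image features concatenated with the current estimate. The sketch below is a hedged PyTorch rendition: the 85-D parameterization (72 pose, 10 shape, 3 camera), the 3 refinement iterations, and the least-squares adversarial objectives follow the paper, but the layer sizes, the zero initialization of Θ (the paper starts from a mean parameter vector), and the single monolithic discriminator (the paper factorizes it into per-joint, whole-pose, and shape discriminators) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class IterativeRegressor(nn.Module):
    """Iterative error feedback: predict residual updates to Theta."""
    def __init__(self, feat_dim=2048, theta_dim=85, n_iter=3):
        super().__init__()
        self.n_iter = n_iter
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + theta_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, theta_dim),
        )
        # Assumption: zero init; the paper initializes from the mean Theta.
        self.register_buffer("theta_init", torch.zeros(1, theta_dim))

    def forward(self, features):
        theta = self.theta_init.expand(features.size(0), -1)
        for _ in range(self.n_iter):
            # Each step predicts a residual correction to the current estimate.
            theta = theta + self.fc(torch.cat([features, theta], dim=1))
        return theta

class Discriminator(nn.Module):
    """Single MLP over the full parameter vector (a simplification of the
    paper's factorized per-joint / pose / shape discriminators)."""
    def __init__(self, theta_dim=85):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(theta_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, theta):
        return self.net(theta)

def adversarial_losses(disc, real_theta, fake_theta):
    """Least-squares GAN objectives: the discriminator pushes real parameters
    toward 1 and generated ones toward 0; the encoder is rewarded when its
    output is scored as real."""
    d_loss = ((disc(real_theta) - 1) ** 2).mean() + (disc(fake_theta.detach()) ** 2).mean()
    e_loss = ((disc(fake_theta) - 1) ** 2).mean()
    return d_loss, e_loss
```

Because the 3D mesh data are unpaired with the 2D images, this adversarial term is what lets the encoder learn plausible joint-angle limits and body shapes without ever seeing a 2D-to-3D correspondence.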