11 Apr 2019 | Georgios Pavlakos*, Vasileios Choutas*, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black
The paper presents a method to capture 3D human body pose, hand pose, and facial expression from a single monocular image. The authors develop SMPL-X, a new unified 3D model that extends SMPL with fully articulated hands and an expressive face. To fit SMPL-X to images, they propose SMPLify-X, an optimization-based approach that first detects 2D image features and then fits the model parameters to those features; the objective and a fitting loop are sketched after the list below. Key contributions include:
1. **2D Feature Detection**: They detect 2D features corresponding to the face, hands, and feet.
2. **Neural Network Pose Prior**: They train a new neural network pose prior using a large MoCap dataset.
3. **Interpenetration Penalty**: They define a new interpenetration penalty that is both fast and accurate.
4. **Gender Detection**: They automatically detect gender and use appropriate body models (male, female, or neutral).
5. **Efficient Implementation**: By reimplementing the fit in PyTorch, they achieve a speedup of more than 8× over the original Chumpy-based implementation.
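For orientation, the fit can be viewed as minimizing an objective over the shape parameters β, the pose θ, and the facial expression ψ. The form below is a simplified sketch of the paper's objective; the full version splits the pose prior into separate body, hand, and jaw terms and adds a bending prior for elbows and knees:

```latex
E(\beta, \theta, \psi) = E_J
  + \lambda_{\theta} E_{\theta}
  + \lambda_{\beta} E_{\beta}
  + \lambda_{\varepsilon} E_{\varepsilon}
  + \lambda_{\mathcal{C}} E_{\mathcal{C}}
```

Here E_J is a confidence-weighted, robust (Geman-McClure) re-projection error between the projected model joints and the detected 2D features; E_θ, E_β, and E_ε are the pose, shape, and expression priors; E_C is the interpenetration penalty; and the λ weights balance the terms.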
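A minimal PyTorch sketch of what such a fitting loop might look like follows. This is illustrative only: `smplx_model`, `camera`, `pose_prior`, and `collision_penalty` are hypothetical stand-ins rather than the released SMPLify-X API, and all weights and dimensions are placeholder values.

```python
import torch

def fit_smplx(keypoints_2d, conf, smplx_model, camera, pose_prior,
              collision_penalty, num_iters=100):
    """Hypothetical SMPLify-X-style fit of model parameters to 2D keypoints."""
    # Latent body pose (decoded by the learned prior), shape, and expression.
    pose_embedding = torch.zeros(1, 32, requires_grad=True)
    betas = torch.zeros(1, 10, requires_grad=True)
    expression = torch.zeros(1, 10, requires_grad=True)

    # Quasi-Newton optimization over all free parameters; L-BFGS is a
    # typical choice for this kind of fit in PyTorch.
    optimizer = torch.optim.LBFGS([pose_embedding, betas, expression],
                                  lr=1.0, max_iter=num_iters)

    def closure():
        optimizer.zero_grad()
        body_pose = pose_prior.decode(pose_embedding)   # latent -> joint angles
        output = smplx_model(body_pose=body_pose, betas=betas,
                             expression=expression)
        proj = camera(output.joints)                    # 3D joints -> 2D pixels

        # Confidence-weighted robust re-projection term (Geman-McClure kernel).
        residual = proj - keypoints_2d
        sq = residual.pow(2).sum(dim=-1)
        rho = sq / (sq + 100.0 ** 2)                    # placeholder sigma = 100 px
        data_term = (conf * rho).sum()

        # Quadratic priors on latent pose, shape, and expression,
        # plus the interpenetration penalty on the posed mesh.
        loss = (data_term
                + 1e-2 * pose_embedding.pow(2).sum()
                + 1e-3 * betas.pow(2).sum()
                + 1e-3 * expression.pow(2).sum()
                + 1e-1 * collision_penalty(output.vertices))
        loss.backward()
        return loss

    optimizer.step(closure)
    return pose_embedding, betas, expression
```

Optimizing a low-dimensional latent pose decoded by the learned prior, rather than raw joint angles, is what keeps a fit like this well-posed from a single view.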
The method is evaluated on a new curated dataset with pseudo ground-truth, showing significant improvements over related models. The authors believe this work is a significant step towards expressive capture of bodies, hands, and faces from a single RGB image. The models, code, and data are available for research purposes.