Understanding TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

TokenHMR is a novel method for 3D human pose and shape (HPS) estimation from a single image. It addresses the trade-off between 2D and 3D accuracy with two contributions: a new loss function, Threshold-Adaptive Loss Scaling (TALS), and a tokenized representation of 3D pose. TALS reduces the influence of 2D and pseudo-ground-truth (p-GT) errors that are smaller than the error expected from an incorrect camera model, so the network is not pushed to fit them. The tokenized pose representation restricts the estimated poses to the space of valid poses, improving robustness to occlusion. To build it, the method uses a Vector-Quantized VAE (VQ-VAE) to discretize continuous human poses, providing a "vocabulary" of valid poses. This lets the model learn a prior over valid poses, reducing bias and improving 3D accuracy. TokenHMR achieves state-of-the-art accuracy on multiple in-the-wild 3D benchmarks: experiments on the EMDB and 3DPW datasets show that it outperforms existing methods, with a 7.6% reduction in 3D error compared to HMR2.0 on EMDB. The method is robust to ambiguous image evidence and avoids the "bent knees" bias seen in methods trained with p-GT and 2D keypoints. TokenHMR is available for research at https://tokenhmr.is.tue.mpg.de.
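The two ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TokenHMR's implementation: `quantize` performs the standard VQ-VAE nearest-codebook lookup that turns a continuous latent vector into a discrete token index, and `threshold_scaled_loss` is a hypothetical TALS-style loss that down-weights per-keypoint errors below a threshold `tau` by a factor `alpha`. The function names, the toy 2D codebook, and the values of `tau` and `alpha` are all illustrative assumptions; the paper's exact TALS formulation and codebook size are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Standard VQ step: map each continuous latent to its nearest codebook entry.

    Returns the discrete token indices and the selected codebook vectors.
    """
    tokens: List[int] = []
    quantized: List[Vector] = []
    for z in latents:
        # Index of the codebook entry closest to z (ties go to the lowest index).
        idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        tokens.append(idx)
        quantized.append(codebook[idx])
    return tokens, quantized


def threshold_scaled_loss(
    pred: List[Vector], target: List[Vector], tau: float, alpha: float = 0.1
) -> float:
    """Hypothetical TALS-style loss (assumed form, not the paper's formula).

    Per-point errors at or above tau count fully; errors below tau, which may
    come from camera-model mismatch, are scaled down by alpha.
    """
    total = 0.0
    for p, t in zip(pred, target):
        err = math.sqrt(squared_distance(p, t))
        total += err if err >= tau else alpha * err
    return total / len(pred)


# Toy usage: a 3-entry codebook of 2D "pose latents" (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
tokens, _ = quantize([[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]], codebook)
print(tokens)  # [1, 2, 0]

# The small 0.05 error is down-weighted; the large error of 5.0 counts fully.
loss = threshold_scaled_loss([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.05], [0.0, 0.0]], tau=0.5)
print(loss)
```

In a trained VQ-VAE the latents come from a learned encoder and the codebook is learned along with it; here both are fixed toy values so the lookup can be checked by hand. Because every output is a codebook entry, the decoded pose can only come from the learned vocabulary, which is the intuition behind restricting estimates to valid poses.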

25 Apr 2024 | Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black