29 Apr 2020 | Muhammed Kocabas, Nikos Athanasiou, Michael J. Black
VIBE (Video Inference for Human Body Pose and Shape Estimation) is a video-based method for estimating 3D human body pose and shape from monocular video. It leverages the large-scale AMASS motion capture dataset together with unpaired 2D keypoint annotations to train a temporal generative adversarial network (GAN): an adversarial loss pushes the generator to produce motion sequences that are indistinguishable from real human motion, while the motion discriminator, trained on ground-truth motion capture data, learns the kinematics of plausible human movement.

The generator consists of a convolutional neural network pretrained for single-image pose estimation, followed by a recurrent temporal encoder and a body parameter regressor. The motion discriminator is built from gated recurrent units (GRUs) with a self-attention mechanism that amplifies the contribution of distinctive frames. The full model is trained with a combination of regression and adversarial losses that minimize the error between predicted and ground-truth keypoints, pose, and shape parameters.

Evaluated on multiple benchmarks, including 3DPW and MPI-INF-3DHP, VIBE outperforms previous state-of-the-art methods and shows significant improvements in pose and shape estimation over single-frame approaches. It handles in-the-wild videos with complex motion and occlusions, as well as motion sequences with varying speeds and poses. The code is available for research purposes at https://github.com/mkocabas/VIBE.
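To make the generator side concrete, here is a minimal PyTorch sketch: per-frame CNN features are passed through a GRU temporal encoder, and an iterative regressor predicts SMPL pose, shape, and camera parameters. The dimensions, the residual connection, and the 85-parameter layout follow common SMPL-based pipelines; module and parameter names are illustrative, not the paper's exact code.

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """GRU over per-frame CNN features; a residual connection keeps
    the single-frame features as a baseline (dimensions illustrative)."""
    def __init__(self, feat_dim=2048, hidden_dim=1024):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feats):            # feats: (B, T, feat_dim)
        out, _ = self.gru(feats)         # temporal context per frame
        return feats + self.proj(out)    # residual over static features

class SMPLRegressor(nn.Module):
    """Iterative regressor for 85 SMPL parameters:
    72 pose (24 joints x 3 axis-angle) + 10 shape + 3 weak-perspective camera."""
    def __init__(self, feat_dim=2048, n_params=85, n_iter=3):
        super().__init__()
        self.n_iter = n_iter
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + n_params, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_params),
        )
        self.register_buffer("init_theta", torch.zeros(1, n_params))

    def forward(self, feats):            # feats: (N, feat_dim), N = B*T
        theta = self.init_theta.expand(feats.size(0), -1)
        for _ in range(self.n_iter):     # refine the estimate step by step
            theta = theta + self.fc(torch.cat([feats, theta], dim=1))
        return theta
```

In use, the temporal encoder's output of shape (B, T, feat_dim) would be flattened to (B*T, feat_dim) before the regressor, then reshaped back into per-frame predictions.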
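The motion discriminator can be sketched similarly: a GRU consumes a sequence of pose parameters (predicted or real mocap), and a learned self-attention pooling weights the hidden states so that distinctive frames contribute more than a plain average would. Layer sizes here are assumptions.

```python
class MotionDiscriminator(nn.Module):
    """GRU over a pose sequence, followed by self-attention pooling
    and a linear real/fake head (layer sizes are assumptions)."""
    def __init__(self, pose_dim=72, hidden_dim=1024, attn_dim=1024):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden_dim, batch_first=True)
        self.attention = nn.Sequential(   # scores each frame's hidden state
            nn.Linear(hidden_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, poses):             # poses: (B, T, pose_dim)
        h, _ = self.gru(poses)            # (B, T, hidden_dim)
        w = torch.softmax(self.attention(h), dim=1)  # (B, T, 1) frame weights
        pooled = (w * h).sum(dim=1)       # attention-weighted pooling
        return self.out(pooled)           # (B, 1) real/fake score
```

Pooling with learned attention weights, rather than keeping only the final hidden state, is what lets the discriminator amplify the frames that are most informative about whether a motion is realistic.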
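Finally, the training objective combines the regression terms with the adversarial term. A least-squares (LSGAN-style) formulation is a natural fit for this setup; the loss weights below are placeholders, not tuned values from the paper.

```python
import torch.nn.functional as F

def discriminator_loss(disc, real_motion, fake_motion):
    """Push D(real) toward 1 and D(fake) toward 0 (least-squares GAN)."""
    d_real = disc(real_motion)            # real mocap sequences (e.g. AMASS)
    d_fake = disc(fake_motion.detach())   # detach: don't update the generator
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def generator_loss(disc, fake_motion, kp_pred, kp_gt,
                   theta_pred=None, theta_gt=None,
                   w_kp=300.0, w_adv=2.0):  # placeholder weights
    """Keypoint regression + adversarial term; SMPL parameter supervision
    is added only for samples where ground truth exists."""
    loss = w_kp * F.mse_loss(kp_pred, kp_gt)            # keypoint error
    if theta_gt is not None:
        loss = loss + F.mse_loss(theta_pred, theta_gt)  # pose/shape error
    loss = loss + w_adv * ((disc(fake_motion) - 1.0) ** 2).mean()
    return loss
```

This split mirrors the training regime described above: 2D keypoint supervision is always available, 3D supervision is used where datasets provide it, and the adversarial term supplies a motion prior everywhere else.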