29 Mar 2019 | Dario Pavllo*, Christoph Feichtenhofer, David Grangier*, Michael Auli
This paper presents a fully convolutional model for 3D human pose estimation in video, built on dilated temporal convolutions and semi-supervised training. The model takes sequences of 2D keypoints as input and outputs 3D pose estimates, so it is compatible with any 2D keypoint detector, and the dilated convolutions let it exploit long temporal contexts. It outperforms previous state-of-the-art results in both supervised and semi-supervised settings. In the supervised setting it improves on the previous best result on Human3.6M by 6 mm mean per-joint position error, an 11% error reduction, and it also shows significant improvements on HumanEva-I. For the semi-supervised setting, the authors introduce back-projection, a training method that leverages unlabeled video: it requires only the camera intrinsic parameters rather than ground-truth 2D annotations or multi-view imagery with extrinsic calibration, and it outperforms previous semi-supervised methods when labeled data is scarce. Compared with RNN-based models, the convolutional architecture has lower computational complexity and fewer parameters, and it allows faster training and inference.

The model is implemented as a fully convolutional architecture with residual connections and dilated convolutions, trained on 2D keypoint data, with the semi-supervised back-projection objective used when labeled data is scarce. It is evaluated on two motion-capture datasets, Human3.6M and HumanEva-I, under multiple protocols: mean per-joint position error (MPJPE), P-MPJPE (after rigid alignment), and N-MPJPE (after scale normalization). The model achieves significant improvements under all protocols, with the best published results on Human3.6M and HumanEva-I at the time, and it is also more efficient than previous methods in terms of computational complexity.
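The temporal model stacks 1D convolutions over the frame axis with geometrically growing dilation factors, so a handful of layers covers hundreds of frames of context. The following is a minimal NumPy sketch (not the authors' implementation) of a valid dilated 1D convolution over a keypoint sequence, plus the receptive-field arithmetic; kernel width 3 with dilations 1, 3, 9, 27, 81 yields the 243-frame receptive field reported for the largest model:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid 1D dilated convolution along the time axis.

    x: (T, C_in) per-frame features, e.g. 17 joints x 2 coords flattened.
    w: (K, C_in, C_out) temporal filter of width K.
    Returns an array of shape (T - (K - 1) * dilation, C_out).
    """
    K = w.shape[0]
    T_out = x.shape[0] - (K - 1) * dilation
    out = np.zeros((T_out, w.shape[2]))
    for t in range(T_out):
        for k in range(K):
            # Sample every `dilation`-th frame within the kernel window.
            out[t] += x[t + k * dilation] @ w[k]
    return out

def receptive_field(kernel_sizes, dilations):
    """Frames of context seen by a stack of valid dilated convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

For example, `receptive_field([3, 3, 3, 3, 3], [1, 3, 9, 27, 81])` returns 243: each layer adds `(k - 1) * d` frames of context, so exponentially growing dilations give exponential context growth at linear parameter cost, which is the key efficiency advantage over RNNs noted above.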
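The back-projection idea can be summarized in a few lines: predicted 3D poses from unlabeled video are projected back into the image with a pinhole camera model and penalized against off-the-shelf 2D detections, which is why only intrinsics are needed. A minimal sketch, assuming a simple linear pinhole model without lens distortion (the intrinsic values in the example are illustrative, not taken from the paper):

```python
import numpy as np

def project(points_3d, f, c):
    """Pinhole projection of (J, 3) camera-space points to pixel coordinates,
    given focal lengths f = (fx, fy) and principal point c = (cx, cy)."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]  # perspective divide by depth
    return xy * np.asarray(f) + np.asarray(c)

def backprojection_loss(pred_3d, detected_2d, f, c):
    """Mean 2D reprojection error between the projected 3D prediction and
    the 2D keypoints produced by a detector on unlabeled video."""
    reproj = project(pred_3d, f, c)
    return np.mean(np.linalg.norm(reproj - detected_2d, axis=-1))
```

A prediction that is consistent with the 2D detections incurs zero loss, so this term supervises the 3D branch on unlabeled footage without ever requiring ground-truth 3D poses, multi-view imagery, or extrinsic calibration.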
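The three evaluation protocols differ only in how the prediction is aligned to ground truth before averaging per-joint distances: MPJPE aligns root joints, P-MPJPE applies a full rigid (Procrustes) alignment in scale, rotation, and translation, and N-MPJPE normalizes scale only. A NumPy sketch of the first two (standard formulations, not code from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints.
    pred, gt: (J, 3) root-relative 3D joint positions."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def p_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment: the optimal similarity transform
    (scale, rotation, translation) mapping pred onto gt, found via SVD."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    U, s, Vt = np.linalg.svd(P.T @ G)
    # Correct an improper rotation (reflection) if the determinant is negative.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = np.trace(np.diag(s) @ D) / (P ** 2).sum()
    aligned = scale * P @ R.T + mu_g
    return mpjpe(aligned, gt)
```

Because P-MPJPE factors out any global similarity transform, it isolates errors in pose *shape*; a prediction that is a rotated, scaled, shifted copy of the ground truth scores zero under P-MPJPE but not under MPJPE.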