Deep High-Resolution Representation Learning for Human Pose Estimation

25 Feb 2019 | Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
This paper proposes a high-resolution network (HRNet) for human pose estimation that maintains high-resolution representations throughout the entire process. Unlike existing methods, which recover high-resolution representations from low-resolution ones, HRNet runs high-to-low resolution subnetworks in parallel and repeatedly fuses information across scales, so that the high-resolution representation is continually enriched by the lower-resolution ones. The resulting representations yield more accurate and spatially precise keypoint heatmaps.

The network has four stages and up to four parallel subnetworks; each added subnetwork runs at a lower resolution and a larger width (channel count) than the one before it. Two variants, HRNet-W32 and HRNet-W48, are evaluated on two benchmark datasets, COCO keypoint detection and MPII Human Pose, where they outperform existing methods, and on pose tracking with the PoseTrack dataset. The network is also efficient in terms of computational complexity and parameter count. The code and models are publicly available for further research.
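To make the layout of parallel subnetworks more concrete, the following is a minimal sketch in plain Python that only tracks tensor shapes, not an implementation of the network. It rests on several assumptions that are not stated in the summary above: a 256x192 input, a stem that reduces the input to 1/4 resolution, resolution halving and width doubling from one subnetwork to the next, stage s running s subnetworks, and fusion done by strided 3x3 convolutions (downward) or a 1x1 convolution followed by upsampling (upward). The names `Branch`, `stage_branches`, and `fusion_op` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    stride: int    # downsampling factor relative to the input image
    height: int    # feature-map height
    width: int     # feature-map width
    channels: int  # "width" of the subnetwork in the paper's terminology


def stage_branches(stage, input_hw=(256, 192), base_channels=32, stem_stride=4):
    """Return the parallel branches assumed active in a 1-based stage.

    Assumption: stage s runs s branches; each new branch halves the
    resolution and doubles the channel count of the previous one.
    """
    h, w = input_hw
    branches = []
    for i in range(stage):
        stride = stem_stride * 2 ** i
        branches.append(
            Branch(stride, h // stride, w // stride, base_channels * 2 ** i)
        )
    return branches


def fusion_op(src, dst):
    """Describe how `src` would be resampled before being summed into `dst`."""
    if src.stride == dst.stride:
        return "identity"
    if src.stride < dst.stride:
        # Higher resolution to lower: one stride-2 conv per factor of two.
        factor = dst.stride // src.stride
        steps = factor.bit_length() - 1
        return f"downsample x{factor} via {steps} strided 3x3 conv(s)"
    # Lower resolution to higher: match channels, then upsample.
    return f"1x1 conv, then upsample x{src.stride // dst.stride}"


if __name__ == "__main__":
    # base_channels=32 mirrors the W32 variant; use 48 for the W48 variant.
    for stage in range(1, 5):
        shapes = [(b.height, b.width, b.channels) for b in stage_branches(stage)]
        print(f"stage {stage}: {shapes}")

    # Fusion plan into the highest-resolution branch of the last stage.
    last = stage_branches(4)
    for src in last:
        print(f"stride {src.stride} -> stride {last[0].stride}: "
              f"{fusion_op(src, last[0])}")
```

Under these assumptions the highest-resolution branch keeps its 64x48 feature map in every stage, which is the sense in which the representation is maintained rather than recovered; the fusion plan shows that every other branch contributes to it after upsampling.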