Cascaded Pyramid Network for Multi-Person Pose Estimation

8 Apr 2018 | Yilun Chen*, Zhicheng Wang*, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
The paper presents the Cascaded Pyramid Network (CPN), a network structure for multi-person pose estimation that targets hard cases such as occluded and invisible keypoints. CPN has two stages: GlobalNet and RefineNet. GlobalNet, a feature pyramid network, localizes "simple" keypoints such as eyes and hands but may fail on occluded or invisible ones. RefineNet explicitly handles these "hard" keypoints by integrating all levels of feature representations from GlobalNet and training with an online hard keypoint mining loss. The algorithm follows a top-down pipeline: a detector first generates human bounding boxes, and CPN then localizes the keypoints within each box. The method achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision (AP) of 73.0 on test-dev and 72.1 on test-challenge, a 19% relative improvement over the COCO 2016 keypoint challenge winner. The paper also explores how other factors, including the person detector and data preprocessing, affect multi-person pose estimation.
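
The online hard keypoint mining idea described above can be illustrated with a small sketch. It assumes a per-keypoint L2 (mean squared error) loss on predicted heatmaps and keeps only the keypoints with the largest loss. The function name `ohkm_loss`, the `top_k` parameter, and the array shapes are hypothetical choices for illustration; this is not the authors' implementation.

```python
import numpy as np


def ohkm_loss(pred, target, top_k):
    """Illustrative online hard keypoint mining loss for one person crop.

    pred, target: arrays of shape (num_keypoints, height, width) holding
    predicted and ground-truth heatmaps (shapes are an assumption).
    top_k: how many of the hardest keypoints contribute to the loss
    (a hypothetical parameter name).
    Returns the mean loss over the selected keypoints and their indices.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    num_keypoints = pred.shape[0]
    if not 1 <= top_k <= num_keypoints:
        raise ValueError("top_k must be between 1 and num_keypoints")

    # Mean squared error per keypoint, averaged over all heatmap pixels.
    per_keypoint = ((pred - target) ** 2).reshape(num_keypoints, -1).mean(axis=1)

    # Keep only the top_k keypoints with the largest loss (the "hard" ones).
    hard_idx = np.argsort(per_keypoint)[-top_k:]

    return float(per_keypoint[hard_idx].mean()), np.sort(hard_idx)
```

In a training loop, only the selected keypoints would contribute gradients in the refinement stage, so the network spends its capacity on the keypoints it currently localizes worst. How many keypoints to keep is a tuning choice that this summary does not specify.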