Cascaded Pyramid Network for Multi-Person Pose Estimation

8 Apr 2018 | Yilun Chen*, Zhicheng Wang*, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
This paper proposes the Cascaded Pyramid Network (CPN), a network structure for multi-person pose estimation. The CPN consists of two stages, GlobalNet and RefineNet. GlobalNet is a feature pyramid network that localizes "simple" keypoints such as eyes and hands well but may fail to precisely locate occluded or invisible keypoints. RefineNet explicitly addresses these "hard" keypoints by integrating all levels of feature representations from GlobalNet and training with an online hard keypoint mining loss.
The CPN is used in a top-down pipeline: a detector first generates human bounding boxes, and the CPN then localizes keypoints within each box. The method achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0 on COCO test-dev and 72.1 on COCO test-challenge, a 19% relative improvement over the 60.5 from the COCO 2016 keypoint challenge. Online hard keypoint mining further improves performance on challenging cases such as occluded and invisible keypoints. The paper also examines other factors that affect multi-person pose estimation performance, including the person detector and data preprocessing.
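The online hard keypoint mining idea mentioned above can be illustrated with a minimal sketch. It assumes a per-keypoint mean squared error over heatmaps and keeps only the `top_k` largest losses. The function names, the flat-list heatmap layout, and the value of `top_k` are hypothetical choices for illustration, not the authors' implementation.

```python
def keypoint_losses(pred, target):
    """Mean squared error per keypoint; each heatmap is a flat list of floats."""
    losses = []
    for p_map, t_map in zip(pred, target):
        sq_err = sum((p - t) ** 2 for p, t in zip(p_map, t_map))
        losses.append(sq_err / len(p_map))
    return losses


def ohkm_loss(pred, target, top_k):
    """Average the top_k largest per-keypoint losses and ignore the rest.

    top_k is an illustrative free parameter (assumption of this sketch).
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    losses = keypoint_losses(pred, target)
    hardest = sorted(losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


# Toy example: 3 keypoints, each with a 4-pixel "heatmap".
target = [[0.0, 1.0, 0.0, 0.0]] * 3
pred = [
    [0.0, 1.0, 0.0, 0.0],  # perfect prediction, loss 0.0
    [0.0, 0.5, 0.5, 0.0],  # moderate error, loss 0.125
    [1.0, 0.0, 0.0, 0.0],  # wrong location, loss 0.5
]
loss = ohkm_loss(pred, target, top_k=2)  # averages 0.5 and 0.125
```

In this toy case the perfectly predicted keypoint contributes nothing to the loss, so the training signal concentrates on the two harder keypoints.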