9 Jun 2017 | Alejandro Newell, Zhiao Huang*, Jia Deng
The paper introduces associative embedding, a method for supervising convolutional neural networks on tasks that require both detection and grouping. Many computer vision problems can be framed this way, including multi-person pose estimation, instance segmentation, and multi-object tracking. The key idea is to have the network predict a real number (a tag) for each detection that identifies the group it belongs to, so that the network outputs detection scores and grouping assignments simultaneously. The method can be integrated into state-of-the-art network architectures that produce pixel-wise predictions, such as the stacked hourglass network. The authors report state-of-the-art performance on multi-person pose estimation on the MPII and MS-COCO datasets and show preliminary results on instance segmentation. The method is flexible and can be adapted to other vision tasks that combine detection and grouping.
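To make the tag idea concrete, the following is a minimal sketch of how detections might be grouped once a network has produced a score and a scalar tag for each of them. It is an illustration only, not the authors' decoding procedure: the `Detection` and `Group` classes, the `group_by_tag` function, the greedy strategy, and the `threshold` value are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    label: str    # hypothetical part name, e.g. "head" or "wrist"
    score: float  # detection confidence
    tag: float    # scalar tag predicted alongside the score


@dataclass
class Group:
    members: list = field(default_factory=list)

    def mean_tag(self) -> float:
        # Reference tag of the group: the average of its members' tags.
        return sum(d.tag for d in self.members) / len(self.members)


def group_by_tag(detections, threshold=1.0):
    """Greedily cluster detections whose tags are close to each other.

    `threshold` is an assumed hyperparameter, not a value from the paper.
    """
    groups = []
    # Visit the most confident detections first so they seed the groups.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        best = min(groups, key=lambda g: abs(g.mean_tag() - det.tag), default=None)
        if best is not None and abs(best.mean_tag() - det.tag) < threshold:
            best.members.append(det)
        else:
            # No existing group has a similar tag: start a new one.
            groups.append(Group(members=[det]))
    return groups


# Two people, two detected parts each; tags near 0 and near 5 (made-up values).
detections = [
    Detection("head", 0.9, 0.1),
    Detection("head", 0.8, 5.2),
    Detection("wrist", 0.7, 0.3),
    Detection("wrist", 0.6, 4.9),
]
groups = group_by_tag(detections)
print([[d.label for d in g.members] for g in groups])
# → [['head', 'wrist'], ['head', 'wrist']]
```

In this sketch only the difference between tags matters, not their absolute values: two detections land in the same group when their tags are closer than the threshold.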