Understanding YOLO9000: Better, Faster, Stronger

YOLO9000 is a real-time object detection system that can detect over 9000 object categories. The paper first introduces YOLOv2, an improved version of the original YOLO detector that achieves state-of-the-art results on standard detection benchmarks such as PASCAL VOC and COCO. Because YOLOv2 is trained with a multi-scale method, the same model can run at varying input sizes, offering a tradeoff between speed and accuracy. At 67 FPS, YOLOv2 achieves 76.8 mAP on VOC 2007; at 40 FPS, it achieves 78.6 mAP, outperforming methods like Faster R-CNN with ResNet and SSD while still running significantly faster.

The paper also proposes a method for jointly training on object detection and classification data. By training YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset, the model can predict detections for object classes that lack labeled detection data. This approach is validated on the ImageNet detection task, where YOLO9000 achieves 19.7 mAP on the validation set despite having detection data for only 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 achieves 16.0 mAP.

Key improvements in YOLOv2 include batch normalization, a high-resolution classifier, anchor boxes, dimension clusters, and multi-scale training. These changes improve recall, localization, and overall accuracy while preserving real-time performance. The paper also introduces Darknet-19, a new classification model that is faster and more accurate than VGG-16 and serves as the base network for YOLOv2.

The joint training approach relies on a hierarchical classification model called WordTree, which combines datasets by mapping their categories to synsets in the WordNet graph. This allows YOLO9000 to learn from both detection and classification data, expanding the range of objects it can detect.
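
The summary does not spell out how a WordTree prediction is read. As described in the paper, the network predicts a conditional probability for each node given its parent, with a softmax taken over each group of siblings, and a node's absolute probability is the product of the conditionals along its path to the root. The sketch below illustrates that idea on a tiny, hypothetical tree. The node names, the `PARENT` mapping, and the function names are assumptions made for illustration, not the authors' code, and the root probability is assumed to be 1.

```python
import math

# Hypothetical miniature hierarchy: child -> parent (None marks the root).
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "vehicle": "physical object",
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
    "hound": "dog",
}


def sibling_groups(parent):
    """Group nodes that share a parent; each group gets its own softmax."""
    groups = {}
    for node, par in parent.items():
        if par is not None:
            groups.setdefault(par, []).append(node)
    return groups


def conditional_probs(logits, parent):
    """Turn raw scores into P(node | parent) with one softmax per sibling group."""
    cond = {}
    for children in sibling_groups(parent).values():
        peak = max(logits[c] for c in children)  # subtract max for numerical stability
        exps = {c: math.exp(logits[c] - peak) for c in children}
        total = sum(exps.values())
        for c in children:
            cond[c] = exps[c] / total
    return cond


def absolute_prob(node, cond, parent):
    """Multiply conditionals from the node up to the root (root assumed to have P = 1)."""
    prob = 1.0
    while parent[node] is not None:
        prob *= cond[node]
        node = parent[node]
    return prob
```

For example, with equal scores for every node, `absolute_prob("terrier", cond, PARENT)` multiplies P(terrier | dog), P(dog | animal), and P(animal | physical object). A label that exists only at a coarse level, such as "dog" from a classification dataset, can still supervise the nodes on its own path.
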
The paper concludes by discussing the generalizability of these techniques and future directions, including weakly supervised image segmentation and improving detection results using more powerful matching strategies.
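
The "dimension clusters" mentioned in the summary refer to choosing anchor box shapes by running k-means on the widths and heights of the training boxes, using 1 - IOU as the distance so that large boxes do not dominate the error. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The function names, the random initialization, the iteration cap, and the mean-based centroid update are assumptions made for illustration.

```python
import random


def iou_wh(a, b):
    """IOU of two (width, height) boxes aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_iou(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs with distance d = 1 - IOU(box, centroid)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # assumption: start from k random boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IOU is the same as largest IOU.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids
```

Calling `kmeans_iou(boxes, k)` on a list of `(width, height)` tuples returns `k` prior shapes. A larger `k` gives priors that overlap the data better, at the cost of a more complex model.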

YOLO9000: Better, Faster, Stronger

25 Dec 2016 | Joseph Redmon, Ali Farhadi†
