YOLO9000 is a real-time object detection system that can detect over 9000 object categories. The paper first introduces YOLOv2, an improved version of the YOLO detection method that is state-of-the-art on standard detection tasks such as PASCAL VOC and COCO. YOLOv2 can run at varying input sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 achieves 76.8 mAP on VOC 2007, and at 40 FPS it achieves 78.6 mAP, outperforming methods such as Faster R-CNN with ResNet and SSD while running significantly faster. The paper also proposes a method to jointly train on object detection and classification data, allowing YOLO9000 to predict detections for object classes that have no labeled detection data. YOLO9000 achieves 19.7 mAP on the ImageNet detection validation set despite having detection data for only 44 of the 200 classes. On the 156 classes not in COCO, it achieves 16.0 mAP.
The paper introduces several improvements to the YOLO detection method, including batch normalization, a high-resolution classifier, convolutional prediction with anchor boxes, dimension clusters, direct location prediction, and fine-grained features. Together these address the localization errors and low recall of the original YOLO. The paper also proposes a multi-scale training method that allows the same YOLOv2 network to run at different input resolutions, providing a tradeoff between speed and accuracy. Additionally, the paper introduces a new classification model, Darknet-19, which is used as the base of YOLOv2. Darknet-19 has 19 convolutional layers and 5 max-pooling layers; trained on the ImageNet dataset, it achieves 72.9% top-1 and 91.2% top-5 accuracy.
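Two of the improvements listed above reduce to short formulas, so a small sketch may help. The code below is an illustration, not the authors' implementation: the function names (`decode_box`, `cluster_distance`) are hypothetical, and coordinates are assumed to be in grid-cell units. Direct location prediction passes the raw center offsets through a sigmoid so the predicted center stays inside its grid cell, and scales an anchor (prior) width and height by an exponential. Dimension clusters pick those priors by k-means over ground-truth box shapes, using 1 - IoU as the distance instead of Euclidean distance.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Direct location prediction (sketch).

    (t_x, t_y, t_w, t_h): raw network outputs for one anchor.
    (c_x, c_y): top-left corner of the grid cell, in cell units.
    (p_w, p_h): width and height of the anchor (prior) box.
    Returns the box center and size (b_x, b_y, b_w, b_h).
    """
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # prior size scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes assumed to share the same center."""
    w1, h1 = box
    w2, h2 = centroid
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def cluster_distance(box, centroid):
    """Distance used for dimension clusters: d = 1 - IoU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)


if __name__ == "__main__":
    # Zero offsets place the center in the middle of cell (3, 4)
    # and leave the prior size unchanged.
    print(decode_box(0.0, 0.0, 0.0, 0.0, 3, 4, 2.0, 1.5))
    print(cluster_distance((1.0, 1.0), (2.0, 2.0)))
```

Using 1 - IoU as the distance means large and small boxes contribute comparable error, which a Euclidean distance on width and height would not.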
The paper also proposes a hierarchical classification method using WordTree, which allows the combination of classification and detection data. This method enables YOLO9000 to detect over 9000 object categories in real-time. The paper evaluates YOLO9000 on the ImageNet detection task and finds that it achieves 19.7 mAP on the validation set, with 16.0 mAP on the 156 classes not in COCO. The paper also discusses the challenges of training on large-scale datasets and the benefits of using hierarchical classification for combining datasets. The paper concludes that YOLO9000 is a strong step towards closing the dataset size gap between detection and classification.
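The hierarchical prediction can be sketched in a few lines. In a WordTree the model predicts, for each node, a conditional probability given its parent; the absolute probability of a label is the product of the conditionals along the path to the root, and at prediction time the tree is walked downward along the most confident child until the confidence falls below a threshold. The toy hierarchy, the probabilities, and the names `PARENT`, `absolute_probability`, and `predict` below are all made up for illustration and are not taken from the paper's code.

```python
from typing import Dict, Optional

# Hypothetical toy hierarchy: node -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "physical object": None,
    "animal": "physical object",
    "vehicle": "physical object",
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
    "hound": "dog",
}
ROOT = "physical object"


def absolute_probability(node: str, cond: Dict[str, float],
                         root_prob: float = 1.0) -> float:
    """Multiply conditional probabilities from `node` up to the root.

    cond[n] is Pr(n | parent of n). `root_prob` stands in for the
    probability of the root (for detection, an objectness score).
    """
    p = root_prob
    while PARENT[node] is not None:
        p *= cond[node]
        node = PARENT[node]
    return p


def predict(cond: Dict[str, float], threshold: float = 0.5,
            root_prob: float = 1.0) -> str:
    """Follow the most probable child at each split until the
    accumulated confidence would drop below `threshold`."""
    node, p = ROOT, root_prob
    while True:
        children = [c for c, par in PARENT.items() if par == node]
        if not children:
            return node  # reached a leaf
        best = max(children, key=lambda c: cond[c])
        if p * cond[best] < threshold:
            return node  # not confident enough to go deeper
        p *= cond[best]
        node = best


if __name__ == "__main__":
    # Made-up conditionals; siblings sum to 1, as a softmax over them would.
    cond = {"animal": 0.9, "vehicle": 0.1, "dog": 0.8, "cat": 0.2,
            "terrier": 0.4, "hound": 0.6}
    print(absolute_probability("terrier", cond))
    print(predict(cond, threshold=0.5))
```

With these numbers the model is confident the object is a dog but not which breed, so it stops at the intermediate label. This is the behaviour that lets a classification-only label contribute to training without forcing a choice among fine-grained classes at detection time.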