Understanding YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors

YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao
Institute of Information Science, Academia Sinica, Taiwan

The authors report that YOLOv7 surpasses all known object detectors in both speed and accuracy, and that it reaches the highest accuracy, 56.8% AP, among real-time object detectors running at 30 FPS or higher on a V100 GPU. YOLOv7-E6 (56 FPS on V100, 55.9% AP) outperforms the transformer-based SWIN-L Cascade-Mask R-CNN (9.2 FPS on A100, 53.9% AP) by 509% in speed and 2 points of AP, and the convolution-based ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS on A100, 55.2% AP) by 551% in speed and 0.7 points of AP. YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B, and many other detectors in speed and accuracy. The model is trained from scratch on the MS COCO dataset only, without any pre-trained weights.

The paper introduces several trainable bag-of-freebies methods: techniques that raise the accuracy of real-time object detection by strengthening training, without increasing inference cost. These methods include model re-parameterization, dynamic label assignment, and compound scaling. According to the authors, the proposed methods significantly reduce parameters and computation while improving both inference speed and detection accuracy. The authors also present an extended efficient layer aggregation network (E-ELAN), which preserves the original gradient path while enhancing feature learning. The experimental results support the effectiveness of these methods and show that YOLOv7 achieves state-of-the-art performance on the reported benchmarks.
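The speed and accuracy margins quoted for YOLOv7-E6 follow directly from the FPS and AP figures in the summary. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and are not taken from the paper or its code. It assumes that "X% in speed" means the relative FPS increase over the baseline, and that the accuracy margin is a difference in AP points.

```python
def relative_speedup_percent(fps_new: float, fps_baseline: float) -> float:
    """Percentage by which fps_new exceeds fps_baseline (assumed definition)."""
    return (fps_new / fps_baseline - 1.0) * 100.0


# (FPS, AP) pairs as quoted in the summary.
yolov7_e6 = (56.0, 55.9)
baselines = {
    "SWIN-L Cascade-Mask R-CNN": (9.2, 53.9),
    "ConvNeXt-XL Cascade-Mask R-CNN": (8.6, 55.2),
}


def compare(model, baseline):
    """Return (speedup in percent, AP gain in points), rounded for display."""
    speedup = relative_speedup_percent(model[0], baseline[0])
    ap_gain = model[1] - baseline[1]
    return round(speedup), round(ap_gain, 1)


results = {name: compare(yolov7_e6, b) for name, b in baselines.items()}

for name, (speedup, ap_gain) in results.items():
    print(f"{name}: +{speedup}% FPS, +{ap_gain} AP")
# SWIN-L Cascade-Mask R-CNN: +509% FPS, +2.0 AP
# ConvNeXt-XL Cascade-Mask R-CNN: +551% FPS, +0.7 AP
```

Under this reading, a 509% speedup means YOLOv7-E6 runs at roughly 6.1 times the baseline frame rate, not 5.1 times.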

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

6 Jul 2022 | Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao