YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

6 Jul 2022 | Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao
YOLOv7 is a real-time object detector that the authors report outperforms existing methods in both speed and accuracy. It reaches 56.8% AP on the COCO dataset, the highest accuracy among real-time detectors running at 30 FPS or higher on a V100 GPU. YOLOv7-E6, at 56 FPS, achieves 55.9% AP, surpassing both the transformer-based SWIN-L Cascade-Mask R-CNN and the convolution-based ConvNeXt-XL Cascade-Mask R-CNN in speed and accuracy. YOLOv7 is trained from scratch on the COCO dataset alone, without pre-trained weights or other datasets.
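To make the speed figures concrete, a frame rate can be converted into a per-frame time budget. The snippet below is only an illustrative calculation on the two FPS values quoted above; the helper name `frame_time_ms` is hypothetical and not taken from the paper or its code.

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time available per frame, in milliseconds."""
    return 1000.0 / fps


# FPS values quoted in the summary; the labels are for display only.
quoted = [("30 FPS figure quoted with 56.8% AP", 30),
          ("YOLOv7-E6 at 56 FPS", 56)]

for label, fps in quoted:
    print(f"{label}: {frame_time_ms(fps):.1f} ms per frame")
```

At 30 FPS a detector has roughly 33 ms to process each frame; at 56 FPS the budget shrinks to roughly 18 ms.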
The paper introduces a "trainable bag-of-freebies": optimized modules and training methods that improve detection accuracy without increasing inference cost, although they may raise training cost. These include planned re-parameterized convolution, coarse-to-fine lead guided label assignment, and compound scaling methods. The proposed methods also reduce parameters by about 40% and computation by about 50% relative to state-of-the-art real-time detectors. The architecture uses Extended Efficient Layer Aggregation Networks (E-ELAN), which enhance feature learning and parameter utilization. Ablation studies show the contribution of each method. YOLOv7 outperforms other detectors, including YOLOR, YOLOX, YOLOv5, and DINO-5scale-R50, in speed and accuracy. The proposed methods support efficient training and inference on a range of hardware, from edge GPUs to cloud GPUs, and deliver state-of-the-art results across different speed and accuracy trade-offs.
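The idea that re-parameterization adds no inference cost can be illustrated with a small sketch. The code below is not the YOLOv7 implementation: it is a minimal single-channel, pure-Python example of the general technique, assuming a training-time block with a 3x3 branch and a 1x1 branch (no identity branch) whose outputs are summed. All function names and values are hypothetical. Because convolution is linear, the 1x1 weight can be folded into the centre tap of the 3x3 kernel, leaving one convolution at inference time.

```python
def conv2d_same(image, kernel):
    """Single-channel 3x3 cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


def two_branch_forward(image, k3, k1):
    """Training-time form: a 3x3 branch plus a 1x1 branch (a scalar here), summed."""
    y3 = conv2d_same(image, k3)
    return [[y3[i][j] + k1 * image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


def merge_branches(k3, k1):
    """Inference-time form: fold the 1x1 weight into the centre of a copy of k3."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1
    return merged


# Arbitrary illustrative values, not weights from any trained model.
image = [[1.0, 2.0, 0.0, -1.0],
         [0.5, -2.0, 3.0, 1.0],
         [4.0, 0.0, -1.5, 2.0]]
k3 = [[0.1, -0.2, 0.0],
      [0.3, 0.5, -0.1],
      [0.0, 0.2, 0.4]]
k1 = 0.7

train_out = two_branch_forward(image, k3, k1)
infer_out = conv2d_same(image, merge_branches(k3, k1))

# The two forms agree up to floating-point rounding.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(train_out, infer_out)
               for a, b in zip(row_a, row_b))
print(max_diff < 1e-9)
```

The merged kernel produces the same output as the two-branch block, so the extra branch contributes during training but costs nothing once folded. Real implementations apply the same folding per channel and also absorb batch normalization parameters into the kernel and bias.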