YOLOv10: Real-Time End-to-End Object Detection

**Authors:** Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

**Institution:** Tsinghua University

**Abstract:** This paper addresses the limitations of YOLOs in real-time object detection, particularly their reliance on non-maximum suppression (NMS) for post-processing, which hinders end-to-end deployment and increases inference latency. The authors propose a consistent dual assignments strategy for NMS-free training, which lets YOLOs achieve competitive performance with high efficiency. They also introduce a holistic efficiency-accuracy driven model design strategy that optimizes the individual components of YOLOs to reduce computational overhead and enhance capability. The resulting YOLOv10 series achieves state-of-the-art performance and efficiency across model scales, outperforming previous models in both latency and accuracy.

**Key Contributions:**

1. **Consistent Dual Assignments for NMS-Free Training:** A dual label assignment strategy that provides rich supervision during training and high efficiency during inference, eliminating the need for NMS.
2. **Holistic Efficiency-Accuracy Driven Model Design:** Comprehensive optimization of YOLO components, including a lightweight classification head, spatial-channel decoupled downsampling, and rank-guided block design, to reduce computational redundancy and improve efficiency.
3. **YOLOv10 Series:** A new generation of YOLOs for real-time end-to-end object detection that achieves better performance and efficiency than other advanced detectors.

**Experiments:** Extensive experiments on the COCO dataset show that YOLOv10 outperforms previous models in both latency and accuracy. For example, YOLOv10-S is 1.8× faster than RT-DETR-R18 at similar AP, while YOLOv10-B has 46% less latency and 25% fewer parameters than YOLOv9-C for the same performance. YOLOv10-L and YOLOv10-X also achieve significant improvements over their predecessors.
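The consistent dual assignments strategy can be illustrated for a single ground-truth box. As described in the YOLOv10 paper, a one-to-many head and a one-to-one head are trained jointly, and both rank candidate predictions with the matching metric m(α, β) = s · p^α · IoU(b̂, b)^β, where s is a spatial prior (whether the anchor point lies inside the object), p is the classification score, and b̂, b are the predicted and ground-truth boxes. Using the same α and β for both heads keeps the one-to-one head's single positive aligned with the best one-to-many positive; only the one-to-one head is used at inference, so no NMS is needed. The sketch below is a minimal pure-Python illustration, not the authors' code: the helper names (`iou`, `matching_metric`, `assign`), the toy boxes and scores, `top_k = 3` for the one-to-many branch, and α = 0.5, β = 6.0 are all assumptions chosen for readability.

```python
"""Minimal sketch of consistent dual label assignment for ONE ground-truth box.

Illustrative only: helper names, toy data, and hyperparameters are
hypothetical and not taken from the YOLOv10 codebase.
"""
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matching_metric(s: float, p: float, pred: Box, gt: Box,
                    alpha: float, beta: float) -> float:
    """m(alpha, beta) = s * p**alpha * IoU(pred, gt)**beta."""
    return s * (p ** alpha) * (iou(pred, gt) ** beta)


def assign(priors: Sequence[float], scores: Sequence[float], preds: Sequence[Box],
           gt: Box, alpha: float, beta: float, top_k: int) -> List[int]:
    """Indices of the top_k predictions ranked by matching metric (metric > 0 only)."""
    metrics = [matching_metric(s, p, b, gt, alpha, beta)
               for s, p, b in zip(priors, scores, preds)]
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return [i for i in ranked[:top_k] if metrics[i] > 0]


# Toy candidates for one ground-truth box (made-up values for illustration).
gt_box: Box = (0.0, 0.0, 10.0, 10.0)
pred_boxes: List[Box] = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 5, 5), (20, 20, 30, 30)]
cls_scores = [0.90, 0.80, 0.95, 0.99]
spatial_priors = [1.0, 1.0, 1.0, 0.0]  # last anchor point lies outside the box

ALPHA_O2M, BETA_O2M = 0.5, 6.0  # assumed one-to-many hyperparameters
R = 1.0                         # o2o params = r * o2m params; r = 1 makes the metrics identical
ALPHA_O2O, BETA_O2O = R * ALPHA_O2M, R * BETA_O2M

o2m_pos = assign(spatial_priors, cls_scores, pred_boxes, gt_box, ALPHA_O2M, BETA_O2M, top_k=3)
o2o_pos = assign(spatial_priors, cls_scores, pred_boxes, gt_box, ALPHA_O2O, BETA_O2O, top_k=1)

print("one-to-many positives:", o2m_pos)
print("one-to-one positive:  ", o2o_pos)
# With identical metrics, the one-to-one positive is the top one-to-many positive.
assert o2o_pos == o2m_pos[:1]
```

In this toy case the two highest-scoring candidates (indices 3 and 2) are not chosen as the one-to-one positive: one lies outside the object and the other overlaps it poorly, which is what the spatial-prior and IoU terms are there to penalize.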
**Conclusion:** YOLOv10 advances the performance-efficiency boundary of YOLOs by addressing both post-processing and model architecture issues. The proposed methods and YOLOv10 series demonstrate superior real-time end-to-end object detection capabilities, making it a significant contribution to the field.
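On the model-architecture side, one concrete component is the spatial-channel decoupled downsampling listed among the key contributions. The YOLOv10 paper describes it as a pointwise (1×1) convolution that changes the channel count, followed by a stride-2 depthwise convolution that halves the resolution, in place of a single 3×3 stride-2 convolution that does both. The sketch below only does the bookkeeping, under stated assumptions: input H×W×C, output H/2×W/2×2C, only convolution weights and multiply-accumulates (MACs) counted, bias and normalization ignored. The function names and the 80×80×256 feature map are hypothetical, chosen purely for illustration.

```python
"""Rough cost comparison for stride-2 downsampling that also doubles channels.

Counts only convolution weights and multiply-accumulates (MACs); bias,
normalization, and activations are ignored. Assumes input H x W x C and
output H/2 x W/2 x 2C. Function names are hypothetical.
"""
from typing import Tuple


def standard_downsample_cost(h: int, w: int, c: int) -> Tuple[int, int]:
    """Single 3x3 stride-2 convolution mapping C -> 2C channels."""
    params = 3 * 3 * c * (2 * c)               # 18 C^2 weights
    macs = (h // 2) * (w // 2) * params        # every output pixel uses every weight once
    return params, macs


def decoupled_downsample_cost(h: int, w: int, c: int) -> Tuple[int, int]:
    """Pointwise 1x1 conv (C -> 2C) at full resolution, then 3x3 stride-2 depthwise conv."""
    pw_params = c * (2 * c)                    # 2 C^2: changes channels only
    pw_macs = h * w * pw_params                # applied at every input pixel (stride 1)
    dw_params = 3 * 3 * (2 * c)                # 18 C: one 3x3 filter per channel
    dw_macs = (h // 2) * (w // 2) * dw_params  # reduces spatial size only
    return pw_params + dw_params, pw_macs + dw_macs


if __name__ == "__main__":
    h, w, c = 80, 80, 256  # hypothetical feature-map size
    std = standard_downsample_cost(h, w, c)
    dec = decoupled_downsample_cost(h, w, c)
    print(f"standard : {std[0]:,} params, {std[1]:,} MACs")
    print(f"decoupled: {dec[0]:,} params, {dec[1]:,} MACs")
    print(f"param ratio: {dec[0] / std[0]:.3f}, MAC ratio: {dec[1] / std[1]:.3f}")
```

For this example the decoupled version needs roughly 11.5% of the weights and about 45% of the MACs. The counts match the paper's asymptotic argument: parameters drop from 18C² to 2C² + 18C, and compute from (9/2)·HWC² to 2·HWC² + (9/2)·HWC, because the expensive channel mixing moves into a 1×1 convolution while the spatial step becomes per-channel.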

23 May 2024 | Ao Wang, Hui Chen*, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding*