29 Feb 2024 | Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
This paper introduces YOLOv9, a new object detection system that addresses the problem of information loss as data passes through deep neural networks. The authors propose programmable gradient information (PGI), which provides complete input information to the target task when the objective function is computed, so that the gradients used to update the network weights are reliable. They also introduce a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), designed through gradient path planning. Experiments with GELAN confirm that PGI yields superior results on lightweight models. GELAN and PGI are evaluated on object detection with the MS COCO dataset. The results show that GELAN, using only conventional convolution operators, achieves better parameter utilization than state-of-the-art methods based on depth-wise convolution. PGI can be applied to models ranging from lightweight to large, and it allows models trained from scratch to outperform models pre-trained on large datasets. The source code is available at https://github.com/WongKinYiu/yolov9.
The paper also examines how information loss in deep networks can cause biased gradient flows and incorrect associations between targets and inputs. To counter this, PGI generates reliable gradients through an auxiliary reversible branch, so that deep features retain the key characteristics needed for the target task. The design of this auxiliary reversible branch avoids the semantic loss that can occur in traditional deep supervision. Unlike deep supervision, which is suitable only for very deep networks, PGI can be applied to networks of any size, making it the more general approach.
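The idea behind a reversible mapping, that the input can be reconstructed exactly from the output so no information is lost, can be illustrated with a toy example. The sketch below uses an additive coupling layer of the kind popularized by RevNet. This is a swapped-in textbook construction for illustration only, not the paper's auxiliary reversible branch, and the names `coupling_forward`, `coupling_inverse`, `lossy_layer`, `f`, and `g` are hypothetical. Integers are used so that exact equality checks are valid.

```python
from typing import Tuple

Pair = Tuple[int, int]


def f(v: int) -> int:
    # Arbitrary inner function; it does not need to be invertible.
    return 3 * v * v - v


def g(v: int) -> int:
    # ReLU-like inner function; also not invertible on its own.
    return max(v, 0)


def coupling_forward(x: Pair) -> Pair:
    """Additive coupling (RevNet-style): invertible for any f and g."""
    x1, x2 = x
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def coupling_inverse(y: Pair) -> Pair:
    """Recover the original input exactly from the coupling output."""
    y1, y2 = y
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


def lossy_layer(x: Pair) -> Pair:
    """A non-invertible, ReLU-like mapping: negative values collapse to 0."""
    return max(x[0], 0), max(x[1], 0)


for x in [(-2, 5), (-7, 5), (4, -1)]:
    y = coupling_forward(x)
    assert coupling_inverse(y) == x  # no information lost

# The lossy layer maps two different inputs to the same output.
print(lossy_layer((-2, 5)) == lossy_layer((-7, 5)))  # True
```

Invertibility here comes from the coupling structure rather than from `f` or `g`. The ReLU-like layer, by contrast, merges distinct inputs, so no later layer can recover what it discarded; this is the kind of loss the paper attributes to deep feedforward processing.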
The authors design GELAN as a generalization of ELAN, taking into account the number of parameters, computational complexity, accuracy, and inference speed. GELAN lets users choose computational blocks suited to different inference devices. Combining PGI and GELAN, the authors build YOLOv9, a new generation of the YOLO series of object detectors. In their MS COCO experiments, YOLOv9 achieves the best performance in all reported comparisons.
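The following is a minimal, framework-free sketch of the aggregation pattern that GELAN generalizes: split the input channels, pass one part through a chain of interchangeable computational blocks, concatenate every intermediate output, and apply a transition layer. It is not the code in the YOLOv9 repository. Real implementations operate on tensors with convolutions, and the names `gelan_style_aggregate`, `relu_block`, `double_block`, and `identity` are hypothetical. A feature map is modeled here as a list of channels, each a list of floats.

```python
from typing import Callable, List

Channel = List[float]
FeatureMap = List[Channel]  # a feature map stored as a list of channels
Block = Callable[[FeatureMap], FeatureMap]


def gelan_style_aggregate(
    x: FeatureMap,
    blocks: List[Block],
    transition: Callable[[FeatureMap], FeatureMap],
) -> FeatureMap:
    """Hypothetical CSP/ELAN-style aggregation with pluggable blocks."""
    half = len(x) // 2
    part_a, part_b = x[:half], x[half:]  # CSP-style channel split
    outputs: List[FeatureMap] = [part_a, part_b]
    current = part_b
    for block in blocks:  # any block honoring the same contract fits here
        current = block(current)
        outputs.append(current)
    # Channel-wise concatenation of both parts and every block output.
    merged = [channel for fm in outputs for channel in fm]
    return transition(merged)


def relu_block(fm: FeatureMap) -> FeatureMap:
    # Stand-in computational block: element-wise ReLU.
    return [[max(v, 0.0) for v in ch] for ch in fm]


def double_block(fm: FeatureMap) -> FeatureMap:
    # Stand-in computational block: element-wise scaling by 2.
    return [[2.0 * v for v in ch] for ch in fm]


def identity(fm: FeatureMap) -> FeatureMap:
    # Stand-in transition layer that leaves the concatenation unchanged.
    return fm


x = [[1.0, -1.0] for _ in range(4)]  # 4 channels, 2 positions each
out = gelan_style_aggregate(x, [relu_block, double_block], identity)
print(len(out))  # 8 channels: 2 + 2 + 2 + 2
print(out[-1])   # [2.0, 0.0]
```

Changing the block list, for example passing three blocks instead of two, changes depth and cost without touching the aggregation logic. That separation is the kind of flexibility the authors describe when they say GELAN lets users pick computational blocks for different inference devices.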
The paper also reviews related work on real-time object detectors, reversible architectures, and auxiliary supervision. The authors analyze the information bottleneck principle and reversible functions, which are central to understanding and addressing information loss in deep networks. PGI and GELAN are shown to improve object detection performance, and YOLOv9 outperforms existing real-time object detectors in accuracy, parameter usage, and computational efficiency. The paper concludes that the proposed method offers a new approach to training deep neural networks that generates reliable gradients and, unlike deep supervision, is also suitable for shallow and lightweight networks.
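The two principles can be stated compactly. In the notation below, which follows the paper's formulation as closely as we can reproduce it, X is the input, f_θ and g_φ are successive transformations with parameters θ and φ, and I denotes mutual information. The first line expresses the information bottleneck: information about X can only be preserved or lost as it passes through layers. The second shows why a reversible function r_ψ with inverse v_ζ loses none.

```latex
% Information bottleneck: information about X cannot grow with depth
I(X, X) \ge I\big(X, f_\theta(X)\big) \ge I\big(X, g_\phi(f_\theta(X))\big)

% Reversible function r_\psi with inverse v_\zeta preserves all information
X = v_\zeta\big(r_\psi(X)\big)
\quad\Longrightarrow\quad
I(X, X) = I\big(X, r_\psi(X)\big) = I\big(X, v_\zeta(r_\psi(X))\big)
```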