29 Jul 2024 | Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan
The paper introduces YOLO-TLA, an advanced object detection model based on YOLOv5, designed to improve the detection of small objects and reduce model complexity. The key contributions include:
1. **Tiny Object Detection Layer**: An additional detection layer is added to the neck network to enhance the detection of small objects by producing a larger feature map.
2. **C3CrossConv Module**: Integrated into the backbone network to minimize computational demand and parameter count, improving feature extraction capabilities.
3. **Global Attention Mechanism (GAM)**: Applied to the backbone network to combine channel and global information, enhancing the model's ability to focus on objects of interest.
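
The first two contributions can be illustrated with some quick arithmetic. The sketch below is not taken from the paper's code. It assumes a 640×640 input, the standard YOLOv5 detection strides of 8, 16 and 32, and that the extra tiny-object head operates at stride 4. It also assumes the cross convolution replaces a k×k kernel with a 1×k kernel followed by a k×1 kernel, as in the `CrossConv` block of the YOLOv5 code base. The helper names `grid_sizes` and `conv_weights` are hypothetical.

```python
def grid_sizes(img_size, strides):
    """Side length of each square detection grid, keyed by stride."""
    return {s: img_size // s for s in strides}


def conv_weights(c_in, c_out, kh, kw):
    """Weight count of a plain convolution (bias and batch-norm terms ignored)."""
    return c_in * c_out * kh * kw


# Standard YOLOv5 heads predict at strides 8, 16 and 32.
baseline = grid_sizes(640, [8, 16, 32])
# Assumption: the additional tiny-object head predicts at stride 4.
with_tiny = grid_sizes(640, [4, 8, 16, 32])

# Assumption: cross convolution = 1x3 conv followed by 3x1 conv,
# compared against a single 3x3 conv with the same channel width.
c = 128  # illustrative channel count, not a value from the paper
full = conv_weights(c, c, 3, 3)
cross = conv_weights(c, c, 1, 3) + conv_weights(c, c, 3, 1)

print(baseline)      # grid sizes of the three default heads
print(with_tiny)     # adds a 160x160 grid under the stride-4 assumption
print(full, cross)   # the cross pair uses two thirds of the 3x3 weights
```

Under these assumptions the extra head sees a 160×160 grid, four times as many cells as the finest default grid, which is why it helps with small objects. The factorized kernel pair needs 6c² weights where a 3×3 kernel needs 9c².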
On the MS COCO dataset, YOLO-TLA improves on the baseline YOLOv5s by 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95 while keeping the model compact at 9.49M parameters. Applying the same changes to the larger YOLOv5m yields YOLO-TLAm, which gains 1.7% in mAP@0.5 and 1.9% in mAP@0.5:0.95 with 27.53M parameters.
The paper also compares YOLO-TLA with other state-of-the-art models, demonstrating its superior performance in both accuracy and efficiency. The results validate the effectiveness of the proposed improvements in enhancing small object detection while maintaining high accuracy and reducing computational demands.