8 January 2024 | Yang Cui, Dong Guo, Hao Yuan, Hengzhi Gu and Hongbo Tang
This paper presents an enhanced version of the You Only Look Once (YOLO) network for improving the efficiency of traffic sign detection. The research focuses on 72 distinct traffic signs prevalent on urban roads in China. The modifications include removing the terminal convolution module and Conv3 (C3) module within the backbone network, replacing 32-fold downsampling with 16-fold downsampling, and introducing a 152 × 152 feature fusion module. A novel hybrid spatial pyramid pooling module, called Hybrid Spatial Pyramid Pooling Fast (H-SPPF), is introduced to capture more comprehensive context, and a channel attention mechanism is integrated into the framework. The enhanced algorithm achieves a precision of 91.72%, a recall of 91.77%, and a mean average precision (mAP) of 93.88% at an intersection over union (IoU) threshold of 0.5, as well as an mAP of 75.81% averaged over IoU thresholds from 0.5 to 0.95. These results are validated on an augmented dataset established for this study. The paper discusses the improvements to the K-means clustering algorithm, the multi-scale feature fusion structure, the H-SPPF module, and the channel attention mechanism, providing detailed experimental results and comparisons with the standard YOLOv5s model. The enhanced model demonstrates superior performance in detecting small traffic signs and robustness under challenging conditions.
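The abstract does not spell out how the H-SPPF module and channel attention mechanism are built, so the sketch below is only an illustrative guess at what such a combination might look like in PyTorch: a "hybrid" SPPF that pools with parallel max- and average-pooling pyramids, followed by a standard squeeze-and-excitation style channel attention block. The class names, reduction ratio, and pooling layout are assumptions, not the paper's design.

```python
# Hedged sketch: HybridSPPF and ChannelAttention are illustrative stand-ins for the
# paper's H-SPPF and attention modules, whose exact structure is not given here.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed variant)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global spatial squeeze
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight channels


class HybridSPPF(nn.Module):
    """Hypothetical 'hybrid' SPPF: parallel max- and average-pooling pyramids."""

    def __init__(self, channels: int, k: int = 5):
        super().__init__()
        self.maxpool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.avgpool = nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2)
        # 1x1 conv fuses the input map plus three pooled scales from each branch
        self.fuse = nn.Conv2d(channels * 7, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m1 = self.maxpool(x); m2 = self.maxpool(m1); m3 = self.maxpool(m2)
        a1 = self.avgpool(x); a2 = self.avgpool(a1); a3 = self.avgpool(a2)
        return self.fuse(torch.cat([x, m1, m2, m3, a1, a2, a3], dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 256, 19, 19)               # e.g. a backbone feature map
    out = ChannelAttention(256)(HybridSPPF(256)(feat))
    print(out.shape)                                 # torch.Size([1, 256, 19, 19])
```

In a YOLOv5s-style network this pairing would sit at the end of the backbone, so both the pooled multi-scale context and the channel re-weighting feed the feature fusion neck.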
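The abstract also mentions improvements to the K-means clustering algorithm without giving details. For orientation, the snippet below shows the conventional YOLO-style anchor clustering that such improvements typically start from: K-means over ground-truth box widths and heights using a 1 − IoU distance rather than Euclidean distance. All function names and parameters here are illustrative, not taken from the paper.

```python
# Hedged sketch of conventional YOLO anchor clustering (the baseline the paper's
# "improved K-means" presumably refines); names and defaults are assumptions.
import numpy as np


def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between boxes (N, 2) and anchors (K, 2), both given as (width, height)."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union


def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100,
                   seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]   # random init
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)      # nearest by IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])     # recompute centers
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]            # sort by box area


if __name__ == "__main__":
    wh = np.abs(np.random.default_rng(1).normal(40, 15, size=(500, 2)))  # toy data
    print(kmeans_anchors(wh, k=9))
```

The resulting nine anchors would then be distributed across the detection scales, which matters for small traffic signs because the finest-resolution head gets the smallest anchor sizes.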