9 May 2016 | Joseph Redmon*, Santosh Divvala†, Ross Girshick*, Ali Farhadi*†
YOLO (You Only Look Once) is a unified, real-time object detection system that frames detection as a regression problem: a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Unlike earlier methods that repurpose classifiers to perform detection, the whole pipeline is one network, so it can be optimized end-to-end directly on detection performance. This makes it extremely fast: the base model processes images in real time at 45 frames per second, and a smaller version, Fast YOLO, reaches 155 frames per second while still achieving double the mAP of other real-time detectors. Because the network sees the entire image, it reasons globally about context, which reduces false positives on background. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. It also learns general representations of objects and outperforms other detection methods when generalizing from natural images to other domains such as artwork. The paper also discusses YOLO's limitations, such as its difficulty with small objects and with objects in unusual aspect ratios, and compares it in detail with other detection systems, highlighting the trade-offs.
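
To make the regression framing described above concrete, here is a minimal sketch of how a grid-shaped network output could be turned into scored boxes. The summary itself does not state the output shape; the values S = 7, B = 2, C = 20 follow the paper's PASCAL VOC configuration (a 7 × 7 × 30 output), while the flat per-cell layout, the function names `decode_cell` and `decode`, and the 0.2 threshold are assumptions chosen for illustration, not the authors' implementation. Non-maximum suppression is omitted.

```python
from typing import List, Tuple

# Grid size, boxes per cell, number of classes (PASCAL VOC setup in the paper).
S, B, C = 7, 2, 20

# (class_id, score, (center_x, center_y, width, height)), all relative to the image.
Detection = Tuple[int, float, Tuple[float, float, float, float]]


def decode_cell(cell: List[float], row: int, col: int) -> List[Detection]:
    """Convert one grid cell's B*5 + C regression outputs into scored boxes.

    Assumed layout (hypothetical): B groups of (x, y, w, h, confidence),
    followed by C conditional class probabilities shared by the cell.
    """
    assert len(cell) == B * 5 + C
    class_probs = cell[B * 5:]
    best_class = max(range(C), key=lambda i: class_probs[i])
    detections = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        # x, y are offsets inside the cell; w, h are relative to the whole image.
        cx = (col + x) / S
        cy = (row + y) / S
        # Class-specific score: box confidence times conditional class probability.
        score = conf * class_probs[best_class]
        detections.append((best_class, score, (cx, cy, w, h)))
    return detections


def decode(output: List[List[List[float]]], threshold: float = 0.2) -> List[Detection]:
    """Decode a full S x S grid and keep boxes scoring at least `threshold`."""
    kept = []
    for row in range(S):
        for col in range(S):
            for det in decode_cell(output[row][col], row, col):
                if det[1] >= threshold:
                    kept.append(det)
    return kept


# Usage: an all-zero grid with a single confident prediction in the center cell.
grid = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
active = [0.5, 0.5, 0.2, 0.4, 0.8] + [0.0] * 5 + [0.0] * C
active[B * 5 + 11] = 0.9  # class index 11 is an arbitrary illustrative choice
grid[3][3] = active
detections = decode(grid)
```

Because every cell is decoded from the same forward pass, the cost of detection is one network evaluation plus this cheap post-processing, which is what allows the frame rates quoted above.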