This paper addresses real-time semantic segmentation of high-resolution images. The authors propose the Image Cascade Network (ICNet), which incorporates multi-resolution branches under proper label guidance to reduce computational cost while maintaining decent quality. ICNet achieves real-time inference on a single GPU card, with a reported 5× speedup in inference time and a 5× reduction in memory consumption. The system is evaluated on datasets such as Cityscapes, CamVid, and COCO-Stuff, where it demonstrates high-quality results. The key contributions are ICNet itself, an image cascade network that processes low-resolution images efficiently while leveraging high-resolution details, and two components that refine the segmentation prediction: the Cascade Feature Fusion (CFF) unit and the Cascade Label Guidance strategy. The paper also provides a detailed analysis of the time budget in semantic segmentation frameworks and discusses the limitations of intuitive speedup strategies.
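
To make the multi-resolution idea concrete, the following minimal sketch computes how many pixels each branch of a cascade would see. The function name `branch_resolutions`, the downscale factors (1, 2, 4), and the 1024×2048 example input are illustrative assumptions and are not taken from the summary above; the sketch shows only the arithmetic behind the cost saving, not the network itself.

```python
from fractions import Fraction


def branch_resolutions(height, width, downscale_factors=(1, 2, 4)):
    """Return (factor, height, width, pixel_fraction) for each cascade branch.

    The default downscale factors are an assumption made for illustration.
    """
    full_pixels = height * width
    branches = []
    for factor in downscale_factors:
        # Each branch sees the input shrunk by `factor` along both axes.
        h, w = height // factor, width // factor
        # Fraction of the full-resolution pixel count this branch processes.
        branches.append((factor, h, w, Fraction(h * w, full_pixels)))
    return branches


if __name__ == "__main__":
    # 1024x2048 is used here only as an example of a high-resolution input.
    for factor, h, w, frac in branch_resolutions(1024, 2048):
        print(f"1/{factor} scale: {h}x{w}, {frac} of full-resolution pixels")
```

Because the pixel count falls with the square of the downscale factor, routing the most expensive layers through the lowest-resolution branch is one plausible way a cascade could cut computation while lighter branches recover high-resolution detail.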