This paper proposes Hybrid Task Cascade (HTC), a new framework for instance segmentation. HTC improves on existing cascade approaches by interweaving the detection and segmentation tasks in a multi-stage pipeline, which allows information to flow better between the two tasks. The framework also adds a fully convolutional branch that provides spatial context, helping to distinguish hard foreground objects from cluttered backgrounds. HTC achieves state-of-the-art performance on the COCO dataset, with a mask AP of 48.6 on the test-challenge split and 49.0 on the test-dev split. The framework is trained end-to-end and remains effective across different backbones and when combined with common components such as deformable convolution, multi-scale training, and model ensembling.
HTC outperforms existing methods; on COCO it improves mask AP by 1.5 points over a strong Cascade Mask R-CNN baseline. The fully convolutional branch is trained as a semantic segmentation branch, and the spatial context it provides improves both bounding box and mask predictions. The paper also presents an extensive study of HTC's components and design choices that supports its effectiveness for instance segmentation.
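The interleaved stage loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: feature maps are plain floats, `roi_pool`, `fused_roi_features`, `htc_forward`, and both example heads are hypothetical helpers, and every head is an arithmetic stand-in. It shows three ideas: at each stage the box head refines the previous stage's boxes and the mask head then runs on those refined boxes; the mask head also receives the previous stage's mask state (HTC's mask information flow between stages); and the semantic features, computed once, are added element-wise to each RoI feature as spatial context (a simplification assumed here).

```python
"""Toy sketch of the HTC stage loop (hypothetical helpers, not the paper's code)."""
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
BoxHead = Callable[[List[float], List[Box]], List[Box]]
MaskHead = Callable[[List[float], Optional[List[float]]], List[float]]


def roi_pool(feature_map: float, boxes: List[Box]) -> List[float]:
    # Hypothetical stand-in for RoI pooling: one scalar per box, scaled by area.
    return [feature_map * (x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]


def fused_roi_features(feat: float, sem_feat: float, boxes: List[Box]) -> List[float]:
    # Spatial context: pooled semantic features are added element-wise
    # to the pooled backbone features of each RoI.
    return [a + b for a, b in zip(roi_pool(feat, boxes), roi_pool(sem_feat, boxes))]


def htc_forward(
    feat: float,
    proposals: List[Box],
    box_heads: List[BoxHead],
    mask_heads: List[MaskHead],
    semantic_branch: Callable[[float], float],
    trace: Optional[List[str]] = None,
) -> Tuple[List[Box], Optional[List[float]]]:
    sem_feat = semantic_branch(feat)  # computed once, shared by every stage
    boxes = proposals
    mask_state: Optional[List[float]] = None  # carried from stage to stage
    for t, (box_head, mask_head) in enumerate(zip(box_heads, mask_heads), start=1):
        # Detection: refine the boxes produced by the previous stage.
        boxes = box_head(fused_roi_features(feat, sem_feat, boxes), boxes)
        if trace is not None:
            trace.append(f"box_{t}")
        # Segmentation on this stage's refined boxes (interleaved execution),
        # conditioned on the previous stage's mask state (mask information flow).
        mask_state = mask_head(fused_roi_features(feat, sem_feat, boxes), mask_state)
        if trace is not None:
            trace.append(f"mask_{t}")
    return boxes, mask_state


def shrink_box_head(features: List[float], boxes: List[Box]) -> List[Box]:
    # Hypothetical box head: ignores features, shrinks each box by 1 on every side.
    return [(x1 + 1, y1 + 1, x2 - 1, y2 - 1) for x1, y1, x2, y2 in boxes]


def additive_mask_head(features: List[float], prev: Optional[List[float]]) -> List[float]:
    # Hypothetical mask head: adds the previous stage's mask state, if any.
    return list(features) if prev is None else [f + p for f, p in zip(features, prev)]


trace: List[str] = []
boxes, masks = htc_forward(
    feat=1.0,
    proposals=[(0.0, 0.0, 10.0, 10.0)],
    box_heads=[shrink_box_head] * 3,
    mask_heads=[additive_mask_head] * 3,
    semantic_branch=lambda f: 0.5 * f,  # hypothetical semantic branch
    trace=trace,
)
print(trace)  # ['box_1', 'mask_1', 'box_2', 'mask_2', 'box_3', 'mask_3']
print(boxes, masks)  # [(3.0, 3.0, 7.0, 7.0)] [174.0]
```

The trace shows the interleaved order (box, then mask, at every stage) rather than running all box stages before any mask stage. A real HTC model uses separate, learned heads for each stage; reusing one toy function for all three stages only keeps the example short.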