Understanding Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

The paper "Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation" by Guosheng Lin, Chunhua Shen, Anton van den Hengel, and Ian Reid explores how contextual information can improve semantic image segmentation. The authors focus on two kinds of context: 'patch-patch' context between image regions and 'patch-background' context. For patch-patch context, they formulate Conditional Random Fields (CRFs) with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. To avoid repeated, expensive CRF inference during backpropagation, they apply piecewise training to the deep structured model. For patch-background context, they use a network design with traditional multi-scale image input and sliding pyramid pooling to capture background information. In experiments on NYUDv2, PASCAL VOC 2012, PASCAL-Context, and SIFT-flow, the authors report state-of-the-art performance, including an intersection-over-union score of 78.0 on the challenging PASCAL VOC 2012 dataset. The main contributions of the paper are the formulation of CNN-based pairwise potential functions in CRFs, efficient piecewise training of CRFs, and the exploration of background context using multi-scale networks and sliding pyramid pooling.
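The piecewise training idea described above can be illustrated with a small sketch. It assumes that each potential (unary or pairwise) gets its own softmax likelihood over the network scores, so the loss is a sum of independent terms and no CRF inference is run during training. The function names, the score layout, and the toy numbers are hypothetical and chosen only for illustration; they are not taken from the paper's implementation.

```python
import math


def log_softmax(scores):
    """Return log-probabilities for a flat list of scores (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def unary_nll(unary_scores, label):
    """Negative log-likelihood of one patch's true label.

    unary_scores[k] is an assumed network score for class k
    (the negated unary potential in this sketch).
    """
    return -log_softmax(unary_scores)[label]


def pairwise_nll(pair_scores, label_p, label_q):
    """Negative log-likelihood of one edge's true label pair.

    pair_scores[a][b] is an assumed network score for the joint labelling
    (a, b) of two neighboring patches; it must be a square K x K table.
    The normalisation runs over all K * K label pairs of this edge only.
    """
    k = len(pair_scores)
    flat = [s for row in pair_scores for s in row]
    return -log_softmax(flat)[label_p * k + label_q]


def piecewise_loss(nodes, edges):
    """Sum of independent per-potential losses (no global partition function)."""
    loss = sum(unary_nll(scores, y) for scores, y in nodes)
    loss += sum(pairwise_nll(scores, yp, yq) for scores, yp, yq in edges)
    return loss


# Toy example: two patches, two classes, one edge between the patches.
nodes = [([2.0, 0.0], 0), ([0.0, 1.0], 1)]
edges = [([[0.0, 1.5], [0.0, 0.0]], 0, 1)]
print(round(piecewise_loss(nodes, edges), 4))
```

Because every term normalises only over its own labels, the gradient of each term depends only on the scores of that potential. In this sketch, that locality is what removes the need for repeated inference over the whole CRF at each training step.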

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

6 Jun 2016 | Guosheng Lin, Chunhua Shen, Anton van den Hengel, Ian Reid