SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation


10 Oct 2016 | Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla (Senior Member, IEEE)
SegNet is a deep convolutional encoder-decoder architecture for semantic pixel-wise image segmentation. Its encoder network is topologically identical to the 13 convolutional layers of VGG16; a corresponding decoder network maps the low-resolution encoder feature maps back to full input resolution for pixel-wise classification.

The decoder's novelty lies in how it upsamples: it reuses the max-pooling indices recorded by the encoder to perform non-linear upsampling, eliminating the need to learn to upsample. The resulting sparse upsampled maps are then convolved with trainable filters to produce dense feature maps. This design reduces the number of trainable parameters, keeps memory use and inference time low, and allows the whole network to be trained end-to-end with stochastic gradient descent.

SegNet was compared with FCN, DeepLab-LargeFOV, and DeconvNet, revealing trade-offs between memory and accuracy. A simplified variant, SegNet-Basic, performs well with lower memory usage. Evaluated on the CamVid road-scene dataset and on indoor scene segmentation, SegNet achieves competitive accuracy with smooth segmentations and outperforms the compared architectures on boundary delineation and semantic contour accuracy. The architecture is implemented in Caffe, a web demo is available, and its efficient inference makes it suitable for real-time applications.
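The index-based upsampling described above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's Caffe implementation: the encoder's max pooling stores where each maximum came from, and the decoder scatters pooled values back to those locations, leaving zeros elsewhere (the sparse map would then be densified by trainable decoder convolutions, omitted here).

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling over a 2-D map that also records the flat index of
    each window's maximum, as SegNet's encoder does."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    indices = np.zeros((h // k, w // k), dtype=int)  # flat index into x
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            di, dj = divmod(int(np.argmax(window)), k)
            indices[i, j] = (i*k + di) * w + (j*k + dj)
            pooled[i, j] = window[di, dj]
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    """SegNet-style non-linear upsampling: place each pooled value at the
    position its maximum originally came from; all other positions stay 0."""
    out = np.zeros(out_shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 2.],
              [4., 3.]])
p, idx = max_pool_with_indices(x)      # p = [[4.]], the max came from x[1, 0]
up = max_unpool(p, idx, x.shape)       # [[0., 0.], [4., 0.]]
```

Because only the indices (not learned upsampling filters) are needed, the decoder adds no parameters for the upsampling step itself, which is the source of SegNet's memory efficiency relative to learned-deconvolution decoders.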