LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation

14 Jun 2017 | Abhishek Chaurasia, Eugenio Culurciello
LinkNet is a deep neural network architecture designed for efficient semantic segmentation. It achieves state-of-the-art performance on the CamVid and Cityscapes datasets while using only 11.5 million parameters and 21.2 GFLOPs to process an image of size 3×640×360.

The network uses an encoder-decoder structure: the encoder captures high-level features, and the decoder reconstructs spatial information. Unlike methods that rely only on pooling indices or deconvolution to recover resolution, LinkNet bypasses spatial information directly from each encoder block to the corresponding decoder block, improving accuracy and reducing processing time. This preserves spatial detail that would otherwise be lost during downsampling in the encoder, which in turn allows the decoder to use fewer parameters and operations.

The network was benchmarked on an NVIDIA Jetson TX1 embedded system and a Titan X GPU, demonstrating efficient performance on both high-end and embedded hardware. LinkNet outperforms existing models in inference speed and accuracy on Cityscapes and CamVid, handles high-resolution images efficiently, and can run in real time on embedded systems. With significantly fewer operations than other state-of-the-art models, the architecture is well suited to real-time applications. The paper also compares LinkNet with SegNet, ENet, Dilation8/10, and DeepLab-CRF, showing superior performance on both the IoU and iIoU metrics. The results demonstrate that LinkNet is an efficient and effective solution for semantic segmentation, particularly on embedded platforms.
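The core idea described above, re-injecting each encoder block's output into the matching decoder block by element-wise addition instead of recovering detail through pooling indices, can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: convolutional blocks are replaced by identity stubs, pooling by 2×2 max pooling, and upsampling by nearest-neighbour repetition, so only the additive bypass itself is shown.

```python
import numpy as np

def conv_stub(x):
    # Placeholder for a convolutional block; identity keeps the sketch minimal.
    return x

def downsample(x):
    # 2x2 max pooling with stride 2 over an (H, W, C) feature map.
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour 2x upsampling back to the original resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def linknet_stage(x):
    skip = conv_stub(x)            # encoder block output, kept as the bypass link
    encoded = downsample(skip)     # spatial detail is lost here...
    decoded = upsample(conv_stub(encoded))
    return decoded + skip          # ...and re-injected by element-wise addition
```

Because the bypass is an addition rather than a concatenation, the decoder operates on feature maps of the same width as the encoder's, which is one reason the decoder can stay small.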