28 Oct 2021 | Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
SegFormer is a simple and efficient semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. Its novel hierarchical Transformer encoder outputs multiscale features and requires no positional encoding, so performance does not degrade when the test resolution differs from the training resolution. Unlike traditional encoders that produce only single-scale features, it yields both high-resolution fine features and low-resolution coarse features. The lightweight all-MLP decoder aggregates information from different encoder layers, combining local and global attention to render powerful representations. The framework scales from SegFormer-B0 to SegFormer-B5, offering significantly better performance and efficiency than previous counterparts: SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, outperforming previous methods, and SegFormer-B5 reaches 84.0% mIoU on Cityscapes. SegFormer surpasses existing methods in parameters, FLOPS, speed, and accuracy on ADE20K and Cityscapes, and shows excellent zero-shot robustness to natural corruptions (Cityscapes-C), which matters for safety-critical applications such as autonomous driving. Its lightweight design makes it suitable for real-time use. The code is publicly available.
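To make the decoder idea concrete, below is a minimal PyTorch-style sketch of an all-MLP decoder that projects each encoder stage's features to a common dimension, upsamples them to 1/4 resolution, and fuses them with a 1x1 convolution before per-pixel classification. The class name, channel widths, embedding dimension, and class count are illustrative assumptions for this sketch, not the official SegFormer implementation.

```python
# A minimal sketch (not the official implementation) of an all-MLP decoder
# that fuses multi-scale encoder features. Channel counts, embed_dim, and
# num_classes are assumed values for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AllMLPDecoderSketch(nn.Module):
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=150):
        super().__init__()
        # One linear projection per encoder stage to unify channel dimensions.
        self.proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in in_channels])
        # Fuse the concatenated multi-scale features, then predict per-pixel classes.
        self.fuse = nn.Sequential(
            nn.Conv2d(embed_dim * len(in_channels), embed_dim, kernel_size=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, features):
        # `features` is a list of 4 tensors [B, C_i, H_i, W_i], ordered from the
        # fine (stride-4) stage to the coarse (stride-32) stage.
        target_size = features[0].shape[2:]  # upsample everything to 1/4 resolution
        upsampled = []
        for feat, proj in zip(features, self.proj):
            b, _, h, w = feat.shape
            x = proj(feat.flatten(2).transpose(1, 2))   # [B, H*W, embed_dim]
            x = x.transpose(1, 2).reshape(b, -1, h, w)  # back to [B, embed_dim, H, W]
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            upsampled.append(x)
        fused = self.fuse(torch.cat(upsampled, dim=1))
        return self.classifier(fused)                   # [B, num_classes, H/4, W/4]
```

The design point this sketch illustrates is that the decoder needs no heavy context modules: because the hierarchical encoder already mixes local and global attention across scales, simple linear projections, upsampling, and a single fusion layer suffice to aggregate the multi-level features.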