2 Sep 2021 | Robin Strudel*, Ricardo Garcia*, Ivan Laptev, Cordelia Schmid
Segmenter is a transformer-based model for semantic segmentation that captures global context at every layer, in contrast to convolution-based methods whose receptive fields grow only gradually with depth. Built on the Vision Transformer (ViT) architecture, it treats image patches as input tokens, encodes them with a plain transformer, and decodes them with either a simple linear decoder or a mask transformer. The encoder is pre-trained on ImageNet and the full model is fine-tuned end-to-end on semantic segmentation datasets with a per-pixel cross-entropy loss.

The linear decoder already achieves strong results, while the mask transformer further improves performance by generating one mask per class. An extensive ablation study shows that larger backbones and smaller patch sizes both yield better accuracy, and the simple design allows explicit trade-offs between precision and runtime. Segmenter achieves state-of-the-art results on the ADE20K and Pascal Context datasets and is competitive on Cityscapes, outperforming previous convolutional approaches, particularly on challenging datasets, and demonstrating the effectiveness of transformers for semantic segmentation.
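The linear-decoder pipeline described above (patch tokens in, per-pixel class map out) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: all sizes (64×64 image, 16×16 patches, embedding dim 192, 10 classes) are made up for the example, and it uses nearest-neighbor upsampling to keep the sketch short where the paper upsamples bilinearly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's): a 64x64 image split into
# 16x16 patches gives a 4x4 grid of patch tokens from the encoder.
H = W = 64          # image resolution
P = 16              # patch size
D, K = 192, 10      # token embedding dim, number of classes
gh, gw = H // P, W // P

tokens = rng.standard_normal((gh * gw, D))   # stand-in for ViT encoder output
weight = rng.standard_normal((D, K)) * 0.01  # the linear decoder's only parameters
bias = np.zeros(K)

scores = tokens @ weight + bias              # (N, K): class scores per patch
fmap = scores.reshape(gh, gw, K)             # back onto the 2-D patch grid
# Upsample each patch's scores to pixel resolution (nearest-neighbor here;
# the actual model interpolates bilinearly before the softmax).
seg = np.repeat(np.repeat(fmap, P, axis=0), P, axis=1)   # (H, W, K)
pred = seg.argmax(axis=-1)                   # per-pixel class map, (H, W)
print(pred.shape)
```

The point of the sketch is how little machinery the linear decoder needs: a single `D × K` projection shared across patches, a reshape, and an upsample. The mask transformer replaces that projection with learned class embeddings attended jointly with the patch tokens, which is where the extra accuracy comes from.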