Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

8 Mar 2024 | Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
This paper proposes SatMAE++, a multi-scale pre-training framework for transformers that improves performance on multi-spectral satellite imagery. Unlike existing methods, which either ignore scale information or restrict themselves to a single data modality, SatMAE++ leverages multi-scale information and uses convolution-based upsampling blocks to reconstruct images at higher scales, which also makes the framework extensible to additional scales. The approach is equally effective for optical and multi-spectral imagery. Extensive experiments on six datasets show that SatMAE++ achieves state-of-the-art performance, including a mean average precision (mAP) gain of 2.5% on the BigEarthNet dataset for multi-label classification and an absolute gain of 3.6% over the baseline on land cover classification. Notably, SatMAE++ outperforms ScaleMAE without relying on the GSD-based positional encodings that ScaleMAE requires, and it converges faster on multi-scale data. These results demonstrate that multi-scale pre-training both improves model performance and speeds convergence, especially when the data exhibits wide scale variations, and that the approach transfers to a range of downstream tasks such as land cover classification and multi-label classification. The code and pre-trained models are available at https://github.com/techmn/satmae_pp.
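To make the multi-scale reconstruction idea concrete, below is a minimal PyTorch sketch of a convolution-based upsampling block feeding a multi-scale reconstruction loss. It assumes pixel-shuffle upsampling and per-scale MSE against pooled targets; the names `UpsampleBlock` and `MultiScaleHead` are hypothetical, and the official implementation in the linked repository may differ in block design and loss details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpsampleBlock(nn.Module):
    """Convolution-based 2x upsampling block (illustrative sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # Expand channels 4x with a 3x3 conv, then pixel-shuffle to
        # double the spatial resolution while restoring channel count.
        self.conv = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.conv(x)))


class MultiScaleHead(nn.Module):
    """Reconstructs the input at 1x, 2x, and 4x of the decoder resolution
    and sums the per-scale reconstruction losses. A minimal sketch of the
    multi-scale pre-training objective, not the authors' exact code.
    """

    def __init__(self, channels: int, out_channels: int, num_scales: int = 3):
        super().__init__()
        self.ups = nn.ModuleList(
            UpsampleBlock(channels) for _ in range(num_scales - 1)
        )
        # 1x1 conv projects features back to image bands (e.g. 13 for Sentinel-2).
        self.to_img = nn.Conv2d(channels, out_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, h, w) decoder features; target: (B, bands, H, W) image.
        loss, x = torch.zeros((), device=feats.device), feats
        for i in range(len(self.ups) + 1):
            pred = self.to_img(x)
            # Pool the full-resolution target down to the current scale.
            tgt = F.adaptive_avg_pool2d(target, pred.shape[-2:])
            loss = loss + F.mse_loss(pred, tgt)
            if i < len(self.ups):
                x = self.ups[i](x)  # double resolution for the next scale
        return loss
```

Because each added scale is just one more upsampling block and one more loss term, the same head extends to deeper pyramids without touching the transformer encoder, which is what makes the framework easy to extend to more scales.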