Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

8 Mar 2024 | Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwar, Salman Khan, Fahad Shahbaz Khan
The article presents SatMAE++, a new approach for pre-training transformers on multi-spectral satellite imagery. The method addresses the limitations of existing approaches, which either ignore scale information or are restricted to a single data modality. SatMAE++ leverages multi-scale information through convolution-based upsampling blocks that reconstruct images at higher scales, making the framework extensible to additional scales. The approach is equally effective for both optical and multi-spectral imagery.

Extensive experiments on six datasets demonstrate the effectiveness of SatMAE++, which achieves state-of-the-art performance. On the BigEarthNet dataset, SatMAE++ achieves a mean average precision (mAP) gain of 2.5% for multi-label classification. Unlike existing approaches such as ScaleMAE, the method does not rely on complex positional encodings, and its framework handles multi-scale information without being restricted to a single data modality. The results show that multi-scale pre-training improves performance on various downstream tasks, including land cover classification and multi-label classification. The study highlights the importance of multi-scale information in remote sensing tasks and demonstrates the effectiveness of the proposed framework in achieving better performance and faster convergence.
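To make the multi-scale idea concrete, below is a minimal NumPy sketch of a multi-scale reconstruction loss: a base reconstruction is upsampled to several scales and compared against the target at each scale, and the per-scale errors are summed. This is an illustration only; the function names are hypothetical, nearest-neighbor upsampling stands in for the paper's learned convolution-based upsampling blocks, and strided slicing is a simple proxy for producing targets at each scale.

```python
import numpy as np

def nearest_upsample(x, factor):
    # Nearest-neighbor upsampling; a stand-in for the learned
    # convolution-based upsampling blocks described in the article.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def multi_scale_loss(decoded, target, scales=(1, 2, 4)):
    """Sum of per-scale L2 reconstruction losses (illustrative).

    decoded: base reconstruction at the lowest scale, shape (H, W).
    target:  ground-truth image at the highest scale,
             shape (H * max(scales), W * max(scales)).
    """
    max_scale = max(scales)
    total = 0.0
    for s in scales:
        recon = nearest_upsample(decoded, s)   # reconstruct at scale s
        stride = max_scale // s
        tgt = target[::stride, ::stride]       # target resampled to scale s
        total += float(np.mean((recon - tgt) ** 2))
    return total

# Usage: a perfect base reconstruction of a constant image gives zero loss.
decoded = np.ones((2, 2))
target = np.ones((8, 8))
loss = multi_scale_loss(decoded, target)       # 0.0 for this trivial case
```

Adding another entry to `scales` extends the loss to one more resolution, which mirrors the article's point that the design is extensible to additional scales.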