29 Jul 2024 | Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, and Nico Lang
MMEarth is a global multi-modal pretraining dataset for geospatial representation learning, covering 1.2 million locations with 12 aligned modalities per location. It includes both pixel-level and image-level data, such as optical satellite images, SAR data, elevation maps, and scalar values like biome and temperature. The dataset is designed to enable the learning of general-purpose representations for optical satellite images, particularly from the Sentinel-2 mission, which are useful for various downstream tasks including image classification and semantic segmentation.
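To make the structure of the dataset concrete, the sketch below shows what a single multi-modal training example could look like. The modality names, array shapes, and values here are illustrative assumptions, not MMEarth's actual schema.

```python
# A minimal sketch of one multi-modal training example.
# Modality names and shapes are illustrative assumptions.
import numpy as np

sample = {
    # Pixel-level modalities: (channels, height, width) rasters
    "sentinel2": np.zeros((12, 128, 128), dtype=np.float32),  # optical bands
    "sentinel1": np.zeros((2, 128, 128), dtype=np.float32),   # SAR (VV, VH)
    "elevation": np.zeros((1, 128, 128), dtype=np.float32),   # DEM
    # Image-level modalities: one value per location
    "biome": 4,                 # categorical class index
    "mean_temperature": 12.7,   # scalar, degrees Celsius
    "lat_lon": (55.68, 12.57),  # geolocation of the patch
}
```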
The paper proposes a Multi-Pretext Masked Autoencoder (MP-MAE) approach, which extends the ConvNeXt V2 architecture with multi-modal pretext tasks. This approach outperforms both MAEs pretrained on ImageNet and those pretrained on domain-specific satellite images. The MP-MAE approach improves both fine-tuning and linear probing performance, with linear probing benefiting particularly from the multi-modal pretext tasks. The results show that pretraining with multi-modal pretext tasks leads to better label efficiency and parameter efficiency, which are crucial for global-scale applications.
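The following PyTorch sketch illustrates the multi-pretext idea at a high level: a shared encoder consumes the masked Sentinel-2 input, and a lightweight head per target modality contributes a reconstruction loss. The module names, dimensions, and the simple convolutional stand-in for the encoder are assumptions; the paper's actual model is a ConvNeXt V2 backbone with masked (sparse) convolutions.

```python
# Sketch of a multi-pretext masked autoencoder: one shared encoder,
# one reconstruction head per target modality, losses summed.
import torch
import torch.nn as nn

class MultiPretextMAE(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Stand-in encoder that patchifies the 12-band Sentinel-2 input;
        # the actual model is a ConvNeXt V2 backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(12, embed_dim, kernel_size=16, stride=16),
            nn.GELU(),
        )
        # One head per pretext modality, each upsampling the shared
        # features back to the resolution of its reconstruction target.
        self.heads = nn.ModuleDict({
            "sentinel2": nn.ConvTranspose2d(embed_dim, 12, kernel_size=16, stride=16),
            "sentinel1": nn.ConvTranspose2d(embed_dim, 2, kernel_size=16, stride=16),
            "elevation": nn.ConvTranspose2d(embed_dim, 1, kernel_size=16, stride=16),
        })

    def forward(self, s2_masked: torch.Tensor) -> dict:
        z = self.encoder(s2_masked)  # shared representation
        return {name: head(z) for name, head in self.heads.items()}

def multi_pretext_loss(preds, targets, weights=None):
    """Sum per-modality reconstruction losses (uniform weights by default)."""
    total = torch.zeros(())
    for name, pred in preds.items():
        w = 1.0 if weights is None else weights[name]
        total = total + w * nn.functional.mse_loss(pred, targets[name])
    return total

# Usage with random stand-in data:
model = MultiPretextMAE()
x = torch.randn(4, 12, 128, 128)  # a (masked) Sentinel-2 batch
targets = {
    "sentinel2": torch.randn(4, 12, 128, 128),
    "sentinel1": torch.randn(4, 2, 128, 128),
    "elevation": torch.randn(4, 1, 128, 128),
}
loss = multi_pretext_loss(model(x), targets)
```

The design point the sketch captures is that all pretext tasks share one encoder, so gradients from every modality shape a single representation of the optical input.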
The dataset is used to explore the potential of multi-modal data for improving representation learning. The MP-MAE approach is evaluated on several downstream tasks, including image classification and semantic segmentation, and shows improved performance over the MAE baselines above. The results also demonstrate that multi-modal pretext tasks improve the generalization of representations to new downstream tasks not known at pretraining time.
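As a rough illustration of the linear probing protocol under which the multi-modal pretext tasks help most, the sketch below freezes a pretrained encoder and trains only a linear classifier on the downstream labels. The `pretrained_encoder`, feature dimension, and class count are placeholders, not values from the paper.

```python
# Minimal linear probing sketch: freeze the encoder, train only a
# linear head. Names and sizes are illustrative assumptions.
import torch.nn as nn

def build_linear_probe(pretrained_encoder: nn.Module,
                       feat_dim: int = 512,
                       num_classes: int = 10) -> nn.Module:
    # Freeze all encoder weights so only the probe receives gradients.
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    return nn.Sequential(
        pretrained_encoder,
        nn.AdaptiveAvgPool2d(1),   # pool spatial features to one vector
        nn.Flatten(),
        nn.Linear(feat_dim, num_classes),  # the only trainable layer
    )
```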
The paper also situates the work within related research on masked image modelling and multi-modal representation learning, highlighting the value of multi-modal data for learning better representations. Overall, the study shows that multi-modal data can be used effectively for representation learning in Earth observation, and that MP-MAE learns representations that transfer to a wide range of downstream tasks, including crop type, land cover, and climate zone classification.