OmniSat: Self-Supervised Modality Fusion for Earth Observation


17 Jul 2024 | Guillaume Astruc, Nicolas Gonthier, Clément Mallet, Loïc Landrieu
OmniSat is a novel architecture designed for self-supervised multimodal learning in Earth Observation (EO). It merges diverse EO modalities, such as very high-resolution images and time series, into expressive features without labels by exploiting their natural alignment. To demonstrate its effectiveness, the authors create two new multimodal datasets by augmenting existing ones with new modalities. OmniSat is evaluated on three downstream tasks (forestry, land cover classification, and crop mapping) and shows state-of-the-art performance in both semi-supervised and fully supervised settings. The code and datasets are available at <https://github.com/gastruc/OmniSat>.

Keywords: Earth Observation (EO), Multi-modality, Self-supervised learning

The paper introduces OmniSat, a novel architecture for self-supervised multimodal learning in EO. It addresses a limitation of existing multimodal EO datasets and models, which typically focus on a single data type, by merging multiple views of the same area from different modalities into a single representation. OmniSat leverages the natural alignment of EO data through georeferencing and adapts contrastive and masked auto-encoding techniques to learn rich multimodal representations. The authors also enrich two existing EO benchmarks, TreeSatAI and PASTIS-R, with new modalities to evaluate OmniSat's ability to handle diverse inputs. The results show that OmniSat can leverage diverse modalities to learn rich representations, improving performance on tree species, crop type, and land cover classification tasks. The cross-modal pretraining scheme also improves performance when only one modality is available during inference.
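To make the pretraining idea concrete, here is a minimal, hypothetical PyTorch sketch of cross-modal self-supervised learning on geo-aligned patches: per-modality encoders project aligned patch features into a shared embedding space, a contrastive (InfoNCE) loss pulls together embeddings of the same ground patch across modalities, and a masked auto-encoding loss reconstructs randomly masked patch tokens from the visible context. This is not the authors' implementation; all module names, feature dimensions, the mask ratio, the loss weighting, and the use of a small transformer decoder are illustrative assumptions.

```python
# Hypothetical sketch (not the OmniSat code) of contrastive + masked auto-encoding
# pretraining over geo-aligned Earth Observation modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenEncoder(nn.Module):
    """Projects per-patch features of one modality into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.GELU(),
                                  nn.Linear(embed_dim, embed_dim))

    def forward(self, x):  # x: (batch, num_patches, in_dim)
        return self.proj(x)


def contrastive_loss(z_a, z_b, temperature: float = 0.07):
    """InfoNCE between embeddings of two modalities covering the same ground patches."""
    z_a = F.normalize(z_a.flatten(0, 1), dim=-1)  # (batch * patches, dim)
    z_b = F.normalize(z_b.flatten(0, 1), dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def masked_reconstruction_loss(tokens, decoder, mask_ratio: float = 0.5):
    """Zeroes a random subset of patch tokens and reconstructs them from the rest."""
    b, n, _ = tokens.shape
    mask = torch.rand(b, n, device=tokens.device) < mask_ratio  # True = masked
    visible = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = decoder(visible)                                    # attends across patches
    return F.mse_loss(recon[mask], tokens[mask])


if __name__ == "__main__":
    # Toy example: three aligned modalities (VHR imagery, Sentinel-2 and Sentinel-1
    # time series) already summarized into per-patch feature vectors.
    batch, patches = 4, 36
    feats = {"vhr": torch.randn(batch, patches, 512),
             "s2": torch.randn(batch, patches, 128),
             "s1": torch.randn(batch, patches, 64)}
    encoders = {m: TokenEncoder(f.size(-1)) for m, f in feats.items()}
    decoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=1)

    tokens = {m: encoders[m](f) for m, f in feats.items()}

    # Masked auto-encoding on the fused (averaged) token representation.
    fused = torch.stack(list(tokens.values())).mean(0)
    loss = masked_reconstruction_loss(fused, decoder)

    # Contrastive alignment between every pair of modalities.
    mods = list(tokens)
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            loss = loss + contrastive_loss(tokens[mods[i]], tokens[mods[j]])

    print(f"combined self-supervised loss: {loss.item():.4f}")
```

In the actual method, the patch tokenization, encoder architectures, and the balance between the contrastive and reconstruction objectives follow the paper and the released code rather than this sketch.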