Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model


26 Jun 2024 | Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong
Changen2 is a multi-temporal remote sensing generative change foundation model that addresses the challenge of generating large-scale, high-quality change data for remote sensing applications. It is built on a generative probabilistic change model (GPCM), which factorizes the complex change simulation problem into two tractable sub-problems: condition-level change event simulation and image-level semantic change synthesis. Changen2 instantiates the GPCM with a resolution-scalable diffusion transformer (RS-DiT), which generates time series of remote sensing images, together with the corresponding semantic and change labels, from single-temporal images.

The model is trained with self-supervision, deriving change supervisory signals from unlabeled single-temporal images. This eliminates the need for manual annotation and lets the model exploit vast amounts of unlabeled Earth observation data; generating change-labeled multi-temporal images from unlabeled single-temporal images yields change supervision at scale and is a key factor in the transferability of the pre-trained models. The authors call Changen2 a "generative change foundation model" to distinguish it from other foundation models. Changen2 also shows strong spatiotemporal scalability in data generation: a model trained on 256x256-pixel single-temporal images can generate time series of arbitrary length at resolutions of 1,024x1,024 pixels.

Changen2 pre-trained models exhibit superior zero-shot performance and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model was used to generate three large-scale synthetic change detection datasets: a building change detection dataset (Changen2-S1-15k), a semantic change detection dataset (Changen2-S9-27k), and a class-agnostic change detection dataset (Changen2-S0-1.2M). Change detectors pre-trained on these synthetic datasets transfer to real-world change detection datasets better than state-of-the-art models and show significantly stronger zero-shot prediction capability.

The main contributions of the paper are: a generative change modeling framework that decouples complex stochastic change process simulation into more tractable change event simulation and semantic change synthesis; Changen2, a generative change foundation model built on a novel resolution-scalable diffusion transformer architecture; and three globally distributed synthetic change datasets. Across these settings, Changen2 outperforms existing foundation models in zero-shot prediction capability and transferability.
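One natural way to formalize the GPCM factorization described above (the notation here is an assumption for exposition, not copied from the paper): with x_t denoting an image at time t and s_t its semantic mask,

    p(x_{t+1}, s_{t+1} | x_t, s_t) = p(s_{t+1} | s_t) * p(x_{t+1} | x_t, s_{t+1}),

where the first factor corresponds to condition-level change event simulation (stochastically editing the semantic mask) and the second to image-level semantic change synthesis (rendering a post-event image consistent with the edited mask and the pre-event image).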
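As a rough illustration of how such a pipeline could be wired together, the sketch below alternates the two sub-problems and derives change labels by differencing consecutive masks. The function names and the toy mask-editing logic are assumptions for illustration only, not the authors' implementation; in Changen2 the synthesis step is realized by the RS-DiT.

import numpy as np

def simulate_change_event(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Condition-level change event simulation: stochastically edit the semantic
    mask to produce the next-time mask. Trivial stand-in: drop ~5% of labeled pixels."""
    next_mask = mask.copy()
    drop = rng.random(mask.shape) < 0.05
    next_mask[drop & (mask > 0)] = 0
    return next_mask

def synthesize_image(image: np.ndarray, next_mask: np.ndarray) -> np.ndarray:
    """Image-level semantic change synthesis: render the post-event image
    conditioned on the pre-event image and the edited mask. Placeholder only;
    the paper uses a resolution-scalable diffusion transformer here."""
    return image  # a real model would repaint the changed regions

def generate_series(image: np.ndarray, mask: np.ndarray, length: int, seed: int = 0):
    """Roll the two stages forward to obtain a change-labeled time series
    from a single labeled (or pseudo-labeled) single-temporal image."""
    rng = np.random.default_rng(seed)
    images, masks = [image], [mask]
    for _ in range(length - 1):
        masks.append(simulate_change_event(masks[-1], rng))
        images.append(synthesize_image(images[-1], masks[-1]))
    change_labels = [m0 != m1 for m0, m1 in zip(masks[:-1], masks[1:])]
    return images, masks, change_labels

Under this view, change supervision comes for free: every edit applied to the mask in the first stage is, by construction, the ground-truth change label for the synthesized image pair.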