26 Jun 2024 | Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong
The paper introduces Changen2, a scalable multi-temporal change data generator built on generative models. Changen2 addresses the difficulty of collecting, preprocessing, and annotating large-scale multi-temporal remote sensing images, which are essential for training deep vision models for change detection. The main idea is to simulate a stochastic change process over time with a probabilistic graphical model called the Generative Probabilistic Change Model (GPCM). GPCM factorizes the complex simulation problem into two sub-problems: condition-level change event simulation and image-level semantic change synthesis. Implemented with a resolution-scalable diffusion transformer, Changen2 generates time series of remote sensing images, together with the corresponding semantic and change labels, from single-temporal images. It can be trained at scale on both labeled and unlabeled data, making it a "generative change foundation model." In particular, the model can produce change supervisory signals from unlabeled single-temporal images. Comprehensive experiments show that Changen2 generates high-quality, diverse multi-temporal images and dense labels, and that models pre-trained on the resulting synthetic datasets achieve superior zero-shot change detection performance and transfer well to real-world datasets across multiple change tasks.
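The first of GPCM's two sub-problems, condition-level change event simulation, can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes one simple kind of change event (random object removal on a binary semantic mask) and uses hypothetical names (`connected_components`, `simulate_change_event`, `removal_prob`). The second sub-problem, image-level semantic change synthesis, would require a trained generative model conditioned on the simulated mask and is omitted here.

```python
import random
from collections import deque


def connected_components(mask):
    """Label 4-connected foreground regions of a binary mask (list of lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < h and 0 <= nx < w
                        if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def simulate_change_event(mask_t1, removal_prob=0.5, seed=0):
    """Toy change event (an assumption for illustration): remove each object
    independently with probability `removal_prob`.

    Returns the simulated mask at time t2 and the change label, defined here
    as the pixelwise XOR of the two masks.
    """
    rng = random.Random(seed)
    labels, count = connected_components(mask_t1)
    removed = {i for i in range(1, count + 1) if rng.random() < removal_prob}
    h, w = len(mask_t1), len(mask_t1[0])
    mask_t2 = [
        [0 if labels[r][c] in removed else mask_t1[r][c] for c in range(w)]
        for r in range(h)
    ]
    change = [[mask_t1[r][c] ^ mask_t2[r][c] for c in range(w)] for r in range(h)]
    return mask_t2, change


# A single-temporal mask containing three objects (e.g. building footprints).
mask_t1 = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
mask_t2, change = simulate_change_event(mask_t1, removal_prob=0.5, seed=0)
```

In this sketch the change label comes for free from the simulated event, which mirrors the summary's point that change supervision can be derived from single-temporal data. A generator such as Changen2 would then synthesize the image at time t2 conditioned on `mask_t2`.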