26 Mar 2024 | Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, and Tao Mei
This paper proposes Moving Average Sampling in Frequency domain (MASF), a training-free method for stabilizing the denoising process of diffusion models in image generation. MASF uses a moving average to ensemble all prior samples produced during denoising. It does not average the denoised samples at different timesteps directly. Instead, it first maps them to data space and averages them there, which avoids distribution shift across timesteps. It also decomposes each sample into frequency components and applies the moving average to each component separately, so that different components can evolve dynamically over the course of denoising. This exploits the frequency dynamics of diffusion models while remaining compatible with existing diffusion networks, which accept only the complete sample. MASF can be integrated seamlessly into mainstream pre-trained diffusion models and sampling schedules. Extensive experiments on both unconditional and conditional diffusion models show that MASF outperforms the baselines with almost negligible additional complexity cost.
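The sampling idea summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard epsilon-prediction parameterisation to map a noisy sample `x_t` to data space. It also replaces the paper's frequency decomposition with a hypothetical two-band FFT split controlled by a radial `cutoff`. Where the paper lets components evolve dynamically, it uses fixed, hypothetical per-band momenta (`momentum_low`, `momentum_high`). Finally, it re-noises the averaged sample with a deterministic DDIM-style step. All function names, the `eps_model(x, step)` signature, and the `alpha_bars` ordering are illustrative.

```python
import numpy as np

def predict_x0(x_t, eps, alpha_bar_t):
    # Map x_t to data space, assuming x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def split_frequencies(x, cutoff=0.25):
    # Hypothetical two-band split using a radial FFT mask over the last two axes.
    # The paper's actual decomposition may differ. By construction, low + high == x.
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fy**2 + fx**2) <= cutoff  # symmetric mask, so the inverse is real
    spec = np.fft.fft2(x, axes=(-2, -1))
    low = np.real(np.fft.ifft2(spec * mask, axes=(-2, -1)))
    return low, x - low

class FrequencyMovingAverage:
    """One exponential moving average per frequency band of the predicted x0.
    Illustrative stand-in for MASF, using fixed, hypothetical momenta."""

    def __init__(self, momentum_low=0.7, momentum_high=0.3, cutoff=0.25):
        self.m_low, self.m_high, self.cutoff = momentum_low, momentum_high, cutoff
        self.low = None
        self.high = None

    def update(self, x0):
        low, high = split_frequencies(x0, self.cutoff)
        if self.low is None:
            # The first call initialises each band's average with the current sample.
            self.low, self.high = low, high
        else:
            self.low = self.m_low * self.low + (1.0 - self.m_low) * low
            self.high = self.m_high * self.high + (1.0 - self.m_high) * high
        # Recombine into a complete sample, since the denoiser only accepts full images.
        return self.low + self.high

def ddim_step(x0_hat, eps, alpha_bar_prev):
    # Deterministic DDIM-style update (eta = 0) using the averaged x0.
    # Reusing the network's eps here is an assumption of this sketch.
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps

def sample(eps_model, x_T, alpha_bars, ema):
    # alpha_bars is ordered from t = T (noisiest) down to t = 1 (hypothetical format).
    x_t = x_T
    for i, a_t in enumerate(alpha_bars):
        a_prev = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        eps = eps_model(x_t, i)
        x0_avg = ema.update(predict_x0(x_t, eps, a_t))
        x_t = ddim_step(x0_avg, eps, a_prev)
    return x_t
```

The bands are summed back into a full image before the next network call, so the denoiser never sees an individual component. This matches the constraint that diffusion networks accept only the complete sample. In this sketch, the extra cost per step is one forward and one inverse FFT plus elementwise updates. A fresh `FrequencyMovingAverage` should be created for each sampling run.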