Diffusion models have achieved significant success in image generation tasks, but their lengthy iterative denoising process and complex noise-estimation networks hinder low-latency deployment. Post-training quantization (PTQ) is a promising method to reduce model complexity and accelerate diffusion models without fine-tuning. However, existing PTQ methods for diffusion models suffer from distribution mismatch at both the calibration-sample level and the reconstruction-output level, leading to suboptimal performance, especially in low-bit settings.
To address these issues, the paper proposes Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models (EDA-DM). At the calibration sample level, EDA-DM uses Temporal Distribution Alignment Calibration (TDAC) to select calibration samples based on the density and variety of feature maps in the latent space, aligning them with the overall sample distribution. At the reconstruction output level, Fine-grained Block Reconstruction (FBR) modifies the loss function by incorporating the losses of layers within blocks, aligning the outputs of the quantized model with the full-precision model.
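The two components can be sketched in a few lines of NumPy. Everything below is illustrative rather than the paper's exact formulation: the greedy density/diversity scoring in `tdac_select`, the weights `alpha` and `lam`, and the per-layer MSE terms in `fbr_loss` are all assumptions chosen to mirror the ideas described above.

```python
import numpy as np

def tdac_select(feats, n_select, alpha=0.5):
    """Sketch of TDAC-style calibration selection.

    feats: (N, D) flattened latent feature maps of candidate samples.
    Greedily picks samples that score high on a combination of
    density (closeness to the overall distribution) and diversity
    (distance to already-selected samples). `alpha` is a
    hypothetical balancing weight, not taken from the paper.
    """
    feats = np.asarray(feats, dtype=float)
    # pairwise Euclidean distances between candidate feature maps
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    # density: samples with small mean distance to the rest score high
    density = -d.mean(axis=1)
    density = (density - density.min()) / (np.ptp(density) + 1e-8)
    selected = [int(np.argmax(density))]
    while len(selected) < n_select:
        # diversity: distance to the nearest already-selected sample
        diversity = d[:, selected].min(axis=1)
        diversity = diversity / (diversity.max() + 1e-8)
        score = alpha * density + (1 - alpha) * diversity
        score[selected] = -np.inf  # never re-select a sample
        selected.append(int(np.argmax(score)))
    return selected

def fbr_loss(q_layers, fp_layers, x, lam=0.1):
    """Sketch of an FBR-style reconstruction loss.

    Runs the same input through the quantized and full-precision
    layer sequences and combines the block-output mismatch (last
    layer) with a weighted sum of intermediate per-layer mismatches.
    `lam` is a hypothetical weight on the fine-grained terms.
    """
    xq = xf = np.asarray(x, dtype=float)
    layer_losses = []
    for ql, fl in zip(q_layers, fp_layers):
        xq, xf = ql(xq), fl(xf)
        layer_losses.append(float(np.mean((xq - xf) ** 2)))
    return layer_losses[-1] + lam * sum(layer_losses[:-1])
```

In this sketch, setting `lam=0` recovers plain block-wise reconstruction (only the block output is matched), while `lam>0` additionally pulls the intermediate layer outputs of the quantized block toward their full-precision counterparts.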
Extensive experiments on various models (DDIM, LDM-4, LDM-8, Stable-Diffusion) and datasets (CIFAR-10, LSUN-Bedroom, LSUN-Church, ImageNet, MS-COCO) demonstrate that EDA-DM significantly outperforms existing PTQ methods, especially in low-bit quantization. The method also shows robustness across different model scales, resolutions, and guidance conditions.