25 Jun 2024 | Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu
Q-DiT is a post-training quantization method designed for diffusion transformers, addressing the challenges of quantizing diffusion models. Diffusion models, particularly those based on transformers (DiTs), have shown significant improvements in image synthesis quality and scalability, but their large computational demands hinder real-world deployment. Post-training quantization (PTQ) offers a remedy by reducing model size and inference cost without retraining. However, existing PTQ methods developed for ViTs and conventional diffusion models suffer from biased quantization and noticeable performance degradation when applied to DiTs. Q-DiT introduces three techniques: fine-grained (group-wise) quantization to handle the large variance across input channels of weights and activations, an automatic search strategy to choose the quantization granularity per layer, and dynamic activation quantization to adapt to activation distribution shifts across timesteps.
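As a rough illustration of what fine-grained (group-wise) quantization means, the sketch below quantizes a weight matrix along its input channels in groups of a fixed size, with one scale and zero-point per group. The group size, bit width, and asymmetric min-max scheme here are illustrative assumptions, not the exact configuration used by Q-DiT.

```python
import torch

def quantize_weight_groupwise(w: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    """Group-wise asymmetric fake quantization along the input-channel dimension.

    w: weight matrix of shape (out_features, in_features); for simplicity this
    sketch assumes in_features is divisible by group_size.
    """
    out_f, in_f = w.shape
    qmax = 2 ** n_bits - 1
    # Split each output row into groups of `group_size` input channels.
    w_g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = w_g.amin(dim=-1, keepdim=True)
    w_max = w_g.amax(dim=-1, keepdim=True)
    # One scale / zero-point per group, so outlier channels only affect their own group.
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    # Quantize and immediately dequantize ("fake" quantization) to measure the error.
    q = torch.clamp(torch.round(w_g / scale) + zero_point, 0, qmax)
    w_dq = (q - zero_point) * scale
    return w_dq.reshape(out_f, in_f), scale, zero_point
```

Because each group gets its own quantization parameters, a few high-magnitude input channels no longer stretch the quantization range of the entire row, which is the variance problem the fine-grained scheme targets.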
Extensive experiments on ImageNet show that Q-DiT delivers significant improvements in image generation quality and efficiency. When quantizing DiT-XL/2 to W8A8 (8-bit weights and activations), Q-DiT reduces the Fréchet Inception Distance (FID) by 1.26 compared to the baseline. Under the more aggressive W4A8 setting, it maintains high fidelity with only a marginal increase in FID, setting a new benchmark for efficient, high-quality quantization of diffusion transformers. To allocate quantization granularity, Q-DiT employs an evolutionary search over group sizes for the different model layers, using FID as the objective so that the search correlates quantization error with visual quality. This keeps image quality high while adding minimal overhead.
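The group-size allocation can be pictured as a standard evolutionary loop over per-layer assignments scored by FID. The candidate sizes, single-point crossover, mutation rate, and the `evaluate_fid` callback below are hypothetical stand-ins, not the paper's exact search setup.

```python
import random

# Hypothetical candidate group sizes per quantized layer; `evaluate_fid` is a
# placeholder that should quantize the model with the given assignment, sample
# images, and return the FID score (lower is better).
GROUP_CHOICES = [32, 64, 128, 256]

def evolutionary_search(num_layers, evaluate_fid, pop_size=20, generations=10,
                        mutation_rate=0.1):
    """Toy evolutionary search for a per-layer group-size assignment."""
    population = [[random.choice(GROUP_CHOICES) for _ in range(num_layers)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate_fid)   # keep the lowest-FID candidates
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_layers)        # single-point crossover
            child = a[:cut] + b[cut:]
            child = [random.choice(GROUP_CHOICES) if random.random() < mutation_rate
                     else g for g in child]              # random mutation
            children.append(child)
        population = parents + children
    return min(population, key=evaluate_fid)
```

Using FID directly as the fitness ties the search to perceptual quality rather than to layer-wise reconstruction error alone, at the cost of sampling images for every candidate evaluated.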
Q-DiT addresses the challenges of quantizing DiTs by managing input-channel variance in weights and activations, and by adjusting activation quantization parameters dynamically at runtime. It also optimizes group sizes through an evolutionary search, improving quantization performance and efficiency. The method is effective at reducing quantization error and maintaining high image quality even under stringent quantization constraints, and it outperforms existing baselines in both performance and efficiency on image generation tasks. The method is implemented in Python and available at https://github.com/Juanerx/Q-DiT.
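Dynamic activation quantization here means recomputing the activation quantization parameters from the tensor observed at the current denoising step, rather than relying on a single calibrated scale. The sketch below is a minimal per-tensor, symmetric 8-bit version; Q-DiT's actual granularity and scheme may differ.

```python
import torch

def quantize_activation_dynamic(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Dynamic per-tensor activation fake quantization.

    The scale is derived from the activation at the current timestep, so it
    tracks the distribution shifts that occur across denoising steps.
    """
    qmax = 2 ** (n_bits - 1) - 1                    # symmetric signed range, e.g. 127
    scale = x.abs().amax().clamp(min=1e-8) / qmax   # recomputed every call
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                                # dequantized (fake-quantized) activation
```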