Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

25 Jun 2024 | Lei Chen¹, Yuan Meng¹, Chen Tang¹, Xinzhu Ma², Jingyan Jiang³, Xin Wang¹, Zhi Wang¹, Wenwu Zhu¹
**Authors:** Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

**Institutions:** ¹Tsinghua University; ²MMLab, CUHK; ³Shenzhen Technology University

**Abstract:** Diffusion models, particularly those based on Diffusion Transformers (DiTs), have significantly improved image synthesis quality and scalability. However, their large computational requirements hinder real-world deployment. Post-training quantization (PTQ) offers a solution by compressing model size and speeding up inference without retraining. Existing PTQ frameworks for DiTs suffer from biased quantization, leading to performance degradation. This paper introduces Q-DiT, which integrates fine-grained group quantization, an automatic search strategy for optimizing quantization granularity, and dynamic activation quantization. Extensive experiments on ImageNet demonstrate that Q-DiT achieves a significant reduction in FID compared to the baselines while maintaining high image generation quality under W8A8 quantization.

**Key Contributions:**
- Q-DiT: a novel post-training quantization scheme for DiTs that achieves accurate and efficient quantization.
- Fine-grained group quantization: manages the significant variance of weights and activations across input channels (a sketch follows this list).
- Dynamic activation quantization: adapts quantization parameters to activation changes across timesteps.
- Evolutionary search: optimizes the group sizes used for quantization, enhancing efficiency and quality.
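To make the fine-grained group quantization and dynamic activation quantization ideas concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the `group_size` default, and the per-token activation granularity are illustrative assumptions; in Q-DiT the group sizes would be selected per layer by the evolutionary search.

```python
import numpy as np

def quantize_weights_groupwise(w, group_size=128, n_bits=4):
    """Simulated asymmetric uniform quantization of a weight matrix, with one
    scale/zero-point per contiguous group of input channels (group_size is a
    hypothetical default; Q-DiT searches this granularity per layer)."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0
    w_g = w.reshape(out_ch, in_ch // group_size, group_size)
    w_min = w_g.min(axis=-1, keepdims=True)
    w_max = w_g.max(axis=-1, keepdims=True)
    qmax = 2 ** n_bits - 1
    scale = np.maximum((w_max - w_min) / qmax, 1e-8)
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w_g / scale) + zero, 0, qmax)
    # Return dequantized ("fake-quant") weights for simulation purposes.
    return ((q - zero) * scale).reshape(out_ch, in_ch)

def quantize_activations_dynamic(x, n_bits=8):
    """Dynamic activation quantization: scales and zero-points are recomputed
    from the current activations at every call (here per token), so they track
    the activation shifts that occur across denoising timesteps."""
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    qmax = 2 ** n_bits - 1
    scale = np.maximum((x_max - x_min) / qmax, 1e-8)
    zero = np.round(-x_min / scale)
    q = np.clip(np.round(x / scale) + zero, 0, qmax)
    return (q - zero) * scale

# Usage: simulate a quantized linear layer on random data (a toy W4A8 setup).
w = np.random.randn(512, 1024).astype(np.float32)   # (out_features, in_features)
x = np.random.randn(16, 1024).astype(np.float32)    # 16 tokens
y = quantize_activations_dynamic(x) @ quantize_weights_groupwise(w).T
```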
**Experiments:**
- **Settings:** evaluation on ImageNet 256x256 and 512x512 using DiT-XL/2 models.
- **Metrics:** FID, sFID, IS, and Precision.
- **Results:** Q-DiT outperforms the baselines in both high and low bit-width settings, maintaining high image quality with minimal performance degradation.

**Ablation Studies:**
- **Group size configuration:** visualizes the optimal group sizes for different models and resolutions.
- **Effectiveness of components:** demonstrates the impact of each component on quantization performance.

**Conclusion:** Q-DiT effectively addresses the challenges of quantizing DiTs, achieving near-lossless compression and high-quality image generation. Future work will focus on extending the approach to other domains and improving computational efficiency.