25 Jun 2024
**Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers**
**Authors:** Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu
**Institutions:** Tsinghua University; MMLab, CUHK; Shenzhen Technology University
**Abstract:**
Diffusion models, particularly those based on Diffusion Transformers (DiTs), have significantly improved the quality and scalability of image synthesis. However, their large computational cost hinders real-world deployment. Post-training quantization (PTQ) offers a remedy by compressing model size and accelerating inference without retraining, but existing PTQ frameworks for DiTs suffer from biased quantization and noticeable performance degradation. This paper introduces Q-DiT, which combines fine-grained group quantization, an automatic search over quantization granularity, and dynamic activation quantization. Extensive experiments on ImageNet show that Q-DiT significantly reduces FID relative to the baselines while maintaining high image-generation quality under W8A8 quantization.
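For readers unfamiliar with the notation, W8A8 means both weights and activations are quantized to 8-bit integers. The snippet below is a minimal sketch of the standard asymmetric uniform quantizer that such PTQ schemes build on; the function names are illustrative and do not come from the Q-DiT codebase.

```python
import numpy as np

def uniform_quantize(x, n_bits=8):
    """Asymmetric uniform quantization of a tensor to n_bits integers.

    Generic PTQ formulation behind the "W8A8" label (8-bit weights and
    activations); illustrative only, not Q-DiT's calibration procedure.
    """
    qmax = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = max((x_max - x_min) / qmax, 1e-8)   # step size of the integer grid
    zero_point = round(-x_min / scale)          # integer offset so x_min maps to 0
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to real values for simulated-quantization inference."""
    return (q.astype(np.float32) - zero_point) * scale
```

A single scale and zero-point for the whole tensor is exactly where channel-wise outliers hurt: one extreme channel stretches the integer grid for every other channel, which is the kind of quantization bias that finer-grained grouping (see the contributions below) mitigates.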
**Key Contributions:**
- Q-DiT: A novel post-training quantization scheme for DiTs, achieving accurate and efficient quantization.
- Fine-grained group quantization: Assigns each group of input channels its own quantization parameters to handle the large variance of weights and activations across channels (sketched in code after this list).
- Dynamic activation quantization: Recomputes activation quantization parameters on the fly to track distribution shifts across denoising timesteps (also covered in the sketch below).
- Evolutionary search: Automatically searches per-layer group sizes to balance efficiency and generation quality (a toy search loop appears under Ablation Studies).
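As referenced in the list above, the two quantization components can be illustrated with a short sketch. This is a simplified rendering that assumes contiguous channel groups and per-call (dynamic) statistics; it is not the paper's implementation, and the helper names are hypothetical.

```python
import numpy as np

def quantize_groupwise(w, group_size=128, n_bits=8):
    """Group-wise quantization: each contiguous block of `group_size` input
    channels in a row gets its own scale/zero-point, limiting the damage from
    channel-wise outliers. Illustrative only; Q-DiT searches the group size
    per layer rather than fixing it globally.
    """
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0, "pad or pick a divisor of in_ch in practice"
    qmax = 2 ** n_bits - 1
    w_groups = w.reshape(out_ch, in_ch // group_size, group_size)
    w_min = w_groups.min(axis=-1, keepdims=True)
    w_max = w_groups.max(axis=-1, keepdims=True)
    scale = np.maximum((w_max - w_min) / qmax, 1e-8)
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w_groups / scale) + zero, 0, qmax).astype(np.uint8)
    return q.reshape(out_ch, in_ch), scale, zero

def quantize_activations_dynamic(x, group_size=128, n_bits=8):
    """Dynamic activation quantization: statistics are recomputed from the
    current activation tensor at every call (i.e. every denoising timestep),
    so the integer grid follows the timestep-dependent activation range.
    """
    # Reuse the same group-wise quantizer on the per-token activation matrix.
    return quantize_groupwise(x, group_size=group_size, n_bits=n_bits)
```

In this sketch the only difference between static and dynamic activation quantization is when the min/max statistics are computed: dynamically, they are derived from each timestep's activations rather than fixed from a calibration set.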
**Experiments:**
- **Settings:** Evaluation on ImageNet at 256×256 and 512×512 resolution, using DiT-XL/2 models.
- **Metrics:** FID, sFID, IS, and Precision.
- **Results:** Q-DiT outperforms baselines in both high and low bit-width settings, maintaining high image quality with minimal performance degradation.
**Ablation Studies:**
- **Group Size Configuration:** Visualizes the searched group-size configurations for different models and resolutions (a toy version of the search loop is sketched after this list).
- **Effectiveness of Components:** Demonstrates the impact of each component on quantization performance.
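To make the group-size search concrete, here is a toy evolutionary loop over per-layer group sizes. The layer names, candidate sizes, and the `proxy_quality` objective are all placeholders; the paper's actual search space, objective, and mutation/crossover operators may differ.

```python
import random

# Hypothetical layer list and candidate group sizes for illustration only.
LAYERS = [f"block{i}.mlp" for i in range(28)]
CANDIDATE_GROUPS = [32, 64, 128, 256]

def proxy_quality(config):
    """Placeholder objective: in practice this would score the model quantized
    with `config` (layer -> group size) on a small calibration set. A dummy
    value keeps the sketch runnable; it is NOT the real objective."""
    return sum(config.values())

def evolutionary_search(generations=20, population=16, mutate_prob=0.1):
    # Random initial population of per-layer group-size assignments.
    pop = [{layer: random.choice(CANDIDATE_GROUPS) for layer in LAYERS}
           for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop, key=proxy_quality)   # lower score = better
        parents = scored[: population // 2]       # keep the best half
        children = []
        for parent in parents:
            child = dict(parent)
            for layer in LAYERS:                  # random mutation per layer
                if random.random() < mutate_prob:
                    child[layer] = random.choice(CANDIDATE_GROUPS)
            children.append(child)
        pop = parents + children
    return min(pop, key=proxy_quality)
```

The point of the loop is only to show the shape of the search: candidate configurations are scored, the best survive, and mutated copies explore nearby group-size assignments.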
**Conclusion:**
Q-DiT effectively addresses the challenges of quantizing DiTs, achieving near-lossless compression and high-quality image generation. Future work will focus on extending the approach to other domains and improving computational efficiency.