PTQ4DiT: Post-training Quantization for Diffusion Transformers


25 May 2024 | Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan
PTQ4DiT is a post-training quantization method designed specifically for Diffusion Transformers (DiTs). DiTs, which replace the traditional U-Net backbone with a transformer architecture, achieve strong image-generation performance but impose heavy computational and memory demands at inference. PTQ4DiT reduces these demands by quantizing weights and activations to 8-bit (W8A8), and weights even to 4-bit (W4A8), while maintaining high-quality image generation.

Quantizing DiTs is difficult for two main reasons: salient channels with extreme magnitudes, and temporal variability in activation distributions across denoising timesteps. PTQ4DiT addresses these with two techniques. Channel-wise Salience Balancing (CSB) redistributes extreme magnitudes between activations and weights so that neither side concentrates the quantization error, while Spearman's ρ-guided Salience Calibration (SSC) dynamically adjusts salience evaluations across timesteps to account for temporal variation. To avoid extra computation at inference, PTQ4DiT applies these transformations through an offline re-parameterization strategy that keeps the model mathematically equivalent to the original.

Experiments on ImageNet show that PTQ4DiT achieves performance comparable to full-precision models at W8A8 and significantly outperforms other post-training quantization methods at W4A8, as measured by image-quality metrics such as FID and IS, demonstrating its effectiveness in reducing computational cost while preserving generation quality.
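The core idea behind CSB and the offline re-parameterization can be sketched in a few lines. The snippet below is an illustrative simplification, not the paper's exact formulation: it moves extreme per-channel activation magnitudes into the weights via a per-channel scale `s` (the function name `channel_balance`, the salience measure, and the `alpha` balancing exponent are all assumptions for illustration), and checks that the product `X @ W` is unchanged, which is what lets the rescaling be folded into the weights offline.

```python
import numpy as np

def channel_balance(X, W, alpha=0.5, eps=1e-8):
    """Illustrative channel-wise salience balancing (hypothetical helper).

    X: activations, shape (tokens, C_in)
    W: weights, shape (C_in, C_out)
    Moves extreme per-channel activation magnitudes into the weights via a
    per-channel scale, leaving X @ W mathematically unchanged.
    """
    a_max = np.abs(X).max(axis=0)              # per-channel activation salience, (C_in,)
    w_max = np.abs(W).max(axis=1)              # per-channel weight salience, (C_in,)
    s = (a_max ** alpha) / (w_max ** (1 - alpha) + eps)  # balancing scale
    X_bal = X / s                              # flatten activation outliers
    W_bal = W * s[:, None]                     # absorb them into weights (done offline)
    return X_bal, W_bal

# Equivalence check: the rescaling cancels, so X @ W == (X/s) @ (s * W).
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8)) * np.array([1, 50, 1, 1, 1, 1, 1, 1])  # channel 1 is an outlier
W = rng.standard_normal((8, 4))
X_bal, W_bal = channel_balance(X, W)
assert np.allclose(X @ W, X_bal @ W_bal)
```

Because `W_bal` is computed once before deployment, only the activation rescaling interacts with runtime, and in the paper's scheme even that cost is absorbed offline, which is why the method adds no extra computation at inference.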