PTQ4DiT: Post-training Quantization for Diffusion Transformers


25 May 2024 | Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan
The paper "PTQ4DiT: Post-training Quantization for Diffusion Transformers" addresses the computational cost of Diffusion Transformers (DiTs) in real-time applications by proposing a novel post-training quantization (PTQ) method. DiTs, which replace U-Net backbones with transformer blocks, offer superior scalability and flexibility but demand significant computational resources at inference time. Quantizing DiTs is difficult for two main reasons: salient channels with extreme magnitudes in both weights and activations, and temporal variability in activation distributions across denoising timesteps.

To tackle these issues, the authors introduce Channel-wise Salience Balancing (CSB) and Spearman's ρ-guided Salience Calibration (SSC). CSB redistributes extreme values between activations and weights to balance their salience, while SSC dynamically adjusts salience across timesteps to capture temporal variations. An offline re-parameterization strategy folds these adjustments into the model so that no extra computation is incurred during inference. Experiments demonstrate that PTQ4DiT quantizes DiTs to 8-bit precision (W8A8) and further enables 4-bit weight precision (W4A8) without compromising generation quality, outperforming existing PTQ methods for DiTs and achieving performance comparable to full-precision models at lower bit widths.
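To make the CSB idea concrete, the balancing of extreme values between activations and weights can be sketched as a mathematically equivalent per-channel rescaling, similar in spirit to SmoothQuant-style migration. This is a minimal illustrative sketch, not the paper's exact formulation: the salience metric (max-absolute value here), the square-root balancing rule, and the function name are all assumptions for illustration; the paper additionally folds such scales offline via its re-parameterization strategy.

```python
import numpy as np

def channelwise_salience_balance(X, W, eps=1e-8):
    """Hypothetical sketch of Channel-wise Salience Balancing (CSB).

    X: activations, shape [tokens, channels]
    W: weights, shape [channels, out_features]

    Migrates extreme per-channel magnitudes from activations into weights
    (or vice versa) through a per-channel scale s, so that
    (X / s) @ (diag(s) W) == X @ W exactly, while both rescaled factors
    have flatter per-channel ranges and are easier to quantize.
    """
    act_salience = np.abs(X).max(axis=0)                 # per input channel
    wgt_salience = np.abs(W).max(axis=1)                 # per input channel
    # Geometric-mean-style balance: shrink whichever side is more extreme.
    s = np.sqrt(act_salience / np.maximum(wgt_salience, eps))
    s = np.maximum(s, eps)
    X_bal = X / s                 # at deployment, this scale is folded offline
    W_bal = W * s[:, None]
    return X_bal, W_bal, s
```

In a real pipeline the division by `s` would not run at inference time: as the summary notes, the re-parameterization absorbs the scale into preceding layers offline, which is why the method adds no extra inference cost.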