13 Feb 2024 | Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan
QuEST is a method for low-bit quantization of diffusion models via efficient selective fine-tuning. Diffusion models are effective image generators but costly to deploy due to their memory and compute demands. Quantization is a promising route to compression and acceleration, yet existing methods break down at low bit-widths. QuEST identifies three key obstacles in quantized diffusion models: imbalanced activation distributions, imprecise temporal information, and the vulnerability of specific modules to perturbation.

To mitigate these, QuEST selectively fine-tunes the critical layers: those that carry temporal information and those most sensitive to reduced bit-width. Fine-tuning these layers reshapes their activation distributions, making them easier to quantize accurately. The procedure is data-free and efficient, requiring only supervision from the full-precision model. QuEST also introduces an efficient time-aware activation quantizer that adapts to how activation distributions shift across denoising time steps.

Evaluated on three high-resolution image generation tasks, QuEST achieves state-of-the-art performance under various bit-width settings, including generating readable images with a fully 4-bit Stable Diffusion. The approach is theoretically justified and empirically validated, demonstrating that selective fine-tuning enhances model robustness, reduces quantization error, and makes low-bit diffusion models more practical for real-world applications.
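To make the selective fine-tuning idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: freeze the quantized UNet, unfreeze only the layers deemed sensitive (a crude name-keyword match stands in for the paper's sensitivity analysis), and train them to match the full-precision teacher's prediction. The names `q_unet`, `fp_unet`, the `(x, t)` call signature, and the random Gaussian latents (standing in for the intermediate denoising samples the method would actually see) are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def select_trainable(q_unet, keywords=("time_emb", "attn")):
    # Freeze all parameters, then unfreeze only layers whose names
    # match the (illustrative) sensitivity criteria.
    params = []
    for name, p in q_unet.named_parameters():
        p.requires_grad = any(k in name for k in keywords)
        if p.requires_grad:
            params.append(p)
    return params


def distill_step(q_unet, fp_unet, optimizer, num_timesteps=1000):
    # Data-free: draw random latents and time steps instead of images.
    x = torch.randn(8, 4, 64, 64)
    t = torch.randint(0, num_timesteps, (8,))
    with torch.no_grad():
        target = fp_unet(x, t)               # full-precision teacher
    loss = F.mse_loss(q_unet(x, t), target)  # match its noise prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage (hypothetical models):
# optimizer = torch.optim.Adam(select_trainable(q_unet), lr=1e-5)
# for _ in range(steps):
#     distill_step(q_unet, fp_unet, optimizer)
```

Because only the selected layers receive gradients, the optimizer state and backward pass stay small, which is what makes the fine-tuning efficient relative to retraining the whole network.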
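The time-aware activation quantizer can likewise be sketched under assumptions. The version below uses an LSQ-style learnable step size, one scale per denoising time step, so the quantization grid can track how activation ranges drift over the diffusion trajectory. `TimeAwareActQuantizer` and `round_ste` are illustrative names, not the paper's API.

```python
import torch
import torch.nn as nn


def round_ste(x: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass; pass gradients straight through
    # in the backward pass (straight-through estimator).
    return x + (x.round() - x).detach()


class TimeAwareActQuantizer(nn.Module):
    """One learnable quantization step size per diffusion time step."""

    def __init__(self, num_timesteps: int, n_bits: int = 4):
        super().__init__()
        self.qmin = -(2 ** (n_bits - 1))      # symmetric signed range
        self.qmax = 2 ** (n_bits - 1) - 1
        # Per-timestep scales, learned during fine-tuning.
        self.scales = nn.Parameter(torch.ones(num_timesteps))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Select each sample's scale and broadcast over feature dims.
        s = self.scales[t].abs().clamp_min(1e-8)
        s = s.view(-1, *([1] * (x.dim() - 1)))
        # Fake-quantize: scale, round with STE, clamp, rescale.
        q = round_ste(x / s).clamp(self.qmin, self.qmax)
        return q * s


# Usage: per-sample time steps index into the per-step scale table.
# quant = TimeAwareActQuantizer(num_timesteps=1000, n_bits=4)
# y = quant(torch.randn(8, 320, 32, 32), torch.randint(0, 1000, (8,)))
```

A single shared scale would be forced to compromise across all time steps; indexing the scale by `t` lets early (noisy, wide-range) and late (refined, narrow-range) activations each get a grid that fits, at the cost of only one extra parameter per step.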