13 Feb 2024 | Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan
QuEST is a method for low-bit quantization of diffusion models via efficient selective fine-tuning. Diffusion models are effective image generators but costly to deploy due to their memory and compute demands. Quantization is a promising route to compression and acceleration, yet existing methods break down at low bit-widths. QuEST identifies three key obstacles in quantized diffusion models: imbalanced activation distributions, imprecise temporal information, and the vulnerability of specific modules to perturbation.

To mitigate these, QuEST selectively fine-tunes the critical layers: those that carry temporal information and those most sensitive to reduced bit-width. Fine-tuning these layers reshapes their activation distributions, making them easier to quantize accurately. The procedure is data-free and efficient, requiring only supervision from the full-precision model. QuEST also introduces an efficient time-aware activation quantizer that adapts to how activation distributions shift across denoising time steps.

Evaluated on three high-resolution image generation tasks, QuEST achieves state-of-the-art performance under various bit-width settings, including generating readable images with a fully 4-bit Stable Diffusion. The approach is theoretically justified and empirically validated, demonstrating that selective fine-tuning enhances model robustness, reduces quantization error, and makes low-bit diffusion models more practical for real-world applications.
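To make the selective fine-tuning idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: freeze the quantized UNet, unfreeze only the layers deemed sensitive (a crude name-keyword match stands in for the paper's sensitivity analysis), and train them to match the full-precision teacher's prediction. The names `q_unet`, `fp_unet`, the `(x, t)` call signature, and the random Gaussian latents (standing in for the intermediate denoising samples the method would actually see) are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def select_trainable(q_unet, keywords=("time_emb", "attn")):
    # Freeze all parameters, then unfreeze only layers whose names
    # match the (illustrative) sensitivity criteria.
    params = []
    for name, p in q_unet.named_parameters():
        p.requires_grad = any(k in name for k in keywords)
        if p.requires_grad:
            params.append(p)
    return params


def distill_step(q_unet, fp_unet, optimizer, num_timesteps=1000):
    # Data-free: draw random latents and time steps instead of images.
    x = torch.randn(8, 4, 64, 64)
    t = torch.randint(0, num_timesteps, (8,))
    with torch.no_grad():
        target = fp_unet(x, t)               # full-precision teacher
    loss = F.mse_loss(q_unet(x, t), target)  # match its noise prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage (hypothetical models):
# optimizer = torch.optim.Adam(select_trainable(q_unet), lr=1e-5)
# for _ in range(steps):
#     distill_step(q_unet, fp_unet, optimizer)
```

Because only the selected layers receive gradients, the optimizer state and backward pass stay small, which is what makes the fine-tuning efficient relative to retraining the whole network.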
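The time-aware activation quantizer can likewise be sketched under assumptions. The version below uses an LSQ-style learnable step size, one scale per denoising time step, so the quantization grid can track how activation ranges drift over the diffusion trajectory. `TimeAwareActQuantizer` and `round_ste` are illustrative names, not the paper's API.

```python
import torch
import torch.nn as nn


def round_ste(x: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass; pass gradients straight through
    # in the backward pass (straight-through estimator).
    return x + (x.round() - x).detach()


class TimeAwareActQuantizer(nn.Module):
    """One learnable quantization step size per diffusion time step."""

    def __init__(self, num_timesteps: int, n_bits: int = 4):
        super().__init__()
        self.qmin = -(2 ** (n_bits - 1))      # symmetric signed range
        self.qmax = 2 ** (n_bits - 1) - 1
        # Per-timestep scales, learned during fine-tuning.
        self.scales = nn.Parameter(torch.ones(num_timesteps))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Select each sample's scale and broadcast over feature dims.
        s = self.scales[t].abs().clamp_min(1e-8)
        s = s.view(-1, *([1] * (x.dim() - 1)))
        # Fake-quantize: scale, round with STE, clamp, rescale.
        q = round_ste(x / s).clamp(self.qmin, self.qmax)
        return q * s


# Usage: per-sample time steps index into the per-step scale table.
# quant = TimeAwareActQuantizer(num_timesteps=1000, n_bits=4)
# y = quant(torch.randn(8, 320, 32, 32), torch.randint(0, 1000, (8,)))
```

A single shared scale would be forced to compromise across all time steps; indexing the scale by `t` lets early (noisy, wide-range) and late (refined, narrow-range) activations each get a grid that fits, at the cost of only one extra parameter per step.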