30 May 2024 | Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, and Yu Wang
MixDQ is a memory-efficient mixed-precision quantization framework for few-step text-to-image diffusion models, built on metric-decoupled sensitivity analysis. The paper addresses the challenge that few-step diffusion models are far more sensitive to quantization than their multi-step counterparts: existing quantization methods fail to preserve both image quality and image-text alignment in the few-step setting. MixDQ tackles this by identifying highly sensitive layers and protecting them with specialized techniques: BOS-aware quantization for text embeddings, metric-decoupled sensitivity analysis, and integer-programming-based bit-width allocation. Evaluated on widely used few-step models such as SDXL-Turbo and LCM-LoRA, MixDQ achieves W3.66A16 and W4A8 quantization with negligible degradation in both visual quality and text alignment, yielding a 3-4× reduction in model size and memory cost and a 1.5× latency speedup over FP16, and it outperforms existing quantization methods in both performance and efficiency. The authors note that the approach is applicable to other generative models and tasks, and conclude that MixDQ is a promising direction for quantizing few-step diffusion models that can benefit future research on compression methods for generative models.
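To make the BOS-aware idea concrete, here is a minimal sketch: the BOS token embedding is prompt-independent and carries large-magnitude outliers, so it can be kept in full precision while the remaining token embeddings share one quantization scale. This is an illustrative simplification, not the paper's exact implementation; `quantize_sym` and `bos_aware_quantize` are hypothetical helper names.

```python
import numpy as np

def quantize_sym(x, n_bits):
    """Symmetric uniform fake-quantization to n_bits (illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, showing the quantization error

def bos_aware_quantize(text_emb, n_bits=8):
    """Quantize a (seq_len, dim) text-embedding array while protecting BOS.

    The BOS row is passed through untouched (i.e. kept in full precision);
    the remaining tokens are quantized with a shared scale that is no
    longer inflated by the BOS outliers.
    """
    bos, rest = text_emb[:1], text_emb[1:]
    return np.concatenate([bos, quantize_sym(rest, n_bits)], axis=0)
```

Excluding the outlier-heavy BOS row from scale computation is what lets the remaining tokens use the full quantization range.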
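The bit-width allocation step can be sketched as a small search: given a per-layer sensitivity cost for each candidate bit-width (which, in the metric-decoupled spirit, could combine separate image-quality and text-alignment measurements) and a total size budget, pick the cheapest feasible assignment. The paper casts this as an integer program; the brute-force search below optimizes the same objective and is only feasible for a handful of layers, standing in for a real ILP solver. All names here are illustrative.

```python
from itertools import product

def allocate_bitwidths(costs, sizes, budget_bits):
    """Choose a per-layer bit-width minimizing total sensitivity cost
    under a total model-size budget (brute-force stand-in for an ILP).

    costs: {layer: {bits: sensitivity_cost}}
    sizes: {layer: parameter_count}
    budget_bits: maximum total bits across all layers
    """
    layers = list(costs)
    options = [list(costs[layer].items()) for layer in layers]
    best, best_cost = None, float("inf")
    for choice in product(*options):
        # Total storage if each layer gets its chosen bit-width.
        total_bits = sum(bits * sizes[l] for l, (bits, _) in zip(layers, choice))
        if total_bits > budget_bits:
            continue
        total_cost = sum(cost for _, (_, cost) in zip(layers, choice))
        if total_cost < best_cost:
            best_cost = total_cost
            best = {l: bits for l, (bits, _) in zip(layers, choice)}
    return best
```

With a tight budget, the search spends the extra bits on the layer whose low-bit cost is highest, which is exactly the behavior mixed-precision allocation is after.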