6 Jun 2024 | Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren
BitsFusion is a novel weight quantization method that compresses the UNet of Stable Diffusion v1.5 to 1.99 bits on average, yielding a 7.9× smaller model while maintaining or even improving generation quality. The method assigns an optimal bit width to each layer based on an analysis of its quantization error, initializes the quantized model to improve downstream performance, and refines the training strategy to further reduce quantization error. Training follows a two-stage pipeline: Stage I distills the quantized model from a full-precision teacher, and Stage II fine-tunes it with the standard noise-prediction objective. A quantization error-aware time step sampling strategy further improves performance. Extensive benchmark evaluations and human studies show that the 1.99-bit model outperforms the full-precision model in generation quality and text-image alignment. The work also addresses broader challenges in quantizing large-scale diffusion models, including the need for fair evaluation and the effectiveness of low-bit quantization, demonstrating that BitsFusion achieves substantial storage savings while preserving high-quality image generation.
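The per-layer bit assignment described above can be illustrated with a minimal sketch: measure each layer's quantization error at several candidate bit widths, then greedily spend a global bit budget on the most sensitive layers. The uniform quantizer, the MSE error proxy, and the greedy allocation below are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantizer (an illustrative stand-in)."""
    if bits == 1:
        # Treat 1-bit as sign quantization with a single per-tensor scale.
        return torch.sign(w) * w.abs().mean()
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-12
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def assign_bits(layers: dict, candidate_bits=(1, 2, 3, 4),
                avg_budget: float = 1.99) -> dict:
    """Greedy mixed-precision assignment under an average-bit budget.

    Start every layer at the lowest width, then repeatedly upgrade the
    layer whose quantization error (MSE vs. full precision) shrinks the
    most, until the parameter-weighted average reaches the budget.
    """
    bits = {name: candidate_bits[0] for name in layers}
    total = sum(w.numel() for w in layers.values())

    def mse(name, b):
        w = layers[name]
        return torch.mean((w - quantize(w, b)) ** 2).item()

    def avg_bits():
        return sum(bits[n] * layers[n].numel() for n in layers) / total

    while avg_bits() < avg_budget:
        # Candidate upgrades: (error reduction, layer name).
        upgrades = [(mse(n, bits[n]) - mse(n, bits[n] + 1), n)
                    for n in layers if bits[n] < max(candidate_bits)]
        if not upgrades:
            break
        _, name = max(upgrades)
        bits[name] += 1
    return bits

# Toy demo: four random "layers" with different weight scales.
torch.manual_seed(0)
demo = {f"layer{i}": torch.randn(256, 256) * (0.05 * (i + 1)) for i in range(4)}
print(assign_bits(demo))  # more sensitive layers end up with more bits
```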
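The two-stage objective is stated directly in the summary and can be written compactly: Stage I matches the quantized student's noise prediction to a frozen full-precision teacher, and Stage II switches to the ordinary noise-prediction loss against the true injected noise. The MSE loss shapes are standard diffusion-training components; the `student`/`teacher` call signature is an assumed UNet-like interface.

```python
import torch
import torch.nn.functional as F

def stage1_distillation_loss(student, teacher, x_t, t, text_emb):
    """Stage I: match the quantized student's noise prediction to a
    frozen full-precision teacher on the same noisy latent."""
    with torch.no_grad():
        eps_teacher = teacher(x_t, t, text_emb)
    return F.mse_loss(student(x_t, t, text_emb), eps_teacher)

def stage2_noise_prediction_loss(student, x_t, t, text_emb, eps):
    """Stage II: fine-tune the quantized model with the standard
    noise-prediction objective against the true injected noise."""
    return F.mse_loss(student(x_t, t, text_emb), eps)
```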
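Finally, the quantization error-aware time step sampling can be sketched as drawing training time steps in proportion to a measured per-step error rather than uniformly. The error proxy and the proportional weighting here are assumptions for illustration.

```python
import numpy as np

def make_timestep_sampler(step_errors: np.ndarray, seed: int = 0):
    """Build a sampler that draws diffusion time steps with probability
    proportional to a measured per-step quantization error (e.g., an
    assumed proxy such as the MSE between quantized and full-precision
    noise predictions at each step)."""
    probs = step_errors / step_errors.sum()
    rng = np.random.default_rng(seed)

    def sample(batch_size: int) -> np.ndarray:
        return rng.choice(len(probs), size=batch_size, p=probs)

    return sample

# Illustrative error profile over 1000 steps; real values would be measured.
errors = np.linspace(2.0, 0.5, num=1000)
sample_t = make_timestep_sampler(errors)
print(sample_t(8))  # steps with larger error are drawn more often
```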