Progressive Knowledge Distillation of Stable Diffusion XL using Layer Level Loss


5 Jan 2024 | Yatharth Gupta*, Vishnu V. Jaddipal*, Harish Prabhala, Sayak Paul, Patrick von Platen
This paper introduces two scaled-down variants of Stable Diffusion XL (SDXL), Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameters respectively. These models are created through progressive removal of layers guided by layer-level losses, which reduces model size while preserving generative quality. The methodology eliminates residual networks and transformer blocks from the U-Net of SDXL, yielding significant reductions in parameters and latency. By capitalizing on knowledge transferred from the teacher, the compact models effectively emulate the original SDXL and achieve competitive results against larger models.

The work demonstrates the efficacy of knowledge distillation combined with layer-level losses in shrinking model size while preserving SDXL's high-quality generative capabilities, enabling more accessible deployment in resource-constrained environments. The study also highlights the importance of choosing the right dataset and teacher model, as both can significantly boost the final model's quality. SSD-1B achieves up to a 60% speedup and Segmind-Vega up to a 100% speedup over SDXL, and a human preference study shows that SSD-1B is marginally preferred over SDXL in terms of image quality. The authors also note the potential of applying this technique to other large models, such as LLMs and MLMs, in the future.
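To make the training objective concrete, below is a minimal, hypothetical PyTorch sketch of distillation with a layer-level loss: a smaller student network (standing in for the pruned U-Net) is trained to match both the final noise prediction and selected intermediate feature maps of a frozen teacher. The TinyUNet class, the feature_pairs mapping, and the lambda weights are illustrative assumptions, not the authors' code or the actual SDXL architecture.

```python
# Hypothetical sketch of layer-level knowledge distillation (not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy stand-in for a U-Net: returns the output and each block's feature map."""
    def __init__(self, channels):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, padding=1)
            for c_in, c_out in zip([4] + channels[:-1], channels)
        ])
        self.out = nn.Conv2d(channels[-1], 4, 3, padding=1)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = F.silu(block(x))
            feats.append(x)
        return self.out(x), feats

teacher = TinyUNet([64, 64, 64, 64]).eval()   # larger, frozen teacher
student = TinyUNet([64, 64])                  # fewer blocks, as after layer removal
for p in teacher.parameters():
    p.requires_grad_(False)

# Assumed mapping: which student feature taps are matched to which teacher taps.
feature_pairs = [(0, 1), (1, 3)]
lambda_out, lambda_feat = 1.0, 0.5            # illustrative loss weights

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
latents = torch.randn(2, 4, 32, 32)           # toy noisy latents
target_noise = torch.randn_like(latents)      # ground-truth noise for the task loss

with torch.no_grad():
    t_out, t_feats = teacher(latents)
s_out, s_feats = student(latents)

loss_task = F.mse_loss(s_out, target_noise)   # ordinary denoising loss
loss_out = F.mse_loss(s_out, t_out)           # output-level distillation
loss_feat = sum(F.mse_loss(s_feats[i], t_feats[j]) for i, j in feature_pairs)  # layer-level loss

loss = loss_task + lambda_out * loss_out + lambda_feat * loss_feat
opt.zero_grad()
loss.backward()
opt.step()
```

The design intent this sketch tries to capture is that the feature-level terms supervise the pruned student at intermediate depths, not just at the output, so the removed residual and transformer blocks are compensated for by the remaining layers.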