Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

8 May 2024 | Hongjie Wang¹*, Difan Liu², Yan Kang², Yijun Li², Zhe Lin², Niraj K. Jha¹, Yuchen Liu²†
This paper introduces AT-EDM, a training-free framework that accelerates diffusion models (DMs) at inference time without any retraining. AT-EDM leverages attention maps to prune redundant tokens at run time, substantially reducing compute while maintaining high-quality image generation. The framework consists of two main components: a single-denoising-step token pruning algorithm and a Denoising-Steps-Aware Pruning (DSAP) schedule.

The single-denoising-step pruning algorithm uses a novel graph-based method, Generalized Weighted Page Rank (G-WPR), to identify and discard redundant tokens based on importance scores derived from attention maps. A similarity-based recovery step then restores the pruned tokens so that subsequent convolution operations receive a full feature map. The DSAP schedule adjusts the pruning budget across denoising steps: early steps, where attention maps are less informative, prune fewer tokens, while later steps prune more aggressively, leading to better overall image quality.

Extensive evaluations show that AT-EDM achieves a 38.8% reduction in FLOPs and up to a 1.53× speed-up over Stable Diffusion XL (SDXL) while maintaining nearly the same FID and CLIP scores as the full model. Visual examples demonstrate that AT-EDM generates clearer objects with sharper details and better text-image alignment than ToMe, a prior training-free method. The framework is also compatible with existing efficient DMs and can further improve their image quality.
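To make the pipeline concrete, here is a minimal, self-contained sketch of the idea rather than the authors' exact implementation: token scores are computed with a weighted-PageRank-style iteration over the self-attention map, low-scoring tokens are dropped, and pruned positions are later refilled by copying the most similar surviving token. The function names (`token_importance`, `prune_tokens`, `recover_tokens`) and the damping/iteration settings are illustrative assumptions, not the paper's precise G-WPR formulation.

```python
# Illustrative sketch of attention-based token pruning and similarity-based
# recovery (simplified; not the exact G-WPR algorithm from the paper).
import torch
import torch.nn.functional as F


def token_importance(attn, num_iters=5, damping=0.85):
    """Score tokens with a weighted-PageRank-style iteration.

    attn: (N, N) row-stochastic self-attention map (rows sum to 1).
    Intuition: a token is important if important tokens attend to it.
    """
    n = attn.shape[0]
    scores = attn.new_full((n,), 1.0 / n)
    for _ in range(num_iters):
        scores = (1.0 - damping) / n + damping * (attn.t() @ scores)
    return scores


def prune_tokens(x, attn, keep_ratio=0.6):
    """Keep the highest-scoring tokens. x: (N, C) token features."""
    scores = token_importance(attn)
    n_keep = max(1, int(keep_ratio * x.shape[0]))
    keep_idx = torch.topk(scores, n_keep).indices
    mask = torch.ones(x.shape[0], dtype=torch.bool, device=x.device)
    mask[keep_idx] = False
    prune_idx = mask.nonzero(as_tuple=True)[0]
    return x[keep_idx], keep_idx, prune_idx


def recover_tokens(x_kept, x_orig, keep_idx, prune_idx):
    """Refill pruned positions so the next convolution sees a full feature map.

    Each pruned slot copies the (updated) feature of the kept token most
    similar to its pre-pruning feature in x_orig (cosine similarity).
    """
    out = x_kept.new_zeros(x_orig.shape[0], x_kept.shape[1])
    out[keep_idx] = x_kept
    sim = F.cosine_similarity(
        x_orig[prune_idx].unsqueeze(1),  # (P, 1, C)
        x_orig[keep_idx].unsqueeze(0),   # (1, K, C)
        dim=-1,
    )                                    # (P, K)
    out[prune_idx] = x_kept[sim.argmax(dim=1)]
    return out


# Toy usage with random tensors (64 tokens, 16 channels):
x = torch.randn(64, 16)
attn = torch.softmax(torch.randn(64, 64), dim=-1)
x_kept, keep_idx, prune_idx = prune_tokens(x, attn, keep_ratio=0.5)
x_full = recover_tokens(x_kept, x, keep_idx, prune_idx)   # (64, 16) again
```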
The paper also explores the effectiveness of different pruning strategies and shows that the DSAP schedule not only improves AT-EDM but also benefits other run-time pruning methods like ToMe. The results indicate that AT-EDM outperforms state-of-the-art methods in terms of image quality and text-image alignment, making it a promising approach for efficient diffusion model inference.
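As a rough illustration of the DSAP idea described above, the sketch below shows what such a schedule could look like, assuming a simple two-phase split between early and late denoising steps; the breakpoint and keep ratios are placeholder values, not the ones used in the paper.

```python
# Hypothetical DSAP-style schedule: prune gently in early denoising steps,
# where attention maps are less informative, and more aggressively later.
def dsap_keep_ratio(step, total_steps, early_fraction=0.4,
                    early_keep=0.9, late_keep=0.5):
    """Fraction of tokens to keep at a given denoising step (placeholder values)."""
    if step < early_fraction * total_steps:
        return early_keep   # early steps: prune few tokens
    return late_keep        # later steps: prune aggressively

# Example: a 50-step sampler keeps 90% of tokens for the first 20 steps,
# then 50% for the remaining 30.
ratios = [dsap_keep_ratio(t, 50) for t in range(50)]
```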