8 May 2024 | Hongjie Wang1*, Difan Liu2, Yan Kang2, Yijun Li2, Zhe Lin2, Niraj K. Jha1, Yuchen Liu2†
Diffusion Models (DMs) have demonstrated superior performance in generating high-quality and diverse images, but their architectural design is computationally expensive, particularly due to the heavy use of attention modules. Existing methods often require retraining to enhance efficiency, which is computationally costly and not scalable. To address this, the authors propose the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework. This framework leverages attention maps to perform run-time pruning of redundant tokens without retraining. Specifically, for single-denoising-step pruning, a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), is developed to identify redundant tokens, and a similarity-based recovery method is proposed to restore tokens for convolution operations. Additionally, a Denoising-Steps-Aware Pruning (DSAP) approach is introduced to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53× speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. The project webpage is available at <https://atedm.github.io>.
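The core idea described above can be illustrated with a minimal sketch: rank tokens by a PageRank-style power iteration over the attention map, drop the lowest-scoring tokens, and later fill the dropped slots with their most similar retained tokens so convolution layers still see a full feature grid. Note this is an illustrative approximation only; `token_importance`, `prune_and_recover`, and the plain dot-product similarity are assumptions, not the paper's actual G-WPR or recovery implementation.

```python
import numpy as np

def token_importance(attn, iters=10):
    # Power iteration on the column-normalized attention map: a rough
    # stand-in for a weighted-PageRank-style token ranking.
    n = attn.shape[0]
    col = attn / attn.sum(axis=0, keepdims=True)  # column-stochastic
    s = np.full(n, 1.0 / n)
    for _ in range(iters):
        s = col @ s
        s /= s.sum()
    return s

def prune_and_recover(tokens, attn, keep_ratio=0.6):
    # tokens: (n, d) feature matrix; attn: (n, n) attention map.
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    scores = token_importance(attn)
    keep = np.sort(np.argsort(scores)[-k:])  # indices of retained tokens
    pruned = tokens[keep]
    # Similarity-based recovery: replace each dropped token with its most
    # similar retained token (unnormalized dot-product similarity here).
    recovered = tokens.copy()
    dropped = np.setdiff1d(np.arange(n), keep)
    if dropped.size:
        sims = tokens[dropped] @ pruned.T  # (n - k, k)
        recovered[dropped] = pruned[sims.argmax(axis=1)]
    return pruned, recovered
```

In this sketch the attention layers would operate on the smaller `pruned` matrix (yielding the FLOPs savings), while `recovered` restores the full token count before convolution operations.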