Faster Diffusion via Temporal Attention Decomposition

17 Jul 2024 | Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber
This paper presents a method called Temporally Gating Attention (TGATE) to accelerate text-conditional diffusion models by exploiting the temporal behavior of attention mechanisms during inference. The key insight is that cross-attention becomes less important as denoising progresses, while self-attention becomes more critical in later stages. By caching attention outputs at specific time steps and reusing them afterward, TGATE substantially reduces computational cost without compromising image generation quality. The method is training-free and applicable to a range of diffusion models, including text-to-image and text-to-video models. Experimental results show that TGATE accelerates diffusion models by 10%–50%. It is particularly effective at reducing the computational load of cross-attention in the fidelity-improving phase and of self-attention in the semantics-planning phase.

TGATE has been evaluated on several state-of-the-art diffusion models, including the SD series, PixArt, and OpenSora, and delivers significant efficiency gains while preserving generation quality. The results show that TGATE reduces both the number of multiply-accumulate (MAC) operations and latency. The method is also compatible with different noise schedulers and existing acceleration techniques, and it applies to both U-Net-based and transformer-based architectures. The study highlights the importance of understanding the temporal dynamics of attention mechanisms in diffusion models and provides a practical solution for improving their efficiency.
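The core mechanism described above, computing an attention block normally during early denoising steps, then freezing and reusing its output once it has converged, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `CachedCrossAttention`, the `gate_step` parameter, and the call signature are illustrative assumptions for a generic cross-attention module wrapped inside a denoising loop.

```python
import torch
import torch.nn as nn


class CachedCrossAttention(nn.Module):
    """Illustrative sketch of temporal attention gating (not the official TGATE code).

    Before `gate_step`, the wrapped cross-attention runs normally (semantics-planning
    phase). At the last step before the gate, its output is cached; every later step
    (fidelity-improving phase) reuses the cached tensor and skips the computation.
    """

    def __init__(self, attention: nn.Module, gate_step: int):
        super().__init__()
        self.attention = attention    # original cross-attention block (assumed signature)
        self.gate_step = gate_step    # denoising step after which the output is reused
        self.cache = None

    def forward(self, hidden_states, encoder_hidden_states, step: int):
        if step < self.gate_step or self.cache is None:
            out = self.attention(hidden_states, encoder_hidden_states)
            if step == self.gate_step - 1:
                # Store the (assumed converged) text-conditioned output once;
                # subsequent steps return it directly, saving MACs and latency.
                self.cache = out.detach()
            return out
        return self.cache
```

In a full pipeline, a sampler would wrap each cross-attention (or, symmetrically, self-attention) block this way and pass the current denoising step into every forward call; the gate step controls the trade-off between speed-up and fidelity.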