FORA: Fast-Forward Caching in Diffusion Transformer Acceleration
Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos due to their scalability. However, their increased size leads to higher inference costs, making them less suitable for real-time applications. This paper presents FORA, a simple yet effective caching mechanism that accelerates DiT by exploiting the repetitive nature of the diffusion process. FORA stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, reducing computational overhead without requiring model retraining. Experiments show that FORA significantly speeds up diffusion transformers while only minimally affecting quality metrics such as Inception Score (IS) and Fréchet Inception Distance (FID). FORA thus represents a meaningful step toward deploying diffusion transformers in real-time applications.
The paper introduces FORA, a caching strategy tailored for transformer-based diffusion models. The mechanism capitalizes on the repetitive nature of the diffusion process by preserving and reusing intermediate outputs from attention and MLP layers during inference. FORA substantially cuts computational overhead and integrates seamlessly with existing DiT models without requiring retraining, reducing inference cost while maintaining output quality. Experiments assessing FORA demonstrate notable improvements in inference speed and computational efficiency, and these findings underscore FORA's potential to make high-performance generative models practical for real-time use.
FORA implements a static caching mechanism, a straightforward yet powerful approach, to accelerate the sampling process in diffusion models. This method operates on a simple principle: recompute and cache features at regular intervals, and reuse these cached features for a predetermined number of subsequent time steps. At the core of this mechanism is a single hyperparameter N, which we call the cache interval. This interval determines how frequently the model recomputes and caches new features. Specifically, N is an integer that can range from 1 to T - 1, where T is the total number of sampling time steps in the diffusion process. The static caching process unfolds as follows: initialization, caching condition, recomputation and caching, feature reuse, and cycle repetition.
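The code below is a minimal sketch of this idea for a single transformer block. The class and attribute names (CachedDiTBlock, cache_interval, cached_attn, cached_mlp) are illustrative rather than FORA's actual implementation, and the block structure is a simplified stand-in for a real DiT block; it only demonstrates the recompute-every-N-steps-then-reuse pattern.

```python
import torch
import torch.nn as nn


class CachedDiTBlock(nn.Module):
    """Simplified transformer block that recomputes its attention and MLP
    outputs every N sampling steps and reuses the cached outputs in between."""

    def __init__(self, hidden_dim: int, num_heads: int, cache_interval: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.cache_interval = cache_interval  # the hyperparameter N
        self.cached_attn = None               # cached attention output
        self.cached_mlp = None                # cached MLP output

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        # Recompute and cache at every N-th sampling step (and whenever the
        # cache is still empty); otherwise reuse the stored attention/MLP outputs.
        if step % self.cache_interval == 0 or self.cached_attn is None:
            h = self.norm1(x)
            self.cached_attn, _ = self.attn(h, h, h)
            x = x + self.cached_attn
            self.cached_mlp = self.mlp(self.norm2(x))
            x = x + self.cached_mlp
        else:
            x = x + self.cached_attn
            x = x + self.cached_mlp
        return x


# Toy usage: the loop stands in for the sampler's denoising loop, which passes
# the current step index so the block knows when to refresh its cache.
block = CachedDiTBlock(hidden_dim=1152, num_heads=16, cache_interval=3)
tokens = torch.randn(2, 256, 1152)  # (batch, tokens, hidden); illustrative shapes
for step in range(50):              # 50 sampling steps, also illustrative
    tokens = block(tokens, step)
```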
The effectiveness of static caching hinges on the choice of the cache interval N. A smaller N leads to more frequent recomputation, which better preserves fidelity but yields smaller computational savings; a larger N increases efficiency but may degrade the quality of the generated outputs. In our experiments, the optimal value of N depends on the specific requirements of the task and the desired trade-off between speed and quality. Through extensive testing, we found that setting the maximum cache interval N to 7 provides a good balance; beyond this value, the Fréchet Inception Distance (FID) rises sharply, indicating a marked decline in the quality of generated images.
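To make the speed side of this trade-off concrete, the back-of-the-envelope sketch below estimates how many full recomputations a given N implies over one sampling run. The step count T = 250 is an illustrative assumption, and the printed figures are rough counts rather than measured speedups.

```python
import math

# With T sampling steps and cache interval N, only about ceil(T / N) steps run
# the full attention/MLP computation; the remaining steps reuse cached features.
T = 250  # illustrative number of sampling steps
for N in (1, 2, 3, 5, 7, 10):
    full_steps = math.ceil(T / N)
    print(f"N={N:2d}: {full_steps:3d}/{T} full forward passes "
          f"(~{T / full_steps:.1f}x fewer attention/MLP evaluations)")
```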
The paper presents comprehensive experimental results demonstrating that FORA significantly improves inference speed while maintaining image quality. FORA is shown to be effective for both class-conditional and text-conditional image generation.