Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching


3 Jun 2024 | Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang
This paper proposes Learning-to-Cache (L2C), a method that accelerates diffusion transformers by caching the outputs of selected layers across denoising timesteps. The key idea is to identify layers whose computation can be skipped and replaced by cached results without significantly affecting generation quality. By exploiting the layer structure of transformers and the sequential nature of diffusion sampling, L2C targets the redundant computation that arises between adjacent timesteps. A differentiable optimization objective is introduced to select which layers to cache, and the learned selection is then frozen into a static computation graph for inference. L2C outperforms existing fast samplers and cache-based methods, achieving significant speedups with minimal quality loss: up to 93.68% of layers in U-ViT-H/2 and 47.43% in DiT-XL/2 can be cached with less than a 0.01 increase in FID. These results demonstrate that L2C substantially improves the inference efficiency of diffusion transformers while maintaining high-quality image generation.
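To make the mechanism concrete, the sketch below illustrates the general idea of a differentiable layer-caching router in PyTorch. It is not the authors' implementation: the toy block, the per-layer sigmoid gates, the sparsity weight `lam`, the 0.5 threshold, and the way the "previous timestep" cache is simulated are all illustrative assumptions. During training, each layer's output is a convex mix of a fresh computation and the cached output from the previous step; at inference the gates are thresholded into a static recompute/reuse mask.

```python
# Minimal sketch of differentiable layer caching (illustrative, not the L2C codebase).
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one DiT/U-ViT transformer layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

class CachedStack(nn.Module):
    """Transformer stack with one learnable caching gate per layer."""
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList(ToyBlock(dim) for _ in range(depth))
        # One gate logit per layer; sigmoid(logit) ~ probability of recomputing the layer.
        self.gate_logits = nn.Parameter(torch.zeros(depth))

    def forward(self, x, cache=None, hard_mask=None):
        new_cache = []
        for i, blk in enumerate(self.blocks):
            fresh = blk(x)
            if cache is None:                 # no previous timestep: compute everything
                out = fresh
            elif hard_mask is not None:       # inference: static 0/1 recompute decision
                out = fresh if hard_mask[i] else cache[i]
            else:                             # training: differentiable relaxation
                g = torch.sigmoid(self.gate_logits[i])
                out = g * fresh + (1.0 - g) * cache[i]
            new_cache.append(out)
            x = out
        return x, new_cache

dim, depth = 64, 6
model = CachedStack(dim, depth)
for p in model.blocks.parameters():           # the backbone stays frozen; only gates train
    p.requires_grad_(False)
opt = torch.optim.Adam([model.gate_logits], lr=1e-2)
lam = 0.05                                    # assumed weight pushing gates toward caching

for _ in range(10):
    x_prev = torch.randn(2, 16, dim)                   # activations at the previous timestep
    x_cur = x_prev + 0.1 * torch.randn_like(x_prev)    # nearby activations at the current timestep
    with torch.no_grad():
        _, cache = model(x_prev)                       # full pass builds the layer cache
        target, _ = model(x_cur)                       # reference: recompute every layer
    relaxed, _ = model(x_cur, cache=cache)             # relaxed pass reuses the stale cache
    loss = (relaxed - target).pow(2).mean() + lam * torch.sigmoid(model.gate_logits).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Freeze the learned gates into a static schedule for inference:
# layers with small gates reuse the cached output from the previous timestep.
hard_mask = torch.sigmoid(model.gate_logits) > 0.5
print("layers recomputed per step:", hard_mask.int().tolist())
```

Because the mask is fixed after training, inference runs on a static computation graph: cached layers are simply skipped and their stored outputs reused, which is where the speedup comes from.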