3 Jun 2024 | Xinyin Ma¹, Gongfan Fang¹, Michael Bi Mi², Xinchao Wang¹*
Diffusion Transformers have demonstrated remarkable generative capabilities, but their slow inference speed is a significant drawback. This paper introduces Learning-to-Cache (L2C), a novel method that accelerates diffusion transformers by dynamically caching layer computations without updating model parameters. The authors observe that a large proportion of layers in diffusion transformers can be removed without significantly degrading image quality. L2C leverages the identical structure of transformer layers and the sequential nature of diffusion to exploit redundant computation between timesteps. Treating each layer as the fundamental unit for caching, L2C formulates a differentiable optimization objective to identify which layers to cache and which to recompute. The method is evaluated on two transformer architectures, DiT and U-ViT, showing that up to 93.68% of the layers in U-ViT-H/2 can be cached in the caching steps with minimal performance loss. At the same inference speed, L2C outperforms samplers such as DDIM and DPM-Solver, as well as previous cache-based methods. The code for L2C is available at https://github.com/horseee/learning-to-cache.
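
To make the core idea concrete, below is a minimal sketch of per-layer caching between adjacent diffusion timesteps with a learnable, differentiable router. It is an illustration under assumptions, not the authors' implementation: the class name `CachedBlock` and the parameters `router_logit` and `threshold` are hypothetical and do not come from the repository linked above.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps one transformer block with a learnable per-layer caching decision."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        # One learnable logit per layer; the underlying model weights stay frozen.
        self.router_logit = nn.Parameter(torch.zeros(1))
        self.cache = None  # activation stored at the previous timestep

    def forward(self, x, train_router: bool = True, threshold: float = 0.5):
        beta = torch.sigmoid(self.router_logit)
        if self.cache is None:
            out = self.block(x)  # first timestep: nothing to reuse yet
        elif train_router:
            # Differentiable relaxation: interpolate fresh and cached outputs
            # so the router logit receives gradients from the denoising loss.
            out = beta * self.block(x) + (1.0 - beta) * self.cache
        elif beta.item() >= threshold:
            out = self.block(x)  # layer judged important: recompute
        else:
            out = self.cache     # layer judged redundant: reuse cached output
        self.cache = out.detach()
        return out
```

In this sketch, only the router logits are optimized while the diffusion model itself is kept fixed; at inference, layers whose sigmoid-activated logit falls below the threshold skip computation in the caching steps and reuse the activation from the preceding step.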