1 Feb 2024 | Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li
AnimateLCM accelerates high-fidelity video generation to a minimal number of sampling steps by decoupling the consistency learning of image-generation and motion-generation priors. Building on pre-trained image diffusion models, this decoupled consistency learning strategy improves training efficiency and visual quality, while a teacher-free adaptation scheme allows existing community adapters to be integrated (or new adapters to be trained from scratch) for controllable video generation. AnimateLCM is validated on image-conditioned and layout-conditioned video generation, where it achieves top performance, and it remains compatible with personalized diffusion models while reducing computational costs at both training and inference time. Experiments show that AnimateLCM outperforms baselines on FVD and CLIPSIM metrics, especially in the low-step regime. A noted limitation is that one-step generation can produce blurry outputs. Overall, the approach offers a promising route to accelerating video diffusion models.
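To make the core idea concrete, below is a minimal, hedged sketch of one consistency-distillation training step of the kind that underlies AnimateLCM's decoupled consistency learning: a frozen diffusion teacher takes one deterministic ODE (DDIM-style) step along the sampling trajectory, and a student consistency model is trained so that its direct prediction from the noisier point matches an EMA target network's prediction from the less noisy point. All names (`TinyUNet`, `ddim_step`, tensor shapes, hyperparameters) are illustrative assumptions, not the authors' actual implementation, and the decoupling of image vs. motion priors is only indicated by the placeholder denoiser.

```python
# Hedged sketch of one consistency-distillation step (LCM-style), assuming an
# epsilon-prediction denoiser and toy video-latent shapes. Not AnimateLCM's code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)            # \bar{alpha}_t schedule

class TinyUNet(nn.Module):
    """Stand-in for the spatial/temporal denoiser; predicts the noise epsilon."""
    def __init__(self, ch=8):
        super().__init__()
        self.net = nn.Sequential(nn.Conv3d(ch, 32, 3, padding=1), nn.SiLU(),
                                 nn.Conv3d(32, ch, 3, padding=1))
    def forward(self, x, t):
        return self.net(x)                                     # timestep embedding omitted

def pred_x0(x_t, eps, t):
    a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    return (x_t - (1 - a).sqrt() * eps) / a.sqrt()

def ddim_step(x_t, eps, t, t_prev):
    """One deterministic DDIM step from t to t_prev using the teacher's epsilon."""
    a_prev = alphas_cumprod[t_prev].view(-1, 1, 1, 1, 1)
    return a_prev.sqrt() * pred_x0(x_t, eps, t) + (1 - a_prev).sqrt() * eps

def f_consistency(model, x_t, t):
    """Consistency function: map any (x_t, t) directly to a clean-sample estimate."""
    return pred_x0(x_t, model(x_t, t), t)

teacher = TinyUNet()                        # frozen diffusion teacher (image/motion prior)
student = TinyUNet()                        # online consistency model
target = copy.deepcopy(student)             # EMA target network (theta^-)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

x0 = torch.randn(2, 8, 4, 16, 16)           # (batch, channels, frames, H, W) toy latents
t = torch.randint(50, T, (2,))
t_prev = t - 20                             # skip-step interval along the ODE trajectory
noise = torch.randn_like(x0)
a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise

with torch.no_grad():
    x_t_prev = ddim_step(x_t, teacher(x_t, t), t, t_prev)     # teacher ODE step
    target_out = f_consistency(target, x_t_prev, t_prev)      # EMA target prediction

loss = F.mse_loss(f_consistency(student, x_t, t), target_out) # consistency loss
opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                        # EMA update of the target network
    for p_t, p_s in zip(target.parameters(), student.parameters()):
        p_t.lerp_(p_s, 1 - 0.95)
```

In the decoupled setting described above, one would first distill the image prior with frame-independent (spatial) weights, then freeze those and apply the same distillation objective while training the temporal/motion layers, which is what keeps training efficient relative to distilling a full video model in one pass.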