AtomoVideo: High Fidelity Image-to-Video Generation

5 Mar 2024 | Litong Gong*, Yiran Zhu*, Weijie Li*, Xiaoyang Kang*, Biao Wang, Tiezheng Ge, Bo Zheng
AtomoVideo is a high-fidelity image-to-video generation framework that builds on advanced text-to-image (T2I) models to produce vivid, detailed videos that stay faithful to the input image. It does this through multi-granularity image injection, which conditions generation on both the low-level details and the high-level semantics of the image. AtomoVideo also adopts zero terminal Signal-to-Noise Ratio (SNR) and v-prediction strategies, which improve generation stability without relying on noisy priors. The framework extends to video frame prediction, where long sequences are produced by iterative generation. It can also be combined with personalized T2I models and controllable generative models, which makes it suitable for customized and controllable video generation.
Quantitative and qualitative evaluations show that AtomoVideo outperforms existing methods in image consistency, temporal consistency, motion intensity, and video quality. Training starts from a pre-trained T2I model and updates only the added temporal and input layers, which keeps learning efficient.
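
The summary names zero terminal SNR and v-prediction without spelling them out. The sketch below shows one common way these two ideas are formulated in the diffusion literature; it is not taken from the AtomoVideo code, and whether the authors use exactly this form is an assumption. The helper names (`linear_betas`, `rescale_zero_terminal_snr`, `v_target`) and the schedule values (1000 steps, betas from 1e-4 to 0.02) are hypothetical and chosen only for illustration.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear beta schedule (values are assumptions, not from the source)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step has SNR = 0.

    The first step is left (numerically) unchanged; the last step becomes
    pure noise, so training and sampling both start from the same prior.
    """
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scaled = [(s - last) * first / (first - last) for s in sqrt_ab]
    alpha_bar = [s * s for s in scaled]
    # Recover per-step alphas from the rescaled cumulative product.
    alphas = [alpha_bar[0]] + [
        alpha_bar[i] / alpha_bar[i - 1] for i in range(1, len(alpha_bar))
    ]
    return [1.0 - a for a in alphas]

def snr(alpha_bar_t):
    """Signal-to-noise ratio of x_t = sqrt(ab) * x0 + sqrt(1 - ab) * noise."""
    return alpha_bar_t / (1.0 - alpha_bar_t)

def v_target(x0, noise, alpha_bar_t):
    """v-prediction target: v = sqrt(ab) * noise - sqrt(1 - ab) * x0."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [a * n - s * x for x, n in zip(x0, noise)]

if __name__ == "__main__":
    original = cumulative_alphas(linear_betas())
    rescaled = cumulative_alphas(rescale_zero_terminal_snr(linear_betas()))
    print("terminal SNR before rescaling:", snr(original[-1]))
    print("terminal SNR after rescaling: ", snr(rescaled[-1]))
    # At the terminal step the v target reduces to -x0, so it still
    # carries information about the clean sample.
    print("v target at terminal step:", v_target([1.0, -2.0], [0.5, 0.5], rescaled[-1]))
```

The design point is that, once the terminal SNR is exactly zero, a noise-prediction target at the last step would carry no information about the clean sample, while the v target there equals the negated clean sample. This is why the two strategies are usually adopted together.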