AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

(11/2024) | Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen
AnyV2V is a tuning-free framework that reduces video editing to two steps: (1) modifying the first frame with an off-the-shelf image editing model, and (2) generating the edited video with an existing image-to-video (I2V) generation model, guided by temporal feature injection. Because the first step can use any existing image editing tool, the framework supports a wide range of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation. AnyV2V can handle videos of any length and achieves high-quality edits without any fine-tuning. Evaluations show that AnyV2V outperforms existing baselines in both quantitative metrics and human evaluations, with better visual consistency and quality. The framework's effectiveness is attributed to its ability to propagate the edited first frame across the entire video while keeping the result aligned with the source video, which it achieves through DDIM inversion and feature injection.
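To make the role of DDIM inversion concrete, here is a minimal NumPy sketch. It is not the AnyV2V implementation: the functions `ddim_step`, `ddim_invert`, and `ddim_sample`, the constant noise predictor `toy_eps`, and the linear noise schedule are all illustrative assumptions. In the actual framework the noise predictor would be the I2V model, and features recorded during inversion would additionally be injected during sampling, which this sketch leaves out.

```python
import numpy as np


def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative noise levels."""
    # Estimate the clean latent implied by the current latent and noise.
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    # Re-noise that estimate to the target level using the same noise.
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps


def ddim_invert(x0, eps_model, alphas):
    """Walk clean latents toward noise; alphas is ordered clean -> noisy."""
    x = x0
    trajectory = [x]
    for i in range(len(alphas) - 1):
        eps = eps_model(x, i)  # noise predicted at the current level
        x = ddim_step(x, eps, alphas[i], alphas[i + 1])
        trajectory.append(x)
    return trajectory


def ddim_sample(x_T, eps_model, alphas):
    """Walk noisy latents back toward the clean level."""
    x = x_T
    for i in range(len(alphas) - 1, 0, -1):
        eps = eps_model(x, i)
        x = ddim_step(x, eps, alphas[i], alphas[i - 1])
    return x


# Toy setup (all values are assumptions for illustration only).
rng = np.random.default_rng(0)
source_latents = rng.normal(size=(4, 8))   # 4 "frames", 8 latent dims
alphas = np.linspace(0.999, 0.05, 50)      # cumulative alphas, clean -> noisy


def toy_eps(x, i):
    """Hypothetical stand-in for the I2V model's noise prediction."""
    return np.full_like(x, 0.3)


trajectory = ddim_invert(source_latents, toy_eps, alphas)
reconstructed = ddim_sample(trajectory[-1], toy_eps, alphas)
print(np.abs(reconstructed - source_latents).max())
```

Because the toy predictor returns the same noise at every step, sampling from the inverted latent recovers the source latents up to floating-point error. With a learned predictor the round trip is only approximate, which is one reason a method would also inject source features during sampling to stay aligned with the source video.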