DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

15 Jul 2024 | Hyeonho Jeong, Jinho Chang, Geon Yeong Park, and Jong Chul Ye
DreamMotion is a novel approach for zero-shot video editing that leverages score distillation sampling to edit videos while preserving their structure and motion. Unlike traditional methods that rely on reverse diffusion processes, DreamMotion uses score distillation to optimize video editing directly from videos that already exhibit natural motion. The method introduces space-time self-similarity regularization to align the structure and motion of the original and edited videos, ensuring smooth and accurate results. DreamMotion is applicable to both cascaded and non-cascaded video diffusion frameworks, demonstrating its model-agnostic nature.

The approach involves three key components: appearance injection using Delta Denoising Score (DDS), spatial self-similarity matching to preserve structure, and temporal self-similarity matching to ensure smooth motion. Through extensive experiments, DreamMotion outperforms existing methods in maintaining the original video's structure and motion while accurately reflecting the target text. The method is evaluated on both non-cascaded and cascaded video diffusion frameworks, showing superior performance in terms of text alignment, spatial-temporal consistency, and user satisfaction. However, the framework is limited in its ability to handle videos requiring significant structural changes. The work addresses the challenge of generating temporally consistent, real-world motion in video editing, offering a promising solution for text-driven video generation.
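To make the space-time self-similarity idea concrete, the sketch below computes frame-wise spatial self-similarity maps (patch-to-patch cosine similarity within each frame) and temporal self-similarity maps (frame-to-frame cosine similarity at each patch location), then matches them between original and edited features with an L1 loss. This is a minimal illustration under stated assumptions: the function names, the flat `(frames, patches, channels)` feature layout, and the L1 matching are illustrative choices, not the authors' exact implementation; in practice the features would be intermediate activations of the video diffusion model, and the loss would regularize the DDS-based appearance optimization.

```python
import numpy as np

def spatial_self_similarity(feats):
    """Per-frame patch-to-patch cosine similarity.

    feats: array of shape (F, N, D) -- F frames, N patches, D channels.
    Returns an array of shape (F, N, N).
    """
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    return np.einsum("fnd,fmd->fnm", f, f)

def temporal_self_similarity(feats):
    """Per-patch frame-to-frame cosine similarity.

    feats: array of shape (F, N, D).
    Returns an array of shape (N, F, F).
    """
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    return np.einsum("fnd,gnd->nfg", f, f)

def self_similarity_loss(src_feats, edit_feats):
    """L1 distance between the space-time self-similarity maps of the
    original (src) and edited features -- an illustrative stand-in for
    the paper's regularizer."""
    l_spatial = np.abs(
        spatial_self_similarity(src_feats) - spatial_self_similarity(edit_feats)
    ).mean()
    l_temporal = np.abs(
        temporal_self_similarity(src_feats) - temporal_self_similarity(edit_feats)
    ).mean()
    return l_spatial + l_temporal
```

The loss is zero when the edited features reproduce the original video's internal similarity structure exactly, and grows as the edit distorts per-frame layout (spatial term) or cross-frame motion patterns (temporal term), which is the property the regularizer relies on.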