Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation

22 Jan 2024 | Changgu Chen, Junwei Shu, Lianggangxu Chen, Gaoqi He, Changbo Wang*, Yang Li*
Motion-Zero is a zero-shot framework for controlling moving-object trajectories in diffusion-based video generation. It enables pre-trained text-to-video diffusion models to generate high-quality videos with precise control over object motion, without any additional training. The framework introduces an initial noise prior module that injects position-based prior information into the starting latents, stabilizing the object's appearance and improving positional accuracy. During inference, spatial constraints derived from attention maps keep the object aligned with the target trajectory, and a shift temporal attention mechanism preserves temporal consistency across frames. Because it operates entirely at inference time, Motion-Zero can be applied to any pre-trained video diffusion model, making motion control flexible and efficient.

Extensive qualitative and quantitative experiments demonstrate that Motion-Zero accurately controls object trajectories while maintaining video quality, outperforming existing methods in control capability, semantic accuracy, and temporal consistency. Its training-free design makes it a practical solution for video generation tasks that require precise motion control.
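To make the initial noise prior idea concrete, below is a minimal sketch, assuming a PyTorch latent-diffusion setup where each frame's object position is given as a bounding box in latent coordinates. The function name, the shared-patch blending scheme, and the `strength` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def apply_position_noise_prior(noise, boxes, strength=0.8):
    """Blend a shared noise patch into each frame's bounding box.

    noise:    (F, C, H, W) initial latent noise for F frames.
    boxes:    list of F (x0, y0, x1, y1) boxes in latent coordinates.
    strength: how strongly the shared patch replaces per-frame noise.
    """
    # Use the first frame's box region as the shared "object" noise patch,
    # so the same starting noise follows the object along its trajectory.
    x0, y0, x1, y1 = boxes[0]
    patch = noise[0, :, y0:y1, x0:x1].clone()
    out = noise.clone()
    for f, (bx0, by0, bx1, by1) in enumerate(boxes):
        # Resize the shared patch to this frame's box and blend it in.
        resized = torch.nn.functional.interpolate(
            patch.unsqueeze(0), size=(by1 - by0, bx1 - bx0),
            mode="bilinear", align_corners=False).squeeze(0)
        region = out[f, :, by0:by1, bx0:bx1]
        out[f, :, by0:by1, bx0:bx1] = strength * resized + (1 - strength) * region
    return out

# Toy usage: 8 frames of 4-channel 64x64 latents, box moving left to right.
frames = 8
latents = torch.randn(frames, 4, 64, 64)
trajectory = [(8 + 4 * f, 24, 24 + 4 * f, 40) for f in range(frames)]
latents_with_prior = apply_position_noise_prior(latents, trajectory)
print(latents_with_prior.shape)  # torch.Size([8, 4, 64, 64])
```

In this sketch the shared patch gives the denoiser a consistent starting signal at each target position, which is the role the paper attributes to its noise prior; the attention-map spatial constraints and shift temporal attention would additionally act inside the denoising loop and are not shown here.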