22 Jan 2024 | Changgu Chen, Junwei Shu, Lianggangxu Chen, Gaoqi He, Changbo Wang*, and Yang Li*
**Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation**
This paper introduces Motion-Zero, a novel framework for controlling the motion trajectories of objects in videos generated by diffusion models. The framework is designed to be zero-shot, meaning it can be applied to any pre-trained video diffusion model without requiring additional training. Key components of Motion-Zero include:
1. **Initial Noise Prior Module (INPM)**: This module provides a position-based prior to improve the stability and accuracy of the moving object's appearance and position.
2. **Spatial Constraints**: Based on the attention map of the U-Net, spatial constraints are applied to the denoising process to ensure positional and spatial consistency of moving objects.
3. **Shift Temporal Attention Mechanism (STAM)**: This mechanism ensures temporal consistency by maintaining the focus on the same objects across different frames.
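The spatial-constraint idea above — steering denoising so the object's cross-attention mass lands inside the user-specified trajectory box — can be sketched as a simple energy function. This is a minimal illustration of the general training-free attention-guidance technique, not the authors' exact formulation; the function and variable names are hypothetical:

```python
import numpy as np

def spatial_attention_loss(attn_map, box):
    """Energy that is low when the object's attention mass lies inside `box`.

    attn_map: (H, W) cross-attention map for the object's text token
              (hypothetical input; in practice taken from the U-Net).
    box: (x0, y0, x1, y1) target region from the trajectory at this frame.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros_like(attn_map)
    mask[y0:y1, x0:x1] = 1.0  # 1 inside the target box, 0 elsewhere
    inside = (attn_map * mask).sum()
    total = attn_map.sum()
    # Fraction of attention mass *outside* the box; minimizing this
    # pulls the object toward the desired position.
    return 1.0 - inside / (total + 1e-8)
```

In a guidance loop, the gradient of such a loss with respect to the latent would be used to nudge the latent at each denoising step, so no model weights are updated.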
The framework is evaluated on various state-of-the-art video diffusion models, demonstrating its effectiveness in controlling object motion trajectories and generating high-quality videos. Extensive experiments show that Motion-Zero can outperform existing trajectory control methods without the need for extensive training or computational resources. The paper also includes qualitative and quantitative evaluations, user studies, and ablation studies to validate the effectiveness of each component of the framework.