24 Jun 2024 | Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu
FreeTraj is a tuning-free trajectory control framework for video diffusion models, enabling precise control over generated video motion without additional training. The method leverages both noise guidance and attention mechanisms to achieve trajectory controllability. By manipulating the initial noise and the attention computation, FreeTraj lets users specify trajectories for video content either manually or automatically. The framework is designed to enhance trajectory control while maintaining video quality, and it can be integrated into long-video generation systems to produce larger and longer videos with controllable motion. Extensive experiments demonstrate that FreeTraj outperforms existing methods in trajectory control, producing high-quality results that align accurately with the input trajectories. The approach is flexible: trajectories can be provided by hand or by an LLM-based trajectory planner. FreeTraj also addresses attention isolation by using soft attention masks instead of hard masks, reducing artifacts and improving overall video quality. The method is evaluated with Fréchet Video Distance (FVD), Kernel Video Distance (KVD), and CLIP similarity, showing competitive performance in both video quality and trajectory control, and user studies further confirm its effectiveness at generating videos with the desired motion. Overall, the framework is a practical and efficient solution for generating videos with controllable motion, offering both flexibility and strong performance in video diffusion models.
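The noise-guidance idea — steering motion by manipulating the initial noise — can be illustrated with a minimal sketch. The idea (my paraphrase, not the paper's exact procedure) is to make the initial latent noise correlated along the user's trajectory: the patch inside each frame's bounding box is copied from one shared noise tensor, so the denoiser tends to place the same content wherever that box moves. All names, the box size, and the copying scheme below are hypothetical simplifications.

```python
import numpy as np

def trajectory_noise(frames, h, w, boxes, seed=0):
    """Build per-frame initial noise in which the patch inside each frame's
    bounding box is copied from a single shared noise tensor, so noise is
    correlated along the trajectory.  `boxes` gives the (y0, x0) top-left
    corner of the box in each frame; the box size is fixed for simplicity."""
    rng = np.random.default_rng(seed)
    bh, bw = 16, 16                               # hypothetical box size
    shared = rng.standard_normal((bh, bw))        # noise reused in every box
    noise = rng.standard_normal((frames, h, w))   # independent background noise
    for t, (y0, x0) in enumerate(boxes):
        noise[t, y0:y0 + bh, x0:x0 + bw] = shared
    return noise

# A simple left-to-right trajectory over 8 frames of a 64x64 latent.
boxes = [(24, 4 * t) for t in range(8)]
z = trajectory_noise(8, 64, 64, boxes)
```

In a real pipeline the latent would have channel and batch dimensions and the shared patch might be blended rather than copied, but the sketch captures the core mechanism: identical low-level noise statistics travelling along the requested path.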
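The soft-versus-hard attention-mask distinction can also be sketched. A hard mask zeroes attention to keys outside the target region, which isolates the region and causes artifacts; a soft mask merely down-weights those keys. The code below is a generic illustration of that contrast, not FreeTraj's actual implementation; the `low` floor value and the single-head layout are assumptions.

```python
import numpy as np

def soft_mask_attention(scores, inside, low=0.2):
    """Apply a soft spatial mask to pre-softmax attention scores.

    Keys outside the target region (inside == False) are down-weighted by a
    factor `low` rather than excluded, which avoids the attention-isolation
    artifacts a hard binary mask (low -> 0) would cause.

    scores: (queries, keys) pre-softmax logits
    inside: boolean array of shape (keys,)
    """
    weight = np.where(inside, 1.0, low)   # hard mask would use 0.0 here
    # Adding log(weight) to the logits multiplies the unnormalized
    # softmax weights by `weight`.
    masked = scores + np.log(weight)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

With uniform logits and half the keys inside the region, the inside keys receive most of the attention mass while outside keys keep a small, nonzero share — enough to preserve global context that a hard mask would cut off.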