1 May 2024 | Teng Hu*, Jiangning Zhang*, Ran Yi†, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma
MotionMaster is a training-free camera motion transfer model that disentangles camera and object motions in source videos and transfers the extracted camera motions to new videos. It uses temporal attention maps to separate camera motion from object motion, enabling flexible and controllable camera motion in video generation. Two disentanglement methods are proposed: a one-shot method that extracts camera motion from a single video by solving a Poisson equation, and a few-shot method that extracts the common camera motion shared by multiple videos through clustering. A camera motion combination method further enables flexible control by composing different extracted camera motions.

Extensive experiments show that MotionMaster achieves superior performance in both one-shot and few-shot scenarios, transferring camera motion effectively while generating high-quality videos. It handles complex camera motions such as dolly zoom and variable-speed zoom, and can be applied to a wide range of controllable video generation tasks. Compared with existing methods that rely on training temporal camera modules, MotionMaster avoids their training cost and provides more flexible camera control. Evaluation with FVD, FID-V, and Optical Flow Distance confirms that it generates high-quality, diverse videos that follow the target camera motion accurately.
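The one-shot extraction can be pictured as an inpainting problem: outside moving-object regions the temporal attention map reflects camera motion alone, and the values inside an object mask are recovered by solving a discrete Poisson (Laplace) equation with the surrounding camera-motion values as boundary conditions. The sketch below illustrates this with NumPy/SciPy; the array names `attn_map` and `object_mask`, the 2D formulation, and the border handling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): fill object-masked pixels of a
# temporal attention map so the result is harmonic inside the mask, using
# the known camera-motion values around it as Dirichlet boundary conditions.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def inpaint_camera_motion(attn_map: np.ndarray, object_mask: np.ndarray) -> np.ndarray:
    """Solve the discrete Laplace equation inside `object_mask` (hypothetical helper)."""
    h, w = attn_map.shape
    idx = -np.ones((h, w), dtype=int)           # index of each unknown pixel, -1 = known
    ys, xs = np.nonzero(object_mask)
    idx[ys, xs] = np.arange(len(ys))
    n = len(ys)

    A = sp.lil_matrix((n, n))
    b = np.zeros(n)
    for k, (y, x) in enumerate(zip(ys, xs)):
        diag = 0.0
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue                         # image border: drop the missing neighbour
            diag += 1.0
            if idx[ny, nx] >= 0:
                A[k, idx[ny, nx]] = -1.0         # neighbour is also unknown
            else:
                b[k] += attn_map[ny, nx]         # known camera-motion boundary value
        A[k, k] = diag

    result = attn_map.copy()
    result[ys, xs] = spla.spsolve(A.tocsr(), b)  # harmonic fill inside the mask
    return result
```

Solving one sparse linear system per attention-map slice keeps the procedure training-free, which is the property the paper emphasizes.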
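The camera motion combination can likewise be sketched as a per-pixel blend of several extracted camera-motion attention maps under region masks, for example mixing a zoom-in motion in the background with a near-static motion in the foreground to approximate a dolly zoom. The function name `combine_camera_motions` and the convex-weighting scheme below are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: blend extracted camera-motion attention maps region by region.
import numpy as np

def combine_camera_motions(motion_maps: list[np.ndarray],
                           region_weights: list[np.ndarray]) -> np.ndarray:
    """Per-pixel convex combination of camera-motion attention maps (hypothetical helper)."""
    weights = np.stack(region_weights).astype(float)                  # (k, h, w)
    weights /= np.clip(weights.sum(axis=0, keepdims=True), 1e-8, None)  # normalise per pixel
    maps = np.stack(motion_maps)                                       # (k, h, w)
    return (weights * maps).sum(axis=0)

# Example: pan left on the left half of the frame and pan right on the right half.
h, w = 32, 32
pan_left, pan_right = np.random.rand(h, w), np.random.rand(h, w)      # stand-in motion maps
left_mask = np.zeros((h, w)); left_mask[:, : w // 2] = 1.0
combined = combine_camera_motions([pan_left, pan_right], [left_mask, 1.0 - left_mask])
```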