14 Jun 2024 | Chen Hou1*, Guoqiang Wei2, Yan Zeng2, Zhibo Chen1
The paper introduces CamTrol, a training-free and robust method for controlling camera movement in video generation with off-the-shelf video diffusion models. Unlike prior methods that require supervised fine-tuning on camera-annotated datasets or self-supervised training via data augmentation, CamTrol exploits the layout prior that noisy latents impose on generated videos. The method has two stages: first, explicit camera movement is modeled in 3D point cloud space; second, videos with camera motion are generated using the layout prior of noisy latents derived from those renderings. Extensive experiments across a variety of camera trajectories demonstrate CamTrol's effectiveness and robustness in controlling camera motion while preserving dynamic content, with notably strong results on 3D rotation videos. Compared against state-of-the-art methods, it achieves superior video quality and camera motion control.
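The two stages described above can be illustrated with a toy sketch. Stage one lifts an image to a point cloud via per-pixel depth and re-renders it under a new camera pose; stage two noises the render to an intermediate diffusion timestep so its layout can steer denoising. This is a minimal NumPy illustration under assumed pinhole-camera conventions, not the paper's implementation: the depth map, intrinsics `K`, extrinsic `E`, and noise schedule `alpha_bar` are all hypothetical placeholders, and the actual denoising by a video diffusion model is omitted.

```python
import numpy as np

def render_from_point_cloud(image, depth, extrinsic, K):
    """Stage one (toy): back-project pixels to 3D with per-pixel depth,
    apply the new camera extrinsic, and re-project with intrinsics K."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(float)
    pts = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)   # camera-space 3D points
    pts = (extrinsic[:3, :3] @ pts.T).T + extrinsic[:3, 3]      # move to the new pose
    proj = (K @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]                           # perspective divide
    out = np.zeros_like(image)
    u = np.round(proj[:, 0]).astype(int)
    v = np.round(proj[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)             # keep in-frame pixels
    out[v[valid], u[valid]] = image.reshape(-1, image.shape[-1])[valid]
    return out

def noisy_layout_prior(render, t, alpha_bar):
    """Stage two (toy): noise the rendered frame to timestep t; the noised
    latent's coarse layout then guides subsequent diffusion denoising."""
    eps = np.random.randn(*render.shape)
    return np.sqrt(alpha_bar[t]) * render + np.sqrt(1.0 - alpha_bar[t]) * eps
```

In this sketch a larger `t` (smaller `alpha_bar[t]`) weakens the layout constraint and leaves more freedom to the generative model, which mirrors the usual fidelity/flexibility trade-off in noise-initialized editing.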