Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
5 Jul 2024 | Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, and Carl Vondrick
The paper introduces GCD (Generative Camera Dolly), a controllable monocular dynamic view synthesis pipeline that generates a synchronous video from any chosen perspective, conditioned on relative camera pose parameters. GCD leverages large-scale diffusion priors to achieve this, without requiring depth input or explicit 3D scene geometry modeling. The model performs end-to-end video-to-video translation, making it efficient and effective for dynamic novel view synthesis. Despite being trained on synthetic multi-view video data, GCD shows promising results in various real-world domains, including robotics, object permanence, and driving environments. The core contribution is the design and evaluation of GCD, which demonstrates advanced spatiotemporal reasoning capabilities and the ability to handle complex dynamic scenes with significant camera viewpoint changes.
The method is evaluated on datasets like Kubric-4D and ParallelDomain-4D, showing superior performance over state-of-the-art baselines in terms of scene layout and dynamics reconstruction.
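The key conditioning signal described above is the relative camera pose between the input and target viewpoints. As a rough illustration of how such a pose might be fed to a diffusion backbone, the sketch below encodes a flattened 3x4 [R | t] extrinsics matrix with a sinusoidal embedding; note that the exact encoding used by GCD is not specified in this summary, so the scheme here (function name, frequency ladder, and dimensions) is an assumption for illustration only.

```python
import numpy as np

def pose_embedding(relative_pose, n_freqs=8):
    """Sinusoidal encoding of a flattened relative camera pose.

    Hypothetical encoding: GCD conditions its video diffusion model on
    relative camera pose parameters, but this particular embedding is an
    illustrative assumption, not the paper's documented scheme.
    """
    flat = np.asarray(relative_pose, dtype=np.float64).ravel()  # 3x4 extrinsics -> 12 values
    freqs = 2.0 ** np.arange(n_freqs)                           # geometric frequency ladder
    angles = flat[:, None] * freqs[None, :]                     # shape (12, n_freqs)
    # Interleave sin/cos features and flatten into one conditioning vector.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

# Relative pose of the target camera w.r.t. the input camera
# (identity rotation, zero translation = same viewpoint).
rel_pose = np.hstack([np.eye(3), np.zeros((3, 1))])
emb = pose_embedding(rel_pose)
print(emb.shape)  # 12 pose values x 8 frequencies x {sin, cos} = (192,)
```

In a full pipeline, a vector like `emb` would typically be fused with the diffusion timestep embedding so that every denoising step is aware of the requested viewpoint change.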