23 May 2024 | Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
DreamScene4D is a novel approach for generating dynamic 3D scenes from monocular videos, enabling novel-view synthesis for multiple objects with fast motion. It introduces a "decompose-recompose" strategy: the video is first decomposed into background and object tracks, and each object's motion is further factorized into three components, namely object-centric deformation, an object-to-world-frame transformation, and camera motion. This factorization lets rendering-error gradients and object view-predictive models recover 3D object completions and deformations, while bounding-box tracks guide large object displacements. DreamScene4D achieves significant improvements over existing state-of-the-art video-to-4D generation approaches on challenging data such as DAVIS, Kubric, and self-captured videos, and it also produces accurate, persistent 2D point tracks by projecting the inferred 3D trajectories back to 2D.
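As a rough illustration of how the factored motion components and the projected point tracks fit together, here is a minimal numpy sketch. The function names, the rigid-transform parameterization, and the pinhole intrinsics are assumptions for illustration only, not the paper's actual implementation or API.

```python
import numpy as np

def compose_object_motion(centers, deformation, R_obj2world, t_obj2world,
                          R_world2cam, t_world2cam):
    """Recompose the three factored motion components for one object at one frame.

    centers:      (N, 3) canonical Gaussian centers in the object-centric frame
    deformation:  (N, 3) per-Gaussian object-centric deformation at this frame
    R_*, t_*:     rigid transforms (3x3 rotation, 3-vector translation)
    Returns (N, 3) points in the camera frame.
    """
    deformed = centers + deformation                    # 1) object-centric deformation
    in_world = deformed @ R_obj2world.T + t_obj2world   # 2) object-to-world transform
    in_cam = in_world @ R_world2cam.T + t_world2cam     # 3) camera motion
    return in_cam

def project_tracks(traj_cam, K):
    """Project (T, N, 3) camera-frame trajectories to (T, N, 2) pixel tracks
    with a pinhole camera of intrinsics K (3, 3)."""
    z = np.clip(traj_cam[..., 2:3], 1e-6, None)   # depth along the optical axis
    uv = (traj_cam / z) @ K.T                      # normalize, then apply intrinsics
    return uv[..., :2]
```

Stacking `compose_object_motion` over frames yields 3D trajectories whose projection with `project_tracks` gives the kind of persistent 2D point tracks described above.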
The method is evaluated with quantitative comparisons and a user preference study, demonstrating its effectiveness at generating realistic 4D scene representations. It is implemented with Gaussian Splatting and leverages powerful foundation models to generalize to diverse zero-shot settings, handling complex multi-object scenes with large object motions while producing temporally consistent 4D results. Evaluations on self-captured videos with fast object motion show robustness under challenging conditions, and DreamScene4D outperforms existing methods in both motion accuracy and 4D scene generation quality: where prior approaches yield distorted 3D geometry, blurring, or broken artifacts for fast-moving objects, DreamScene4D produces consistent and faithful renders. It can also generate motion trajectories in arbitrary camera views and recovers accurate 3D point motion in the visible reference view as well as robust motion tracks in synthesized novel views, highlighting its applicability to real-world, complex videos.

The approach has limitations. It does not generalize well to videos captured from cameras at steep elevation angles, and it may fall into local optima when the rendered depth of the lifted 3D objects is poorly aligned with the estimated depth. Heavy occlusions leave the optimization under-constrained and can introduce artifacts, and runtime scales linearly with the number of objects, which can be slow for complex videos.

This work was supported by the Toyota Research Institute.