MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

23 May 2024 | Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu
MagicDrive3D is a novel framework for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct a scene before training a generative model, MagicDrive3D first trains a video generation model and then reconstructs the scene from the generated data. This ordering makes generation easy to control and simplifies the acquisition of static scenes, resulting in high-quality reconstruction. To absorb minor errors in the generated content, the framework uses deformable Gaussian splatting with monocular depth initialization, together with appearance modeling that manages exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks such as BEV segmentation, demonstrating its potential for autonomous driving simulation and beyond.

The framework integrates geometry-free view synthesis with geometry-focused reconstruction. It begins by training a multi-view video generation model that synthesizes multiple views of a static scene, conditioned on object boxes, road maps, text prompts, and camera poses. To improve inter-frame 3D consistency, the model incorporates coordinate embeddings that encode the relative transformation between the LiDAR coordinate frames of different timesteps.
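To make the relative pose embedding concrete, the sketch below shows one plausible reading (an assumption, not the paper's exact implementation): given 4x4 LiDAR-to-world poses for a reference frame and a target frame, it computes the relative transform and encodes its translation and flattened rotation with Fourier features, a common way to condition diffusion models on continuous quantities.

```python
import numpy as np

def relative_pose(T_ref: np.ndarray, T_tgt: np.ndarray) -> np.ndarray:
    """Relative 4x4 transform taking the target LiDAR frame into the
    reference LiDAR frame: T_rel = inv(T_ref) @ T_tgt."""
    return np.linalg.inv(T_ref) @ T_tgt

def fourier_embed(x: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """Standard sin/cos Fourier features with exponentially spaced bands."""
    freqs = 2.0 ** np.arange(num_freqs)            # 1, 2, 4, 8
    angles = x[:, None] * freqs                    # (D, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

def pose_embedding(T_ref: np.ndarray, T_tgt: np.ndarray) -> np.ndarray:
    """Embed the relative translation and rotation between two frames."""
    T_rel = relative_pose(T_ref, T_tgt)
    t = T_rel[:3, 3]                # relative translation, shape (3,)
    R = T_rel[:3, :3].ravel()       # flattened relative rotation, shape (9,)
    return fourier_embed(np.concatenate([t, R]))   # shape (96,)
```

Such a vector could be added to the per-frame camera-pose conditioning so the video model knows where each frame sits relative to the first one.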
In the second stage, reconstruction quality is improved through deformable Gaussian splatting and appearance embedding maps, which handle local dynamics and exposure discrepancies, respectively. The resulting street scenes are highly realistic and align with the given road maps, 3D bounding boxes, and text descriptions, and the rendered camera views can augment training data for BEV segmentation. Notably, MagicDrive3D is the first method to achieve controllable 3D street scene generation from a training dataset with only six camera perspectives, as in nuScenes.

The contributions are threefold: MagicDrive3D itself, the first framework to effectively integrate geometry-free and geometry-focused view synthesis for controllable 3D street scene generation; a relative pose embedding technique that improves the 3D consistency of generated videos, along with tailored reconstruction techniques, including deformable Gaussian splatting, that handle local dynamics and exposure discrepancies; and extensive experiments showing that the generated scenes offer multi-dimensional controllability and that the synthetic data improves 3D perception tasks. Sketches of the reconstruction-stage techniques follow.
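First, monocular depth initialization. A minimal sketch under a pinhole-camera assumption (all names here are hypothetical): an estimated depth map is unprojected into world-space points that can seed the centers of the 3D Gaussians.

```python
import numpy as np

def unproject_depth(depth: np.ndarray, K: np.ndarray,
                    cam_to_world: np.ndarray) -> np.ndarray:
    """Lift an HxW monocular depth map to world-space 3D points.

    depth:        (H, W) metric depth from a monocular estimator
    K:            (3, 3) pinhole camera intrinsics
    cam_to_world: (4, 4) camera pose
    returns:      (H*W, 3) points usable as initial Gaussian centers
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))              # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3)  # homogeneous pixels
    rays = (np.linalg.inv(K) @ pix.T).T                         # camera-space rays
    pts_cam = rays * depth.reshape(-1, 1)                       # scale rays by depth
    pts_hom = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=1)
    return (cam_to_world @ pts_hom.T).T[:, :3]                  # to world space
```

In practice one would filter out points with unreliable depth (e.g., sky) before using them as initial centers.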
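Deformable Gaussian splatting can be pictured as canonical Gaussians plus small learned per-view offsets that absorb residual inconsistencies in the generated frames. The PyTorch module below is a deliberately simplified sketch of that idea, not the paper's formulation:

```python
import torch
import torch.nn as nn

class DeformableGaussians(nn.Module):
    """Canonical Gaussian centers plus a small MLP that predicts a
    per-view offset, letting the splats compensate for the minor
    geometric errors of generated frames."""

    def __init__(self, num_gaussians: int, num_views: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_gaussians, 3) * 0.1)  # canonical centers
        self.view_emb = nn.Embedding(num_views, 16)                  # per-view latent code
        self.deform = nn.Sequential(
            nn.Linear(3 + 16, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, view_idx: torch.Tensor) -> torch.Tensor:
        """view_idx: scalar LongTensor selecting the view; returns the
        deformed centers for that view."""
        code = self.view_emb(view_idx).expand(self.mu.shape[0], -1)
        offset = self.deform(torch.cat([self.mu, code], dim=-1))
        return self.mu + offset

centers = DeformableGaussians(num_gaussians=100_000, num_views=6)(torch.tensor(0))
```

A regularizer keeping the offsets near zero would ensure the canonical scene stays dominant while the deformation soaks up view-specific error.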
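Finally, the appearance embedding maps. One plausible minimal version (shapes and names assumed for illustration) learns an embedding per camera and decodes it into a low-resolution affine color map that is upsampled and applied to the rendered image, evening out exposure across viewpoints:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceModel(nn.Module):
    """Per-camera embedding decoded into a coarse scale/bias map that
    corrects exposure and color differences between viewpoints."""

    def __init__(self, num_cameras: int, emb_dim: int = 32, grid: int = 8):
        super().__init__()
        self.grid = grid
        self.emb = nn.Embedding(num_cameras, emb_dim)
        # 6 channels: 3 for per-pixel color scale, 3 for per-pixel bias.
        self.decode = nn.Linear(emb_dim, 6 * grid * grid)

    def forward(self, img: torch.Tensor, cam_idx: torch.Tensor) -> torch.Tensor:
        """img: (3, H, W) rendered image; cam_idx: scalar LongTensor."""
        H, W = img.shape[1:]
        maps = self.decode(self.emb(cam_idx)).view(6, self.grid, self.grid)
        maps = F.interpolate(maps[None], size=(H, W), mode="bilinear",
                             align_corners=False)[0]
        scale, bias = maps[:3], maps[3:]
        return img * (1.0 + scale) + bias   # affine color correction
```

The corrected rendering is compared against the generated view during optimization, so exposure differences are explained by the embedding rather than corrupting the scene geometry.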