23 May 2024 | Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu
**MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes**
This paper introduces MagicDrive3D, a novel framework for controllable 3D street scene generation. MagicDrive3D combines geometry-free view synthesis and geometry-focused reconstruction to generate high-quality, realistic 3D street scenes that support any-view rendering. The framework addresses the challenges of scene dynamics and data collection discrepancies by first training a video generation model and then reconstructing from the generated data. Key contributions include:
1. **Multi-condition Control**: MagicDrive3D supports multi-condition control, including BEV maps, 3D objects, and text descriptions.
2. **Video Generation Model**: It uses a multi-view video generation model to synthesize multiple views of a static scene, enhancing inter-frame 3D consistency.
3. **Enhanced 3D Gaussian Splatting**: Improves reconstruction quality through deformable Gaussian splatting, which handles exposure discrepancies and local dynamics.
4. **Monocular Depth Prior**: Utilizes monocular depth estimation for accurate alignment in sparse-view settings.
5. **Appearance Embedding Maps**: Introduces appearance embedding maps to handle appearance differences across viewpoints.
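To make the monocular depth prior (item 4) concrete: a common way to use a monocular depth map in sparse-view reconstruction is to fit a global scale and shift that aligns it with whatever sparse metric depth is available (e.g. from SfM points or LiDAR). The following is a minimal sketch of that standard alignment step, not the paper's exact formulation; the function name and interface are illustrative.

```python
import numpy as np

def align_depth(mono_depth, sparse_depth, mask):
    """Align a relative monocular depth map to sparse metric depth.

    Fits scale s and shift t minimizing ||s * d_mono + t - d_sparse||^2
    over the pixels where sparse depth is observed (mask == True),
    then applies the fitted transform to the whole depth map.
    """
    d = mono_depth[mask]           # predicted (relative) depths at observed pixels
    g = sparse_depth[mask]         # metric depths at the same pixels
    A = np.stack([d, np.ones_like(d)], axis=1)  # (N, 2) design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * mono_depth + t
```

With an exact linear relationship between the monocular prediction and the metric depth, the least-squares fit recovers the scale and shift exactly; in practice the sparse observations are noisy and the fit is only approximate.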
Experiments on the nuScenes dataset demonstrate that MagicDrive3D generates diverse, high-quality 3D driving scenes, supporting any-view rendering and enhancing downstream tasks like BEV segmentation. The framework's superior performance highlights its potential for autonomous driving simulation and beyond.
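The exposure discrepancies that the appearance embedding maps address (items 3 and 5) can be illustrated with a simpler baseline: fitting a per-channel affine color correction (gain and bias) that maps one camera's rendering onto a reference view over their overlap. This sketch shows that baseline only, under the assumption of a known overlap mask; the paper's learned embedding maps are a richer, spatially varying version of the same idea. Names here are illustrative.

```python
import numpy as np

def fit_exposure(src, ref, mask):
    """Fit per-channel gain/bias so that gain * src + bias ≈ ref.

    src, ref: (H, W, C) images from two cameras; mask: (H, W) bool
    overlap region. Returns (gains, biases), each of shape (C,).
    """
    gains, biases = [], []
    for c in range(src.shape[-1]):
        s = src[..., c][mask]      # source channel values on the overlap
        r = ref[..., c][mask]      # reference channel values on the overlap
        A = np.stack([s, np.ones_like(s)], axis=1)  # design matrix [s, 1]
        (g, b), *_ = np.linalg.lstsq(A, r, rcond=None)
        gains.append(g)
        biases.append(b)
    return np.array(gains), np.array(biases)
```

A global affine correction like this can only compensate for uniform exposure differences; per-pixel embedding maps can additionally absorb vignetting and other spatially varying appearance changes across viewpoints.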