**Date:** 23 Apr 2024
**OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving**
**Authors:** Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma
**Institution:** MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; Huawei Noah’s Ark Lab
**Project Page:** <https://occgen-ad.github.io/>
**Abstract:**
This paper introduces OccGen, a generative perception model for 3D semantic occupancy prediction. Unlike discriminative methods that map inputs to occupancy maps in a single forward pass, OccGen adopts a "noise-to-occupancy" paradigm: the occupancy map is initialized from random Gaussian noise and progressively refined by predicting and removing that noise step by step. OccGen consists of a conditional encoder, which processes the multi-modal inputs into conditioning features, and a progressive refinement decoder, which denoises the occupancy map over multiple diffusion steps. Extensive experiments on multiple benchmarks demonstrate the effectiveness of OccGen, which improves mIoU on nuScenes-Occupancy by 9.5%, 6.3%, and 13.3% under multi-modal, LiDAR-only, and camera-only settings, respectively. OccGen also exhibits desirable properties such as uncertainty estimation and progressive inference.
**Keywords:** Occupancy, Generative Model, Diffusion, Multi-modal
**Introduction:**
3D semantic occupancy prediction is crucial for autonomous driving systems. Existing methods often treat the task as a one-shot 3D voxel-wise segmentation problem, lacking the ability to refine the occupancy map gradually. OccGen addresses this by using a generative approach, which can model the coarse-to-fine refinement of the dense 3D occupancy map more effectively.
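For context, this gradual "noise-to-occupancy" refinement builds on the standard denoising-diffusion formulation. OccGen's exact parameterization is not reproduced in this summary, so the generic DDPM form is shown below, with $x_0$ the clean occupancy map (or its latent), $x_T$ pure Gaussian noise, and $c$ the multi-modal conditioning features:

```latex
% Forward (noising) process: progressively corrupt the occupancy map x_0.
q(x_t \mid x_0) = \mathcal{N}\!\bigl(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\bigr),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)

% Reverse (denoising) process: the learned decoder removes noise step by step,
% conditioned on the encoder features c.
p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\!\bigl(x_{t-1};\ \mu_\theta(x_t, t, c),\ \sigma_t^2 I\bigr)
```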
**Method:**
OccGen's generative pipeline comprises a conditional encoder and a progressive refinement decoder. The encoder fuses the available camera and/or LiDAR inputs into conditioning features; the decoder starts from a noise-initialized occupancy map and refines it through several diffusion denoising steps conditioned on those features. Because denoising proceeds from coarse structure to fine detail, this naturally models coarse-to-fine refinement and yields more detailed predictions than a single-shot decoder; a hedged sketch of the sampling loop follows.
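The paper's implementation details are not included in this summary, so the following is a minimal, hypothetical sketch of such a conditional noise-to-occupancy loop in PyTorch. Module names, channel sizes, the timestep handling, and the simplified sampler are all placeholders, not the authors' architecture:

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """Hypothetical stand-in: maps fused camera/LiDAR voxels to a 3D conditioning volume."""
    def __init__(self, in_ch: int, feat_ch: int):
        super().__init__()
        self.net = nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1)

    def forward(self, fused_voxels: torch.Tensor) -> torch.Tensor:
        return self.net(fused_voxels)

class RefinementDecoder(nn.Module):
    """Hypothetical stand-in: predicts the noise present in the current occupancy latent."""
    def __init__(self, occ_ch: int, feat_ch: int):
        super().__init__()
        self.net = nn.Conv3d(occ_ch + feat_ch, occ_ch, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, cond: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # A real model would also embed the timestep t; omitted here for brevity.
        return self.net(torch.cat([x_t, cond], dim=1))

@torch.no_grad()
def noise_to_occupancy(encoder, decoder, fused_voxels, steps: int = 4, occ_ch: int = 17):
    """Progressive refinement: start from Gaussian noise, iteratively denoise under conditioning."""
    cond = encoder(fused_voxels)                        # multi-modal conditioning features
    b, _, d, h, w = fused_voxels.shape
    x_t = torch.randn(b, occ_ch, d, h, w)               # pure-noise initialization
    betas = torch.linspace(1e-4, 2e-2, steps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    for t in reversed(range(steps)):
        eps = decoder(x_t, cond, torch.full((b,), t))   # predicted noise at step t
        # Estimate the clean map x0 from the noise prediction.
        x0_hat = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])
        if t > 0:
            # Simplified sampler: re-noise the x0 estimate to the (t-1) noise level
            # (standard DDPM/DDIM use closely related updates).
            noise = torch.randn_like(x_t)
            x_t = torch.sqrt(alpha_bar[t - 1]) * x0_hat + torch.sqrt(1 - alpha_bar[t - 1]) * noise
        else:
            x_t = x0_hat
    return x_t.argmax(dim=1)                            # per-voxel semantic labels
```

For example, `noise_to_occupancy(ConditionalEncoder(8, 32), RefinementDecoder(17, 32), torch.randn(1, 8, 16, 16, 2))` returns a per-voxel label map; running fewer or more denoising steps trades quality for latency, which is presumably what the progressive-inference property in the abstract refers to.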
**Experiments:**
OccGen is evaluated on the nuScenes-Occupancy and SemanticKITTI benchmarks. It outperforms state-of-the-art methods, with improvements of 9.5%, 6.3%, and 13.3% in mIoU on nuScenes-Occupancy under the multi-modal, LiDAR-only, and camera-only settings, respectively. Ablation studies and qualitative results further validate the contribution of each component.
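For reference, mIoU here is the standard voxel-wise metric: per-class intersection over union of predicted and ground-truth occupancy labels, averaged over the semantic classes. A minimal sketch follows; the class count and ignore label are placeholders, and each benchmark applies its own evaluation mask and class list:

```python
import numpy as np

def voxel_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over semantic classes, computed voxel-wise.
    Classes absent from both prediction and ground truth are skipped."""
    valid = gt != ignore_index          # drop voxels outside the evaluation mask
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious)) if ious else 0.0
```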
**Conclusion:**
OccGen is a powerful generative model for 3D semantic occupancy prediction, offering improved performance and desirable properties such as uncertainty estimation and progressive inference.
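The uncertainty-estimation property follows naturally from the generative formulation: because sampling starts from random noise, repeating the sampling loop yields multiple plausible occupancy maps, and their disagreement can serve as a voxel-wise uncertainty proxy. The sketch below only illustrates that idea and is not the authors' procedure; `sample_fn` stands for any stochastic occupancy sampler, such as the loop sketched in the Method section:

```python
import torch

def occupancy_uncertainty(sample_fn, num_samples: int = 5):
    """Illustrative: re-run a stochastic noise-to-occupancy sampler and measure
    voxel-wise label disagreement across the drawn samples."""
    samples = torch.stack([sample_fn() for _ in range(num_samples)])  # (S, B, D, H, W) label maps
    consensus = samples.mode(dim=0).values                            # majority-vote prediction
    disagreement = (samples != consensus).float().mean(dim=0)         # dissenting fraction per voxel
    return consensus, disagreement
```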