[slides] SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

**SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction**

This paper addresses accurate 3D semantic occupancy prediction for autonomous driving by proposing SparseOcc, an efficient occupancy network inspired by sparse point cloud processing. Existing methods map 2D latent representations into 3D space using either dense representations or projection-based ones (e.g., Bird's Eye View, TPV). Dense representations are computationally expensive, while projections lose information. SparseOcc instead introduces a lossless sparse latent representation with three key innovations:

1. **Sparse Latent Diffuser**: Uses spatially decomposed 3D sparse convolutional kernels to propagate features from non-empty voxels into adjacent empty regions, making scene completion more efficient.
2. **Sparse Feature Pyramid**: A feature pyramid with sparse interpolation operations that enriches each scale with information from the other scales, reducing the number of diffusers needed within each scale.
3. **Sparse Transformer Head**: A redesigned 3D sparse transformer head that generates the semantic occupancy predictions while attending only to occupied voxels, which yields significant computational savings.

SparseOcc achieves a 74.9% reduction in FLOPs over the dense baseline while improving accuracy from 12.8% to 14.1% mIoU. The method is evaluated on two datasets: it achieves state-of-the-art results on nuScenes-Occupancy and performance comparable to existing methods on SemanticKITTI.
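
As a rough illustration of the diffuser idea summarized above, the sketch below stores only non-empty voxels and spreads their features into neighbouring empty voxels using three axis-aligned 1D passes in place of a full 3x3x3 kernel. Everything here is an assumption made for illustration: the names `diffuse_along_axis` and `sparse_diffuse`, the scalar features, and the uniform averaging weights are hypothetical, whereas the actual method uses learned sparse convolutions and its exact kernel decomposition is not given in this summary.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
SparseVoxels = Dict[Coord, float]  # only non-empty voxels are stored

# Three axis-aligned 1D kernels of size 3: a hypothetical stand-in for a
# "spatially decomposed" 3x3x3 kernel, not the paper's exact design.
AXIS_OFFSETS: Dict[str, List[Coord]] = {
    "x": [(-1, 0, 0), (0, 0, 0), (1, 0, 0)],
    "y": [(0, -1, 0), (0, 0, 0), (0, 1, 0)],
    "z": [(0, 0, -1), (0, 0, 0), (0, 0, 1)],
}


def diffuse_along_axis(
    voxels: SparseVoxels, offsets: List[Coord], grid_shape: Coord
) -> SparseVoxels:
    """Spread each stored feature to its in-bounds neighbours along one axis.

    A voxel that receives several contributions takes their mean
    (uniform weights; a real layer would use learned weights).
    """
    sums: Dict[Coord, float] = {}
    counts: Dict[Coord, int] = {}
    for (x, y, z), feat in voxels.items():
        for dx, dy, dz in offsets:
            c = (x + dx, y + dy, z + dz)
            if all(0 <= c[i] < grid_shape[i] for i in range(3)):
                sums[c] = sums.get(c, 0.0) + feat
                counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def sparse_diffuse(voxels: SparseVoxels, grid_shape: Coord) -> SparseVoxels:
    """Apply the x, y and z passes in sequence; empty space is never visited."""
    out = voxels
    for axis in ("x", "y", "z"):
        out = diffuse_along_axis(out, AXIS_OFFSETS[axis], grid_shape)
    return out


if __name__ == "__main__":
    grid = (4, 4, 4)
    seed = {(1, 1, 1): 1.0}  # a single non-empty voxel
    filled = sparse_diffuse(seed, grid)
    dense_cells = grid[0] * grid[1] * grid[2]
    print(f"active voxels: {len(filled)} of {dense_cells} dense cells")
```

The work done in each pass scales with the number of stored voxels rather than with the full grid volume, which is the intuition behind the FLOPs savings reported above. The toy code makes no claim about the actual 74.9% figure.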

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

15 Apr 2024 | Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma