SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

15 Apr 2024 | Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma
SparseOcc is a method for vision-based semantic occupancy prediction that uses a sparse latent representation to achieve high efficiency and accuracy. Inspired by sparse point cloud processing, it introduces three key components: a 3D sparse diffuser for latent completion, a feature pyramid for scale enhancement, and a sparse transformer head for semantic occupancy prediction. The sparse diffuser uses spatially decomposed 3D sparse convolutional kernels to propagate non-empty features to adjacent empty regions, enabling scene completion. The feature pyramid enhances scales through sparse interpolation, expanding the receptive field and reducing the number of diffusers needed. The sparse transformer head generates semantic occupancy predictions by focusing only on occupied voxels, significantly reducing computational cost.

SparseOcc achieves a 74.9% reduction in FLOPs compared to dense baselines while improving accuracy from 12.8% to 14.1% mIoU. This improvement is attributed to the sparse representation's ability to avoid hallucinations on empty voxels. The method is evaluated on the nuScenes-Occupancy and SemanticKITTI benchmarks, where it outperforms existing methods in both accuracy and efficiency. It reduces FLOPs by 59.8-74.9% and memory usage by 31.6-40.9% while maintaining high semantic occupancy accuracy, and it uses 44.2% fewer FLOPs than OccFormer. The sparse latent diffuser and the learned sparse feature pyramid together enable efficient scene completion and accurate semantic occupancy prediction. Qualitative results and ablation studies support these findings.
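The diffusion idea described above, propagating non-empty features to adjacent empty voxels with spatially decomposed kernels, can be illustrated with a toy sketch. This is not the authors' implementation: the function name `diffuse_once`, the dictionary-based sparse storage, and the unweighted averaging (in place of learned convolution weights) are all assumptions made for illustration. It only shows how three 1D passes along x, y, and z can grow the set of occupied voxels.

```python
import numpy as np

# Hypothetical stand-in for spatially decomposed 3D kernels:
# three 1D neighbourhoods of size 3, one per axis.
AXIS_OFFSETS = [
    [(-1, 0, 0), (0, 0, 0), (1, 0, 0)],  # pass along x
    [(0, -1, 0), (0, 0, 0), (0, 1, 0)],  # pass along y
    [(0, 0, -1), (0, 0, 0), (0, 0, 1)],  # pass along z
]


def diffuse_once(voxels, grid_shape):
    """Run one x/y/z diffusion step on a sparse voxel dict.

    voxels maps (x, y, z) -> 1D float feature array. Only occupied voxels
    are stored, so the cost scales with occupancy, not with grid volume.
    Averaging replaces learned weights here (an assumption for brevity).
    """
    current = voxels
    for offsets in AXIS_OFFSETS:
        sums, counts = {}, {}
        for (x, y, z), feat in current.items():
            for dx, dy, dz in offsets:
                key = (x + dx, y + dy, z + dz)
                # Skip neighbours that fall outside the grid.
                if not all(0 <= k < s for k, s in zip(key, grid_shape)):
                    continue
                if key in sums:
                    sums[key] = sums[key] + feat
                    counts[key] += 1
                else:
                    sums[key] = feat.copy()
                    counts[key] = 1
        # Average the contributions each voxel received along this axis.
        current = {k: sums[k] / counts[k] for k in sums}
    return current


seed = {(2, 2, 2): np.array([1.0, 0.0])}
filled = diffuse_once(seed, grid_shape=(5, 5, 5))
print(len(filled))  # prints 27: one seed voxel grows to a 3x3x3 block
```

In this sketch a single occupied voxel expands to its 3x3x3 neighbourhood after one step, while the remaining 98 cells of the 5x5x5 grid are never touched, which is the efficiency argument behind keeping the latent representation sparse.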