FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

5 Mar 2024 | Jiawei Hou, Xiaoyan Li, Wenhao Guan, Gang Zhang, Di Feng, Yuheng Du, Xiangyang Xue, Jian Pu
FastOcc accelerates 3D occupancy prediction by fusing 2D bird's-eye-view (BEV) and perspective-view features. It speeds up inference while maintaining accuracy by replacing the time-consuming 3D convolution network in the occupancy prediction head with a lightweight 2D BEV convolution network and by integrating interpolated 3D voxel features. Concretely, the method simplifies the 3D perception task by compressing features into a BEV representation and decoding them in 2D, then uses the interpolated 3D features to refine and enhance the 2D features. Removing the 3D convolution blocks significantly reduces computational overhead. On the Occ3D-nuScenes benchmark, FastOcc achieves state-of-the-art results at a fast inference speed: an mIoU of 40.75 with an inference time of 63 ms, reduced further to 32 ms with TensorRT acceleration. The method targets the real-time performance that autonomous driving requires and, in the reported experiments, outperforms existing methods in both speed and accuracy.
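The data flow described above (compress to BEV, decode in 2D, then refine with interpolated 3D features) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function name `bev_occupancy_head`, the mean pooling over height, the 1x1 channel-mixing layer standing in for the 2D BEV convolution network, the nearest-neighbour interpolation, the additive fusion, and all tensor sizes are assumptions chosen only to keep the example short and runnable.

```python
import numpy as np

def bev_occupancy_head(voxel_feats, w_bev, w_cls, up=2):
    """Hypothetical sketch of a BEV-based occupancy head.

    voxel_feats: (C, Z, H, W) coarse voxel features.
    w_bev:       (C, C) weights of a 1x1 conv, a stand-in for the 2D BEV network.
    w_cls:       (K, C) weights of a per-voxel linear classifier.
    Returns an integer class map of shape (Z*up, H*up, W*up).
    """
    # 1) Compress the height axis to obtain a BEV map (assumption: mean pooling).
    bev = voxel_feats.mean(axis=1)                                 # (C, H, W)
    # 2) Decode in 2D: 1x1 conv (channel mixing) followed by ReLU.
    bev = np.maximum(np.einsum("oc,chw->ohw", w_bev, bev), 0.0)    # (C, H, W)
    # 3) Upsample the BEV map in-plane (nearest neighbour).
    bev = bev.repeat(up, axis=1).repeat(up, axis=2)                # (C, H*up, W*up)
    # 4) Interpolate the 3D voxel features to the output resolution.
    vox = (voxel_feats.repeat(up, axis=1)
                      .repeat(up, axis=2)
                      .repeat(up, axis=3))                         # (C, Z*up, H*up, W*up)
    # 5) Fuse: broadcast the BEV features along height and add them (assumption).
    fused = vox + bev[:, None, :, :]
    # 6) Classify every voxel and keep the most likely class.
    logits = np.einsum("kc,czhw->kzhw", w_cls, fused)              # (K, Z*up, H*up, W*up)
    return logits.argmax(axis=0)

# Usage with random features and weights; all sizes are illustrative only.
rng = np.random.default_rng(0)
C, Z, H, W, K = 8, 4, 10, 10, 5
feats = rng.standard_normal((C, Z, H, W))
occ = bev_occupancy_head(feats, rng.standard_normal((C, C)), rng.standard_normal((K, C)))
print(occ.shape)  # (8, 20, 20)
```

In this sketch the only learned spatial processing happens on the 2D map in step 2, while the 3D branch involves just interpolation and an element-wise sum. That mirrors the stated motivation: the expensive 3D convolution blocks are avoided, and the interpolated voxel features restore the height information that BEV compression discards.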