29 Apr 2024 | Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall
**InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction**
This paper introduces InverseMatrixVT3D, a novel method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Unlike existing methods that often rely on depth estimation, device-specific operators, or transformer queries, InverseMatrixVT3D leverages two projection matrices to store static mapping relationships and uses matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. The method optimizes GPU memory usage through a sparse matrix handling technique and integrates global BEV features with local 3D feature volumes using a global-local attention fusion module. Additionally, a multi-scale supervision mechanism is employed to enhance performance. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate that InverseMatrixVT3D achieves top performance in detecting vulnerable road users (VRUs), which is crucial for autonomous driving and road safety. The code is available at: https://github.com/DanielMing123/InverseMatrixVT3D.
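
The core idea of a static projection matrix applied by matrix multiplication can be illustrated with a toy sketch. This is not the authors' implementation: the names `build_projection_coo`, `sparse_project`, and `voxel_to_pixels` are hypothetical, the voxel-to-pixel lookup is hand-written rather than derived from camera calibration, and the row-averaging weights are an assumption made for illustration. The sketch stores only the non-zero entries of the matrix (COO format), which is one common way to keep memory low when most voxels see only a few pixels.

```python
import numpy as np

def build_projection_coo(voxel_to_pixels):
    """Build a row-normalised projection matrix in COO form.

    voxel_to_pixels[i] lists the flattened pixel indices assumed to map
    to voxel i. The lookup is static, so it can be computed once and reused.
    """
    rows, cols, vals = [], [], []
    for voxel_idx, pixel_ids in enumerate(voxel_to_pixels):
        for pixel_idx in pixel_ids:
            rows.append(voxel_idx)
            cols.append(pixel_idx)
            vals.append(1.0 / len(pixel_ids))  # average over the mapped pixels
    return (np.array(rows, dtype=np.int64),
            np.array(cols, dtype=np.int64),
            np.array(vals, dtype=np.float64))

def sparse_project(rows, cols, vals, features, num_voxels):
    """Compute P @ features using only the non-zero entries of P."""
    out = np.zeros((num_voxels, features.shape[1]))
    # Scatter-add each weighted pixel feature into its voxel row.
    np.add.at(out, rows, vals[:, None] * features[cols])
    return out

# Toy input: 4 flattened image pixels with 2 feature channels each.
features = np.array([[1.0, 0.0],
                     [0.0, 2.0],
                     [4.0, 4.0],
                     [2.0, 6.0]])

# Hypothetical static mapping: voxel 0 sees pixel 0, voxel 1 sees
# pixels 1 and 2, voxel 2 is not visible in any image.
voxel_to_pixels = [[0], [1, 2], []]

rows, cols, vals = build_projection_coo(voxel_to_pixels)
volume = sparse_project(rows, cols, vals, features, num_voxels=3)

# Dense reference: the same result as an explicit matrix multiplication.
dense_p = np.zeros((3, 4))
dense_p[rows, cols] = vals
dense_volume = dense_p @ features
```

Under these assumptions, `volume` and `dense_volume` agree, and the same pattern would apply to a second matrix whose rows index BEV cells instead of voxels. In practice a sparse tensor type from a deep learning framework would replace the hand-rolled scatter-add.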