21 Mar 2024 | Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia
The paper "Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation" by Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, and Jiaya Jia explores the potential of sparse CNNs in 3D semantic segmentation, particularly in comparison to point cloud transformers. The authors identify adaptivity as the key factor that distinguishes the performance of sparse CNNs from point transformers. They propose two key components to enhance adaptivity: spatially adaptive receptive fields and adaptive relations. These components are integrated into a lightweight module, resulting in Omni-Adaptive 3D CNNs (OA-CNNs). OA-CNNs achieve superior accuracy in both indoor and outdoor scenes compared to point transformers, with significantly lower latency and memory usage. The method outperforms state-of-the-art point-based methods on benchmarks such as ScanNet v2, nuScenes, and SemanticKITTI, achieving mIoU scores of 76.1%, 78.9%, and 70.6% respectively. The paper also includes detailed experimental results, ablation studies, and visualizations to support the effectiveness of the proposed approach.The paper "Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation" by Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, and Jiaya Jia explores the potential of sparse CNNs in 3D semantic segmentation, particularly in comparison to point cloud transformers. The authors identify adaptivity as the key factor that distinguishes the performance of sparse CNNs from point transformers. They propose two key components to enhance adaptivity: spatially adaptive receptive fields and adaptive relations. These components are integrated into a lightweight module, resulting in Omni-Adaptive 3D CNNs (OA-CNNs). OA-CNNs achieve superior accuracy in both indoor and outdoor scenes compared to point transformers, with significantly lower latency and memory usage. The method outperforms state-of-the-art point-based methods on benchmarks such as ScanNet v2, nuScenes, and SemanticKITTI, achieving mIoU scores of 76.1%, 78.9%, and 70.6% respectively. The paper also includes detailed experimental results, ablation studies, and visualizations to support the effectiveness of the proposed approach.