CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding

22 Apr 2024 | Guibiao Liao, Jiankun Li, Zhenyu Bao, Xiaoqing Ye, Jingdong Wang, Qing Li, Kanglin Liu
CLIP-GS is a method that integrates semantic information from CLIP into 3D Gaussian Splatting (3DGS) to achieve real-time, view-consistent 3D semantic understanding without requiring annotated semantic data. The method addresses two main challenges: efficiency and semantic consistency.

To improve efficiency, the paper introduces Semantic Attribute Compactness (SAC), which leverages the unified semantics within an object to learn compact yet effective semantic representations for the 3D Gaussians, enabling semantic rendering at over 100 FPS. To address semantic consistency, the paper proposes 3D Coherent Self-training (3DCS), which imposes cross-view semantic consistency constraints derived from the trained 3D Gaussians themselves to produce view-consistent segmentation results.

Extensive experiments show that CLIP-GS outperforms existing state-of-the-art methods on the Replica and ScanNet datasets, with improvements of 17.29% and 20.81% in mIoU, respectively, and it remains robust even under sparse input data. Ablation studies validate the effectiveness of SAC, 3DCS, and the method's other components. Overall, CLIP-GS provides an efficient and effective approach to 3D semantic understanding built on 3D Gaussian Splatting.
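To make the two components concrete, here is a minimal PyTorch sketch (not the authors' implementation) of the ideas summarized above: SAC keeps the per-Gaussian semantic attribute low-dimensional and lifts the rendered semantic map to CLIP space once per pixel, while 3DCS supervises a rendered view with pseudo-labels derived from the trained Gaussians. All names, dimensions, and function signatures here (`SemanticDecoder`, `sem_dim`, `cross_view_consistency_loss`) are illustrative assumptions, not the paper's API.

```python
# Illustrative sketch only; the real method rasterizes semantics with a
# CUDA Gaussian-splatting pipeline. Dimensions below are assumed values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDecoder(nn.Module):
    """SAC-style idea: each Gaussian carries a compact semantic attribute
    (e.g., 8-D) instead of a full CLIP embedding (e.g., 512-D). The cheap
    low-dim map is rendered first, then decoded to CLIP space per pixel,
    which is what makes >100 FPS semantic rendering plausible."""
    def __init__(self, sem_dim: int = 8, clip_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(sem_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, clip_dim),
        )

    def forward(self, rendered_sem: torch.Tensor) -> torch.Tensor:
        # rendered_sem: (H, W, sem_dim) low-dim semantic map from the rasterizer
        feat = self.mlp(rendered_sem)           # (H, W, clip_dim)
        return F.normalize(feat, dim=-1)        # unit-norm, CLIP-style features

def segment_with_text(pixel_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Open-vocabulary segmentation: cosine similarity between per-pixel
    CLIP-space features and pre-normalized CLIP text embeddings (C, clip_dim)."""
    logits = pixel_feat @ text_emb.t()          # (H, W, num_classes)
    return logits.argmax(dim=-1)                # (H, W) hard label map

def cross_view_consistency_loss(pred_logits: torch.Tensor,
                                pseudo_label: torch.Tensor) -> torch.Tensor:
    """3DCS, caricatured: supervise one view's predicted semantics with a
    pseudo-label map rendered from the trained 3D Gaussians, so that
    segmentations of the same geometry agree across viewpoints."""
    # pred_logits: (H, W, C) float; pseudo_label: (H, W) long class indices
    return F.cross_entropy(pred_logits.permute(2, 0, 1).unsqueeze(0),
                           pseudo_label.unsqueeze(0))
```

The key design point this sketch tries to capture is the split between a compact rasterized attribute and a shared decoder: per-Gaussian storage and rendering cost scale with `sem_dim`, not the CLIP dimension.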