Understanding Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

**Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting**

Open-vocabulary 3D scene understanding is a significant challenge in computer vision, with applications in embodied agents and augmented reality systems. Existing methods often use neural rendering techniques to represent 3D scenes and jointly optimize color and semantic features. This paper introduces Semantic Gaussians, a novel approach based on 3D Gaussian Splatting that distills knowledge from pre-trained 2D models into 3D Gaussians. Unlike previous methods, Semantic Gaussians uses a versatile projection approach that maps 2D semantic features from various pre-trained image encoders into a new semantic component of each 3D Gaussian. This projection relies on the spatial correspondence between 2D pixels and 3D Gaussians and requires no additional training. In addition, a 3D semantic network is introduced that predicts the semantic component directly from raw 3D Gaussians, enabling fast inference. Experiments on the ScanNet segmentation benchmark and the LERF object localization dataset demonstrate the superior performance of Semantic Gaussians. The method is also applied to object part segmentation, instance segmentation, scene editing, and spatiotemporal tracking, where it yields better qualitative results than 2D and 3D baselines.

**Contributions:**

1. Introduce Semantic Gaussians, a novel approach to open-vocabulary 3D scene understanding that adds a semantic component to 3D Gaussian Splatting.
2. Propose a versatile semantic feature projection framework that maps pre-trained 2D features onto 3D Gaussian points, and introduce a 3D semantic network that predicts semantic components directly from raw 3D Gaussians.
3. Conduct experiments on the ScanNet and LERF datasets to demonstrate the effectiveness of Semantic Gaussians, and explore applications such as object part segmentation, instance segmentation, scene editing, and spatiotemporal tracking.
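The training-free projection described above can be illustrated with a minimal sketch. It assumes a pinhole camera with world-to-camera rotation `R` and translation `t`, intrinsics `fx`, `fy`, `cx`, `cy`, and a dense per-pixel feature map for each view (for example, one produced by a pre-trained image encoder). Each Gaussian center is projected into every view, the feature at the pixel it lands on is read off, and the features are averaged over the views in which the center is visible. The `View` structure and all function names are hypothetical, and the simplified rule (nearest pixel, no occlusion or depth test, uniform averaging) is an assumption, not the paper's exact correspondence scheme.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class View:
    """A posed pinhole camera plus its per-pixel 2D feature map (hypothetical structure)."""
    R: List[List[float]]               # 3x3 world-to-camera rotation
    t: List[float]                     # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    features: List[List[List[float]]]  # features[v][u] is a D-dim vector


def project(view: View, p: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return the pixel (u, v) that world point p falls in, or None if not visible."""
    # Camera coordinates: x_cam = R @ p + t
    xc = [sum(view.R[i][j] * p[j] for j in range(3)) + view.t[i] for i in range(3)]
    if xc[2] <= 0:  # point is behind the camera
        return None
    u = math.floor(view.fx * xc[0] / xc[2] + view.cx)
    v = math.floor(view.fy * xc[1] / xc[2] + view.cy)
    height, width = len(view.features), len(view.features[0])
    if 0 <= u < width and 0 <= v < height:
        return u, v
    return None


def project_features(centers: List[Sequence[float]],
                     views: List[View]) -> List[Optional[List[float]]]:
    """Average, over all views, the 2D feature at the pixel each Gaussian center projects to."""
    result: List[Optional[List[float]]] = []
    for p in centers:
        acc: Optional[List[float]] = None
        count = 0
        for view in views:
            uv = project(view, p)
            if uv is None:
                continue  # not visible in this view
            feat = view.features[uv[1]][uv[0]]
            acc = list(feat) if acc is None else [a + b for a, b in zip(acc, feat)]
            count += 1
        # Gaussians never seen by any camera get no semantic component.
        result.append(None if acc is None else [a / count for a in acc])
    return result
```

Because the sketch only reads existing 2D features at geometrically corresponding pixels, no optimization step is involved, which mirrors the "requires no additional training" property stated for the method.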
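The 3D semantic network predicts the semantic component directly from raw 3D Gaussians using a 3D sparse convolutional network. As a hedged illustration of the basic building block such networks stack, the sketch below hashes Gaussian centers into a sparse voxel grid and applies one submanifold sparse convolution with a 3x3x3 kernel, which computes outputs only at voxels that are already occupied. The voxel size, the per-Gaussian input features, the weights, and all function names are hypothetical; the actual architecture, input attributes, and training objective are not specified in this summary.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
# The 27 integer offsets covered by a 3x3x3 kernel.
OFFSETS: List[Voxel] = list(itertools.product((-1, 0, 1), repeat=3))


def voxelize(centers: Sequence[Sequence[float]],
             feats: Sequence[Sequence[float]],
             voxel_size: float) -> Dict[Voxel, List[float]]:
    """Hash Gaussians into a sparse voxel grid, averaging the features that share a voxel."""
    sums: Dict[Voxel, List[float]] = {}
    counts: Dict[Voxel, int] = {}
    for c, f in zip(centers, feats):
        key = (math.floor(c[0] / voxel_size),
               math.floor(c[1] / voxel_size),
               math.floor(c[2] / voxel_size))
        sums[key] = [a + b for a, b in zip(sums[key], f)] if key in sums else list(f)
        counts[key] = counts.get(key, 0) + 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def submanifold_conv(grid: Dict[Voxel, List[float]],
                     weights: Dict[Voxel, List[List[float]]],
                     bias: List[float]) -> Dict[Voxel, List[float]]:
    """One submanifold sparse 3D convolution followed by ReLU.

    Outputs are computed only at voxels that are already occupied, and each output
    sums W[offset] @ feature(neighbor) over occupied neighbors. weights[offset] is a
    (D_out x D_in) matrix; offsets absent from `weights` contribute nothing.
    """
    out: Dict[Voxel, List[float]] = {}
    for key in grid:
        acc = list(bias)
        for off in OFFSETS:
            nb = (key[0] + off[0], key[1] + off[1], key[2] + off[2])
            if nb not in grid or off not in weights:
                continue  # empty neighbor or zero kernel tap
            W, f = weights[off], grid[nb]
            for o in range(len(acc)):
                acc[o] += sum(w * x for w, x in zip(W[o], f))
        out[key] = [max(0.0, a) for a in acc]
    return out
```

A submanifold layer keeps the set of active voxels fixed, so stacking such layers does not spread features into empty space. A practical network would run many such layers with a GPU library; the loop form here is only meant to show the computation.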
**Methods:**

- **2D Versatile Projection:** Extract pixel-level semantic maps from pre-trained 2D models and project them into 3D Gaussians.
- **3D Semantic Network:** Predict semantic components from raw 3D Gaussians using a 3D sparse convolutional network.
- **Inference:** Perform language-driven open-vocabulary scene understanding by comparing text embeddings with the semantic components of 3D Gaussians.

**Experiments:**

- **Quantitative Results:** Evaluate on the ScanNet segmentation benchmark and the LERF object localization dataset.
- **Qualitative Evaluations:** Show performance on part segmentation, spatiotemporal tracking, instance segmentation, and scene editing.

**Conclusion:** Semantic Gaussians effectively addresses open-vocabulary 3D scene understanding by leveraging 3D Gaussian Splatting and pre-trained 2D models. The method demonstrates superior performance and versatility in various downstream tasks, paving the way for real-world applications in embodied agents and augmented reality systems.
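The inference step in the Methods list, comparing text embeddings with the semantic components of 3D Gaussians, can be sketched as cosine-similarity matching. This minimal version assumes the semantic components and the text embeddings already live in a shared embedding space (for example, one produced by a vision-language model's image and text encoders) and uses toy vectors. `label_gaussians` assigns each Gaussian its most similar label, and `query_mask`, with its hypothetical `threshold`, selects Gaussians for a single free-form query. The function names and the scoring rule are assumptions, not necessarily the paper's exact procedure.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def label_gaussians(semantic: Sequence[Sequence[float]],
                    text_emb: Dict[str, Sequence[float]]) -> List[str]:
    """Give each Gaussian the label whose text embedding best matches its semantic component."""
    names = list(text_emb)
    return [max(names, key=lambda n: cosine(s, text_emb[n])) for s in semantic]


def query_mask(semantic: Sequence[Sequence[float]],
               text_vec: Sequence[float],
               threshold: float) -> List[bool]:
    """Select Gaussians whose similarity to one free-form query embedding reaches `threshold`."""
    return [cosine(s, text_vec) >= threshold for s in semantic]
```

For example, with `semantic = [[1.0, 0.0], [0.1, 0.9]]` and toy embeddings `{"chair": [1.0, 0.0], "table": [0.0, 1.0]}`, `label_gaussians` returns `["chair", "table"]`. Because only dot products are needed, the label set can be changed at query time without touching the Gaussians, which is what makes the setup open-vocabulary.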


23 Aug 2024 | Jun Guo*, Xiaojiao Ma*, Yue Fan, Huaping Liu†, Senior Member, IEEE, Qing Li†
