Understanding OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) that enables point-level open vocabulary understanding in 3D scenes. The primary motivation is to address the limitations of existing 3DGS-based methods, which primarily focus on 2D pixel-level parsing and struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. OpenGaussian addresses these issues by training instance features with 3D consistency using SAM masks and proposing a two-level codebook to discretize these features, ensuring both intra-object consistency and inter-object distinction. Additionally, it introduces an instance-level 2D-3D feature association method that links 3D points to 2D CLIP features, enhancing open-vocabulary 3D scene understanding.
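
To make the two-level codebook idea more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes that each level is a plain k-means clustering of per-point instance features, with the second level run separately inside every first-level cluster. The function names (kmeans, two_level_quantize) and the parameters (k_coarse, k_fine) are hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # cannot have more clusters than points
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute assignments so they match the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)


def two_level_quantize(features, k_coarse=2, k_fine=2):
    """Discretize features with a coarse codebook, then a fine codebook
    per coarse cluster (an assumed, simplified reading of 'two-level')."""
    features = np.asarray(features, dtype=float)
    _, coarse_ids = kmeans(features, k_coarse)
    quantized = np.empty_like(features)
    instance_ids = np.zeros(len(features), dtype=int)
    for c in range(k_coarse):
        idx = np.where(coarse_ids == c)[0]
        if len(idx) == 0:
            continue
        fine_centroids, fine_ids = kmeans(features[idx], k_fine, seed=c + 1)
        # Every point takes the feature of its fine-level code, so points in
        # the same code share one feature and different codes stay distinct.
        quantized[idx] = fine_centroids[fine_ids]
        instance_ids[idx] = c * k_fine + fine_ids
    return quantized, instance_ids
```

After quantization, all points that share an instance id carry exactly the same feature vector, which is one simple way to read "intra-object consistency"; separate codes in the codebook provide the "inter-object distinction".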
Extensive experiments demonstrate the effectiveness of OpenGaussian in various tasks, including open-vocabulary object selection, 3D point cloud understanding, and click-based 3D object selection. The method eliminates the need for additional networks for feature dimensionality compression or quantization while inheriting the open-vocabulary capabilities of CLIP features.
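
Open-vocabulary object selection can be illustrated with a small sketch. The code below assumes that every discretized 3D instance has already been associated with one CLIP feature vector and that the text query has been encoded into the same space; both are passed in as plain arrays, and no real CLIP API is called. The cosine-similarity threshold and the function names (select_instances, select_points) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def select_instances(instance_feats, text_feat, threshold=0.5):
    """Return indices of instances whose feature is close to the text
    feature under cosine similarity, plus the similarity scores."""
    instance_feats = np.asarray(instance_feats, dtype=float)
    text_feat = np.asarray(text_feat, dtype=float)
    a = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = a @ t
    return np.where(sims >= threshold)[0], sims


def select_points(point_instance_ids, selected_instances):
    """Boolean mask over 3D points whose instance id was selected."""
    return np.isin(point_instance_ids, selected_instances)


# Toy usage: three instances with 2-D stand-in features, four 3D points.
instance_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feat = [1.0, 0.0]
selected, sims = select_instances(instance_feats, text_feat, threshold=0.9)
mask = select_points(np.array([0, 1, 2, 0]), selected)
```

Because the selection operates on instances rather than on rendered pixels, the resulting mask applies directly to the 3D points, which matches the point-level goal described above.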

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

2024 | Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang