OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

2024 | Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang
OpenGaussian is a method based on 3D Gaussian Splatting (3DGS) for open-vocabulary understanding at the level of individual 3D points. Existing 3DGS-based open-vocabulary methods focus on 2D pixel-level parsing. They struggle with 3D point-level tasks for two reasons: their features are weakly expressive, and their 2D-3D feature associations are inaccurate.

OpenGaussian addresses these problems in three steps. First, it uses SAM masks, without any cross-frame mask associations, to train 3D-consistent instance features. These features are consistent within an object and distinct between objects. Second, a two-level codebook discretizes the features from coarse to fine. At the coarse level, the 3D positions of the points are also used, so the clustering is location-aware. The fine level then refines each coarse cluster. Third, an instance-level 3D-2D feature association links 3D points to 2D masks, which are in turn associated with 2D CLIP features. Because this association is training-free, OpenGaussian needs no additional network to compress or quantize feature dimensions, and it inherits the open-vocabulary capabilities of the original CLIP features.

The method outperforms comparison methods because it resolves their two issues: 1) it obtains distinctive features through semantic-agnostic feature learning and two-level codebook discretization; 2) it avoids the burden of learning high-dimensional CLIP features and keeps those features lossless through the training-free instance-level 3D-2D feature association. Experiments on open-vocabulary 3D object selection, 3D point cloud understanding, and click-based 3D object selection, together with ablation studies, demonstrate its effectiveness. The results show that OpenGaussian outperforms existing methods in 3D object selection, semantic segmentation, and point cloud understanding. The source code is available at the project page.
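The two-level codebook step can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain k-means at both levels, a hypothetical `pos_weight` that scales the xyz coordinates appended at the coarse level, toy 2-D instance features, and placeholder cluster counts. The example shows why the coarse level uses positions. Objects A and B share a feature but sit in different places, so they end up in different coarse clusters. The fine level then separates A from C, which share a location but differ in feature.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.

    Returns (centroids, labels). This sketch uses it at both codebook levels.
    """
    k = min(k, len(points))
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: mean of members; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def two_level_codebook(features, positions, k_coarse, k_fine, pos_weight=1.0):
    """Coarse level: cluster [feature, pos_weight * xyz].
    Fine level: re-cluster each coarse cluster on the feature alone.
    Returns (coarse_ids, fine_ids); fine_ids are global instance ids."""
    coarse_in = [list(f) + [pos_weight * x for x in p] for f, p in zip(features, positions)]
    _, coarse = kmeans(coarse_in, k_coarse)

    fine = [None] * len(features)
    next_id = 0
    for c in sorted(set(coarse)):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        _, sub = kmeans([features[i] for i in idx], k_fine)
        remap = {}  # local sub-cluster label -> global fine id
        for i, s in zip(idx, sub):
            if s not in remap:
                remap[s] = next_id
                next_id += 1
            fine[i] = remap[s]
    return coarse, fine


# Toy scene: A and B share a feature but sit 5 units apart;
# C sits where A is but has a different feature.
features = [(1.0, 0.0)] * 6 + [(0.0, 1.0)] * 3
positions = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),          # A
             (5, 0, 0), (5.1, 0, 0), (5, 0.1, 0),          # B
             (0, 0, 0.1), (0.1, 0, 0.1), (0, 0.1, 0.1)]    # C
coarse, fine = two_level_codebook(features, positions, k_coarse=2, k_fine=2)
print(coarse)  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(fine)    # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

The fine level uses features only because position has already done its work at the coarse level. A fixed `k_fine` can over-split a coarse cluster that contains a single object, which is why the cluster counts here are tunable assumptions rather than fixed values.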
In summary, OpenGaussian provides efficient and effective open-vocabulary understanding at the 3D point level.
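The training-free instance-level 3D-2D association, followed by a text query, can be sketched in the same hedged way. Every detail here is an assumption made for illustration:

- Masks are sets of pixel coordinates.
- Each instance's rendered footprint per view is given as `instance_renders`.
- Each SAM mask carries a precomputed CLIP vector; toy 3-D vectors stand in for real CLIP embeddings.
- Candidate masks are scored by IoU alone, with a hypothetical `min_iou` threshold. This is a simplification.
- Matched CLIP features are averaged across views.
- The query is cosine similarity against a text embedding.

```python
import math


def iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def associate(instance_renders, sam_masks, min_iou=0.5):
    """Assign each 3D instance a CLIP feature without any training.

    instance_renders[inst][view] -> pixel set covered by the instance in that view
    sam_masks[view] -> list of (pixel_set, clip_feature) pairs
    Returns {inst: unit-norm CLIP feature, or None if no view matched}.
    """
    result = {}
    for inst, per_view in instance_renders.items():
        total, hits = None, 0
        for view, rendered in per_view.items():
            candidates = [(iou(rendered, m), f) for m, f in sam_masks.get(view, [])]
            if not candidates:
                continue
            best_iou, feat = max(candidates, key=lambda t: t[0])
            if best_iou < min_iou:
                continue  # no SAM mask overlaps this instance well enough
            total = list(feat) if total is None else [a + b for a, b in zip(total, feat)]
            hits += 1
        result[inst] = normalize([a / hits for a in total]) if hits else None
    return result


def query(inst_feats, text_feat):
    """Cosine similarity between a text embedding and every matched instance."""
    t = normalize(text_feat)
    scores = {i: sum(a * b for a, b in zip(f, t))
              for i, f in inst_feats.items() if f is not None}
    return max(scores, key=scores.get), scores


# Toy data: 3-D vectors stand in for CLIP embeddings ("chair" ~ x, "table" ~ y).
sam_masks = {
    0: [({(0, 0), (0, 1), (1, 0), (1, 1)}, [1.0, 0.0, 0.0]),
        ({(5, 5), (5, 6)}, [0.0, 1.0, 0.0])],
    1: [({(2, 2), (2, 3)}, [0.8, 0.2, 0.0])],
}
instance_renders = {
    0: {0: {(0, 0), (0, 1), (1, 0)}, 1: {(2, 2), (2, 3)}},
    1: {0: {(5, 5), (5, 6), (5, 7)}, 1: {(9, 9)}},
}
feats = associate(instance_renders, sam_masks)
best, scores = query(feats, [1.0, 0.0, 0.0])
print(best)  # 0
```

The CLIP vectors are copied from matched masks rather than regressed into the Gaussians. Nothing high-dimensional has to be learned or compressed, which matches the lossless, training-free property the summary describes.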