Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation


19 Apr 2024 | Myrna C. Silva*, Mahtab Dahaghin*, Matteo Toso, and Alessio Del Bue
Contrastive Gaussian Clustering is a novel approach to 3D scene segmentation that can generate segmentation masks from any viewpoint. The method builds on 3D Gaussian Splatting (3DGS), which models the appearance of a scene as a cloud of 3D Gaussians. Each Gaussian is augmented with a segmentation feature vector: the scene can then be segmented in 3D by clustering Gaussians on these feature vectors, and a 2D segmentation mask can be rendered for any view by projecting the Gaussians onto the image plane and blending their features.

Because the 2D masks produced by off-the-shelf segmentation models are generally inconsistent across views, training combines contrastive learning with a spatial-similarity regularization; the model thus learns from inconsistent 2D masks while producing masks that are consistent across all views. The full objective is a combination of a rendering loss, a contrastive clustering loss, and the spatial-similarity regularizer.

Evaluated on the LERF-Mask and 3D-OVS datasets against methods based on implicit scene representations (NeRF) and on 3D Gaussian representations, the approach matches or outperforms the state of the art, improving the IoU of the predicted masks by +8%. It generates accurate instance segmentation masks for any object in the scene, produces high-quality semantic segmentation masks for arbitrary views, and the underlying representation also supports dynamic scenes and editing scenes from text prompts. Overall, the method is both efficient and accurate, outperforming current approaches based on both NeRF and 3DGS.
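The 3D segmentation step groups Gaussians by the similarity of their learned feature vectors. The sketch below illustrates this idea with a greedy cosine-similarity grouping; the specific clustering algorithm and all names are assumptions for illustration (any standard method, e.g. k-means, could be substituted).

```python
import numpy as np

def cluster_gaussians(features, sim_threshold=0.9):
    """Toy 3D segmentation: group Gaussians whose segmentation feature
    vectors are similar (illustrative, not the paper's implementation).

    features: (N, D) array, one learned feature vector per Gaussian.
    Returns an (N,) array of integer segment labels.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels = -np.ones(len(f), dtype=int)
    centroids = []
    for i, v in enumerate(f):
        # assign to the first cluster whose seed direction is similar enough
        for c, ctr in enumerate(centroids):
            if v @ ctr >= sim_threshold:
                labels[i] = c
                break
        else:
            centroids.append(v)
            labels[i] = len(centroids) - 1
    return labels

# toy example: two nearly identical features and one orthogonal one
feats = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
labels = cluster_gaussians(feats)  # → array([0, 0, 1])
```

Once the Gaussians carry segment labels, any object in the scene can be isolated directly in 3D rather than per image.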
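The 2D masks are rendered the way 3DGS renders color: the features of the Gaussians overlapping a pixel are alpha-blended front to back. A minimal sketch of this compositing for a single pixel (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def blend_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian feature vectors
    at one pixel, analogous to how 3DGS composites colors.

    features: (N, D) feature vectors of the N Gaussians covering the
              pixel, sorted front to back.
    alphas:   (N,) opacity of each Gaussian at this pixel.
    """
    transmittance = 1.0
    out = np.zeros(features.shape[1])
    for f, a in zip(features, alphas):
        out += transmittance * a * f   # weight by remaining transmittance
        transmittance *= (1.0 - a)     # attenuate what lies behind
    return out

# toy example: two Gaussians with 3-D segmentation features
feats = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
alphas = np.array([0.6, 0.5])
blended = blend_features(feats, alphas)  # 0.6*[1,0,0] + 0.4*0.5*[0,1,0]
```

Because the same compositing runs for every camera pose, the rendered feature maps, and hence the masks derived from them, stay consistent across views by construction.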
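The contrastive clustering term is not given in closed form here; the sketch below shows one plausible prototype-based variant that captures its intent, pulling rendered features that fall inside the same 2D mask together and pushing different masks apart. The exact form of the loss and all names are assumptions, not the paper's objective.

```python
import numpy as np

def contrastive_clustering_loss(pixel_feats, mask_ids):
    """Hypothetical prototype-based contrastive loss over rendered features.

    pixel_feats: (P, D) feature vectors rendered at P sampled pixels.
    mask_ids:    (P,) integer 2D mask ID of each pixel in this view.
    """
    ids = np.unique(mask_ids)
    protos = np.stack([pixel_feats[mask_ids == i].mean(axis=0) for i in ids])
    # pull: mean squared distance of each feature to its own mask prototype
    pull = np.mean([np.sum((pixel_feats[mask_ids == i] - p) ** 2)
                    / max(1, (mask_ids == i).sum())
                    for i, p in zip(ids, protos)])
    # push: penalize similarity between prototypes of distinct masks
    sims = protos @ protos.T
    n = len(ids)
    push = (np.sum(sims) - np.trace(sims)) / max(1, n * (n - 1))
    return pull + push

# consistent labels score lower than inconsistent ones
pix = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
good = contrastive_clustering_loss(pix, np.array([0, 0, 1, 1]))
bad = contrastive_clustering_loss(pix, np.array([0, 1, 0, 1]))
```

Because the loss only compares features within a single view's masks, it tolerates mask IDs that disagree between views, which is what lets the model train on inconsistent 2D supervision.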