Gaga: Group Any Gaussians via 3D-aware Memory Bank

27 Mar 2025 | Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Gaga is a framework that reconstructs and segments open-world 3D scenes using the inconsistent 2D masks predicted by zero-shot class-agnostic segmentation models. The key idea is to group the 3D Gaussians in an open-world scene and render multi-view consistent segmentation. By employing a 3D-aware memory bank, Gaga eliminates the label inconsistency in 2D segmentations predicted by foundation models and assigns each mask across different views a universal group ID, lifting 2D segmentation to a consistent 3D segmentation.

The framework first leverages Gaussian Splatting to reconstruct the 3D scene and extracts 2D masks with an open-world segmentation method. It then iteratively builds a 3D-aware memory bank that collects and stores the Gaussians grouped by category. For each input view, it unprojects each 2D mask into 3D space using the camera parameters and searches the memory bank for the category with the largest overlap with the unprojected mask. Based on the degree of overlap, it either assigns the mask to an existing category or creates a new one. Finally, it learns a feature vector for each 3D Gaussian that encodes its category information; the predicted masks are compared with the segmentation masks obtained from the 3D-aware memory bank for supervision.

Gaga produces accurate 3D object segmentation, achieving high-quality results for downstream applications such as scene manipulation. It outperforms previous methods in segmentation accuracy, multi-view consistency, and reduced artifacts, is particularly effective in sparse-view settings, and performs robustly with different open-world zero-shot class-agnostic segmentation models. Comprehensive experiments on diverse datasets and challenging scenarios, including sparse input views, demonstrate the effectiveness of the proposed method both qualitatively and quantitatively.
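The per-view mask association step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each 2D mask has already been unprojected to the set of Gaussian indices it covers, represents the memory bank as a plain dict from group ID to a set of Gaussian indices, and uses a made-up overlap threshold.

```python
def assign_masks_to_groups(masks, memory_bank, threshold=0.25):
    """Assign each 2D mask a universal group ID via the 3D-aware memory bank.

    masks: list of sets, each holding the Gaussian indices one 2D mask
           unprojects to in the current view.
    memory_bank: dict mapping group ID -> set of Gaussian indices stored so
                 far; updated in place.
    Returns one group ID per mask.
    """
    assigned_ids = []
    for mask in masks:
        # Search the bank for the group with the largest overlap ratio.
        best_id, best_overlap = None, 0.0
        for gid, group in memory_bank.items():
            overlap = len(mask & group) / max(len(mask), 1)
            if overlap > best_overlap:
                best_id, best_overlap = gid, overlap
        # Assign to the best-matching group, or open a new one.
        if best_id is None or best_overlap < threshold:
            best_id = max(memory_bank, default=-1) + 1
            memory_bank[best_id] = set()
        memory_bank[best_id] |= mask
        assigned_ids.append(best_id)
    return assigned_ids


bank = {}
assign_masks_to_groups([{0, 1, 2}, {5, 6}], bank)  # first view: new groups 0, 1
assign_masks_to_groups([{1, 2, 3}, {8, 9}], bank)  # reuses group 0, creates 2
```

Because the bank accumulates the Gaussians of every mask it absorbs, later views match against the union of all evidence seen so far, which is what makes the group IDs consistent across views.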
The framework is also effective for scene manipulation tasks, such as changing the color of objects and removing them. The results show that Gaga can accurately identify and manipulate 3D objects in various scenarios.
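The final step, learning a per-Gaussian feature vector that encodes its group, can be illustrated with a toy sketch. Two simplifying assumptions not in the original: the feature is treated directly as a group-logit vector, and each Gaussian is supervised by its assigned group ID with softmax cross-entropy, whereas the actual method renders the features through the Gaussian rasterizer and compares the resulting 2D masks against those from the memory bank.

```python
import numpy as np

def learn_group_features(group_ids, num_groups, steps=100, lr=1.0, seed=0):
    """Learn a per-Gaussian feature vector encoding its group ID.

    Toy simplification of the supervision step: features are group logits
    trained with softmax cross-entropy against direct per-Gaussian labels
    (the real method supervises via rendered 2D masks instead).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(group_ids)
    feats = rng.normal(scale=0.01, size=(len(labels), num_groups))
    onehot = np.eye(num_groups)[labels]
    for _ in range(steps):
        z = feats - feats.max(axis=1, keepdims=True)   # numerically stable softmax
        probs = np.exp(z)
        probs /= probs.sum(axis=1, keepdims=True)
        feats -= lr * (probs - onehot)                 # dCE/dlogits = p - y
    return feats

feats = learn_group_features([0, 0, 1, 2], num_groups=3)
# argmax over the feature now recovers each Gaussian's group ID
```

After training, taking the argmax of each Gaussian's feature recovers its group, so rendering these features yields view-consistent segmentation masks.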