GARField: Group Anything with Radiance Fields


17 Jan 2024 | Chung Min Kim\*, Mingxuan Wu\*, Justin Kerr\*, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa (\* equal contribution)
**Abstract:** GARField is a method for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. It resolves the inherent ambiguity of grouping by conditioning on physical scale, optimizing a scale-conditioned 3D affinity feature field. This field is trained from 2D masks produced by Segment Anything (SAM) so as to respect a coarse-to-fine hierarchy, reconciling conflicting masks from different viewpoints. The resulting affinity field yields a hierarchy of possible groupings via automatic tree construction or user interaction. Evaluated on a variety of in-the-wild scenes, GARField effectively extracts groups at multiple levels: clusters of objects, individual objects, and object subparts. It produces higher-fidelity groups than the input SAM masks and has potential applications in 3D asset extraction and dynamic scene understanding.

**Key Contributions:**
1. **Scale-Conditioned Affinity Field:** GARField optimizes a dense 3D feature field whose distances reflect points' pairwise affinity, enabling consistent grouping at any queried scale.
2. **Multi-Level Grouping:** Groups can be extracted at granularities ranging from clusters of objects down to individual parts.
3. **View Consistency:** Recovered groups are consistent across different viewpoints.
4. **Hierarchical Decomposition:** A hierarchical tree of scene nodes is generated by recursive clustering at decreasing scales.

**Methods:**
1. **2D Mask Generation:** Preprocess input images with SAM to obtain candidate masks, and assign each mask a physical scale based on scene geometry.
2. **Scale-Conditioned Affinity Field:** Optimize a 3D feature field that resolves conflicting 2D masks, trained with a contrastive loss and a containment auxiliary loss.
3. **Hierarchical Decomposition:** Recursively cluster the field at decreasing scales using HDBSCAN, ensuring coarse-to-fine grouping.

**Experiments:**
- **Qualitative Scene Decomposition:** Hierarchical clustering results are visualized using Gaussian Splats.
- **Quantitative Evaluation:** View consistency and hierarchical grouping recall are measured against ground-truth annotations.

**Limitations:**
- Group ambiguity can admit multiple valid groupings within a single scale.
- Scale conditioning may split object parts of different sizes into separate branches.
- Tree generation is a greedy algorithm, which can produce spurious small groups.

**Conclusion:** GARField provides a robust method for hierarchical 3D scene decomposition, with applications in robotics, dynamic scene reconstruction, and scene editing.
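The scale assignment in the mask-generation step can be sketched as follows. The paper derives each mask's physical scale from scene geometry; the sketch below is a minimal stand-in, where the function name and the 2× RMS-radius heuristic are assumptions for illustration, not the paper's exact formula:

```python
import numpy as np

def mask_physical_scale(points_3d):
    """Assign a physical scale to a 2D mask from the 3D scene points it covers.

    points_3d: (N, 3) backprojected scene points lying inside the mask.
    Heuristic (an assumption, not GARField's exact formula): twice the
    RMS radius of the points about their centroid.
    """
    centroid = points_3d.mean(axis=0)
    rms_radius = np.sqrt(((points_3d - centroid) ** 2).sum(axis=1).mean())
    return 2.0 * float(rms_radius)
```

Conditioning the affinity field on a physical quantity like this lets the same 3D point belong to a small group (a part) at one scale and a large group (a whole object) at another.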
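The contrastive objective on the scale-conditioned field can be sketched in a few lines: features of rays that fall in the same mask at a sampled scale are pulled together, while rays from different masks are pushed apart up to a margin. This is a simplified sketch; the function name, margin value, and squared-hinge form are assumptions, not the paper's exact loss:

```python
import numpy as np

def contrastive_grouping_loss(feats, group_ids, margin=1.0):
    """Sketch of a scale-conditioned contrastive grouping loss.

    feats:     (N, D) scale-conditioned feature vectors for N sampled rays
    group_ids: (N,) id of the mask each ray belongs to at the sampled scale
    """
    pull, push = [], []
    n = len(feats)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            if group_ids[i] == group_ids[j]:
                pull.append(d ** 2)                      # same mask: minimize distance
            else:
                push.append(max(0.0, margin - d) ** 2)   # different masks: enforce margin
    terms = pull + push
    return float(np.mean(terms)) if terms else 0.0
```

Because the features are conditioned on scale, two points can be pulled together at a coarse scale and pushed apart at a fine one without the two supervision signals conflicting.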
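The hierarchical decomposition step recursively clusters scale-conditioned features at decreasing scales. Below is a minimal sketch of that recursion, substituting a greedy epsilon-ball clustering stand-in where GARField uses HDBSCAN; all names and the `eps` threshold are assumptions:

```python
import numpy as np

def cluster_at_scale(points, feats_fn, scale, eps=0.5):
    """Greedy epsilon-ball clustering stand-in for HDBSCAN: each unlabeled
    point seeds a cluster of all unlabeled points within eps of it in the
    scale-conditioned feature space returned by feats_fn(points, scale)."""
    f = feats_fn(points, scale)
    labels = np.full(len(f), -1)
    next_id = 0
    for i in range(len(f)):
        if labels[i] == -1:
            near = np.linalg.norm(f - f[i], axis=1) < eps
            labels[near & (labels == -1)] = next_id
            next_id += 1
    return labels

def build_tree(points, feats_fn, scales):
    """Recursively cluster at decreasing scales, yielding a coarse-to-fine
    tree of nodes {scale, indices, children}; indices are local to the
    parent's point subset."""
    if not scales:
        return []
    labels = cluster_at_scale(points, feats_fn, scales[0])
    nodes = []
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        nodes.append({
            "scale": scales[0],
            "indices": idx,
            "children": build_tree(points[idx], feats_fn, scales[1:]),
        })
    return nodes
```

Recursing only within each parent cluster is what enforces the coarse-to-fine property: a node at a fine scale can never straddle two groups from a coarser scale.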