17 Jan 2024 | Chung Min Kim*, Mingxuan Wu*, Justin Kerr*, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa
**GARField: Group Anything with Radiance Fields**
**Authors:** Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa
**Abstract:**
GARField is a method for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. It addresses the ambiguity in grouping by leveraging physical scale, optimizing a scale-conditioned 3D affinity feature field. This field is trained from 2D masks provided by Segment Anything (SAM) to respect coarse-to-fine hierarchy, resolving conflicting masks from different viewpoints. The resulting affinity field can be used to derive a hierarchy of possible groupings via automatic tree construction or user interaction. GARField is evaluated on various in-the-wild scenes and shown to effectively extract groups at multiple levels, including clusters of objects, objects, and subparts. It produces higher-fidelity groups than input SAM masks and has potential applications in 3D asset extraction and dynamic scene understanding.
**Key Contributions:**
1. **Scale-Conditioned Affinity Field:** GARField optimizes a dense 3D feature field that encodes the affinity between pairs of 3D points conditioned on physical scale, enabling consistent grouping across granularities.
2. **Multi-Level Grouping:** It can extract groups at various granularities, from clusters of objects to individual parts.
3. **View Consistency:** The method ensures that the recovered groups are consistent across different viewpoints.
4. **Hierarchical Decomposition:** It generates a hierarchical tree of scene nodes, allowing for recursive clustering at decreasing scales.
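The scale-conditioned affinity above can be pictured as the cosine similarity between features queried from the field at a chosen physical scale: two points that agree at a coarse scale (same object) may diverge at a fine scale (different subparts). The sketch below uses hand-crafted toy feature vectors purely for illustration, not the learned field:

```python
import numpy as np

def affinity(fa, fb):
    """Affinity as cosine similarity of scale-conditioned features."""
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))

# Hand-crafted toy features for two points on the same object:
# at a coarse scale their features agree (one group), while at a
# fine scale they diverge (distinct subparts). Values are illustrative.
coarse_a, coarse_b = np.array([1.0, 0.1]), np.array([0.9, 0.2])
fine_a,   fine_b   = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(affinity(coarse_a, coarse_b))  # high: grouped at coarse scale
print(affinity(fine_a, fine_b))      # low: split at fine scale
```

Conditioning the same pair of points on different scales is what lets a single field answer both "are these part of one object?" and "are these part of one subpart?".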
**Methods:**
1. **2D Mask Generation:** Preprocess input images with SAM to obtain candidate masks, and assign each mask a physical scale derived from the scene geometry.
2. **Scale-Conditioned Affinity Field:** Optimize a 3D feature field that resolves conflicting 2D masks across views, trained with a contrastive loss and an auxiliary containment loss.
3. **Hierarchical Decomposition:** Recursively cluster the field's features at decreasing scales using HDBSCAN, yielding a coarse-to-fine grouping tree.
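The contrastive supervision in step 2 can be sketched as a margin-based pair loss: features of points that fall inside the same SAM mask at a given scale are pulled together, while other pairs are pushed apart up to a margin. This is a simplified sketch of the idea only; the paper's exact loss formulation and the containment term are omitted here.

```python
import numpy as np

def contrastive_loss(fa, fb, same_group, margin=1.0):
    """Margin-based contrastive loss on a pair of scale-conditioned
    features: pull together if the pair lies in the same mask at this
    scale, otherwise push apart until the margin is reached."""
    d = np.linalg.norm(fa - fb)
    if same_group:
        return d                      # pull: shrink feature distance
    return max(0.0, margin - d)       # push: active only within margin

# Toy pair of features (illustrative values, not learned ones).
same = contrastive_loss(np.array([0.2, 0.1]), np.array([0.1, 0.1]), True)
diff = contrastive_loss(np.array([0.2, 0.1]), np.array([0.1, 0.1]), False)
```

Because the loss is conditioned on scale, the same pair of points can be pulled together at one scale and pushed apart at another, which is how the field respects the coarse-to-fine hierarchy.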
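The recursion in step 3 can be illustrated with a toy stand-in. The paper clusters with HDBSCAN; the sketch below substitutes a simple greedy distance-threshold grouping (`cluster_at_scale`) just to show the coarse-to-fine recursion, and `toy_feat` is a hypothetical feature function, not the learned field:

```python
import numpy as np

def cluster_at_scale(points, feats, thresh):
    """Toy stand-in for HDBSCAN: greedy grouping by feature distance."""
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(points)):
            if labels[j] == -1 and np.linalg.norm(feats[i] - feats[j]) < thresh:
                labels[j] = next_label
        next_label += 1
    return labels

def build_tree(points, feat_fn, scales, thresh=0.5):
    """Recursively cluster at decreasing scales: each cluster found at
    the current (coarser) scale is re-clustered at the next (finer) one."""
    if not scales:
        return {"points": points}
    s, rest = scales[0], scales[1:]
    feats = [feat_fn(p, s) for p in points]
    labels = cluster_at_scale(points, feats, thresh)
    children = {}
    for lbl in set(labels):
        members = [p for p, l in zip(points, labels) if l == lbl]
        children[lbl] = build_tree(members, feat_fn, rest, thresh)
    return {"points": points, "scale": s, "children": children}

def toy_feat(p, s):
    # Hypothetical stand-in: at the coarse scale (s=1.0) every point
    # shares a feature; at the fine scale (s=0.1) points split by sign.
    if s >= 1.0:
        return np.array([0.0])
    return np.array([1.0 if p[0] > 0 else -1.0])

tree = build_tree([(-1, 0), (-2, 0), (1, 0), (2, 0)], toy_feat, [1.0, 0.1])
```

With these toy features, the root groups all four points at the coarse scale, then the recursion splits them into two subgroups at the fine scale, mirroring the cluster-of-objects → object → subpart hierarchy.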
**Experiments:**
- **Qualitative Scene Decomposition:** Visualizes hierarchical clustering results using Gaussian Splats.
- **Quantitative Evaluation:** Measures view consistency and hierarchical grouping recall against ground truth annotations.
**Limitations:**
- Group ambiguity can lead to multiple valid groupings within a single scale.
- Scale-conditioning may result in separate branches for object parts of different sizes.
- Tree generation uses a greedy algorithm, which can produce spurious small groups.
**Conclusion:**
GARField provides a robust method for hierarchical 3D scene decomposition, with applications in robotics, dynamic scene reconstruction, and scene editing.