17 Jan 2024 | Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa
GARField is a method that uses radiance fields to group objects in 3D scenes based on physical scale. It takes posed images as input and produces a hierarchical 3D grouping of the scene along with a scale-conditioned affinity field. 2D masks from different viewpoints often conflict. For example, a point may fall inside a mask of a subpart in one view and inside a mask of the whole object in another. GARField conditions its feature field on physical scale, so it treats such masks as groupings at different scales rather than as contradictions. This lets it fuse the masks into a coherent 3D grouping and consolidate ambiguous groupings into a hierarchy.
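To make "scale-conditioned" concrete, here is a minimal sketch of a query. It assumes that the affinity between two points at scale s is the negative distance between their features at that scale. The feature field is a hand-written toy, not the learned field. The point names, embeddings, grouping threshold `tau`, and the 0.1 scale cutoff are all hypothetical. The example shows one pair of points separated at a small scale (handle vs. body) and grouped at a larger one (the whole mug), so the two kinds of masks no longer conflict.

```python
import math
from typing import Tuple

# Toy stand-in for a learned scale-conditioned feature field F(x, s).
# Hypothetical: every point, embedding, and threshold below is made up for illustration.
PART_SCALE = 0.1  # hypothetical scale below which features describe parts
PART_EMBED = {"handle": (1.0, 0.0), "body": (0.0, 1.0), "plate": (0.6, 0.8)}
OBJECT_EMBED = {"mug": (1.0, 0.0), "plate": (0.0, 1.0)}
POINTS = {  # point id -> (part it belongs to, object it belongs to)
    "p_handle": ("handle", "mug"),
    "p_body": ("body", "mug"),
    "p_plate": ("plate", "plate"),
}


def feature(point: str, scale: float) -> Tuple[float, float]:
    """Return the toy feature of `point` queried at physical `scale`."""
    part, obj = POINTS[point]
    return PART_EMBED[part] if scale < PART_SCALE else OBJECT_EMBED[obj]


def affinity(p: str, q: str, scale: float) -> float:
    """Assumed affinity: negative distance between the two features at the same scale."""
    return -math.dist(feature(p, scale), feature(q, scale))


def same_group(p: str, q: str, scale: float, tau: float = 0.5) -> bool:
    """Hypothetical grouping rule: affinity above -tau means 'same group'."""
    return affinity(p, q, scale) > -tau


# The handle and body are separate parts at a small scale...
print(same_group("p_handle", "p_body", scale=0.05))  # False
# ...but belong to the same object (the mug) at a larger scale.
print(same_group("p_handle", "p_body", scale=0.3))   # True
# The mug and the plate stay apart at the object scale.
print(same_group("p_body", "p_plate", scale=0.3))    # False
```

The same pair of points gets two different answers depending only on the queried scale, which is what allows masks at different granularities to coexist in one field.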
The grouping field is represented with a hashgrid and an MLP and is implemented in Nerfstudio. It is trained with scale-conditioned supervision and a contrastive loss that encourages transitivity and containment (points grouped together at one scale remain grouped at every larger scale), which keeps groups consistent across scales.
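A loss with these properties can be sketched as a margin-based contrastive loss over pairs of points, with containment built in by applying each constraint across a range of scales. This is a minimal sketch under stated assumptions. The pull/push form, the `margin`, the discrete list of sampled scales, and `toy_feature` are hypothetical. A trained model would evaluate the loss on features from the hashgrid and MLP rather than on a hand-written function.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature for a feature field: (point id, scale) -> feature vector.
FeatureFn = Callable[[str, float], Sequence[float]]


def pair_loss(feature: FeatureFn, p: str, q: str, scale: float, grouped: bool,
              scales: Sequence[float], margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one supervised pair (a sketch, not GARField's code).

    grouped=True: p and q lie in the same 2D mask whose 3D extent is `scale`.
    grouped=False: they lie in different masks at `scale`.
    """
    if grouped:
        # Containment: grouped at `scale` implies grouped at every larger scale,
        # so pull the features together at all sampled scales >= scale.
        active = [s for s in scales if s >= scale]
        return sum(math.dist(feature(p, s), feature(q, s)) for s in active)
    # Contrapositive: separated at `scale` implies separated at every smaller scale,
    # so push the features at least `margin` apart at all sampled scales <= scale.
    active = [s for s in scales if s <= scale]
    return sum(max(0.0, margin - math.dist(feature(p, s), feature(q, s))) for s in active)


def toy_feature(point: str, scale: float) -> Sequence[float]:
    """Hypothetical field: handle and body share a feature only at scales >= 0.1."""
    if point == "handle" or scale >= 0.1:
        return (1.0, 0.0)
    return (0.0, 1.0)


SCALES = [0.05, 0.1, 0.2, 0.4]  # hypothetical sampled scales
# Supervision says "same group at 0.1": the toy field already agrees, so the loss is zero.
print(pair_loss(toy_feature, "handle", "body", 0.1, True, SCALES))   # 0.0
# Supervision says "different groups at 0.2": the field merges them at 0.1 and 0.2.
print(pair_loss(toy_feature, "handle", "body", 0.2, False, SCALES))  # 2.0
```

Spreading each constraint over all larger (or smaller) sampled scales is what ties the per-scale groupings together into a nested hierarchy instead of unrelated partitions.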
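GARField can extract 3D assets from the hierarchy by selecting groups automatically or manually, which supports tasks such as 3D asset extraction and interactive segmentation. Evaluated on a variety of real scenes, it produces high-quality groupings at multiple levels, including clusters of objects, individual objects, and subparts, and unlike 2D baselines these groupings are view-consistent. The hierarchical grouping also has potential applications in dynamic scene understanding.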
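Selecting groups from a hierarchy presupposes a tree of groups. One plausible way to read such a tree out of a scale-conditioned field is to cluster points at a large scale and then recursively split each cluster at smaller scales. That procedure is an assumption here. The sketch below uses simple single-linkage clustering with a fixed distance threshold `tau` as a stand-in for whatever clustering a real implementation uses. The scales, `tau`, and `toy_feature` are hypothetical. The three levels the sketch recovers mirror the levels described earlier: a cluster of objects, individual objects, and subparts.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

FeatureFn = Callable[[str, float], Sequence[float]]  # hypothetical: (point id, scale) -> feature


@dataclass
class Node:
    points: List[str]
    children: List["Node"] = field(default_factory=list)


def cluster(points: List[str], feature: FeatureFn, scale: float, tau: float) -> List[List[str]]:
    """Single-linkage clustering: link points whose features at `scale` are within tau."""
    groups: List[List[str]] = []
    for p in points:
        fp = feature(p, scale)
        merged, kept = [p], []
        for g in groups:
            # Merge every existing group that p links to; keep the rest unchanged.
            if any(math.dist(fp, feature(q, scale)) <= tau for q in g):
                merged.extend(g)
            else:
                kept.append(g)
        groups = kept + [merged]
    return groups


def build_hierarchy(points: List[str], feature: FeatureFn, scales: Sequence[float],
                    tau: float = 0.5) -> Node:
    """Split `points` at the largest scale that separates them, then recurse on smaller scales.

    `scales` must be sorted from largest to smallest.
    """
    node = Node(sorted(points))
    for i, s in enumerate(scales):
        groups = cluster(points, feature, s, tau)
        if len(groups) > 1:  # skip scales at which this group does not split
            kids = [build_hierarchy(g, feature, scales[i + 1:], tau) for g in groups]
            node.children = sorted(kids, key=lambda n: n.points)
            break
    return node


def toy_feature(point: str, scale: float) -> Sequence[float]:
    """Hypothetical field with three levels: whole table, objects, parts."""
    if scale >= 0.5:
        return (0.0, 0.0)
    if scale >= 0.1:
        return {"handle": (0.0, 0.0), "body": (0.0, 0.0), "plate": (2.0, 0.0)}[point]
    return {"handle": (0.0, 0.0), "body": (0.0, 2.0), "plate": (2.0, 0.0)}[point]


tree = build_hierarchy(["handle", "body", "plate"], toy_feature, scales=[1.0, 0.2, 0.05])
print([c.points for c in tree.children])              # [['body', 'handle'], ['plate']]
print([c.points for c in tree.children[0].children])  # [['body'], ['handle']]
```

Each `Node` is a candidate group. An interactive tool could let a user pick one, while an automatic pass could, for instance, take every node at a chosen depth.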