26 Nov 2024 | Ri-Zhao Qiu, Yafei Hu, Yuchen Song, Ge Yang, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang
GeFF (Generalizable Feature Fields) is a scene-level neural feature field that provides a unified real-time representation for robot navigation and manipulation. It is trained with neural rendering techniques similar to Neural Radiance Fields (NeRFs) and distills features from vision-language models such as CLIP to support language-conditioned semantics. GeFF handles open-vocabulary object- and part-level manipulation, semantic-aware navigation, and zero-shot grasping.

At inference time, GeFF requires only a single feed-forward pass, making it efficient enough for real-time use. The model is trained on the ScanNet dataset and deployed on a quadrupedal robot equipped with a manipulator. In open-vocabulary scene representation for mobile manipulation, it outperforms existing point-based and implicit methods, achieving higher success rates on both object-level and part-level tasks. GeFF also supports articulated manipulation and supplies real-time geometric and semantic information for planning, which lets it handle dynamic environments and diverse tasks such as dynamic obstacle avoidance and semantic-aware navigation. Evaluated across a range of real-world environments, GeFF demonstrates better efficiency and accuracy than existing approaches, making it a promising solution for mobile manipulation tasks that demand real-time perception and action.
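The language-conditioned querying described above can be illustrated with a minimal sketch: if per-point scene features have been distilled into a CLIP-aligned embedding space, an open-vocabulary query reduces to cosine similarity between each point's feature and a text embedding. This is a hedged illustration, not GeFF's actual implementation; `point_features` and `query_embedding` are random stand-ins for the distilled features and a CLIP text encoding.

```python
import numpy as np

def score_points(features: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Cosine similarity between each point's feature and a text query embedding."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    return f @ t

# Hypothetical data: 5 scene points with 512-D CLIP-like features.
rng = np.random.default_rng(0)
point_features = rng.normal(size=(5, 512))   # stand-in for distilled per-point features
query_embedding = rng.normal(size=512)       # stand-in for a CLIP text embedding

scores = score_points(point_features, query_embedding)
target_idx = int(np.argmax(scores))          # point most relevant to the language query
```

In a real pipeline the highest-scoring points would seed downstream planning, e.g. selecting a grasp target or a navigation goal matching the query.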