26 Nov 2024 | Ri-Zhao Qiu*1, Yafei Hu*1,2, Yuchen Song*1, Ge Yang3, Yang Fu1, Jianglong Ye1, Jiteng Mu1, Ruihan Yang1, Nikolay Atanasov1, Sebastian Scherer2, Xiaolong Wang1
The paper introduces GeFF (Generalizable Feature Fields), a scene-level neural feature field designed to unify the representation of objects and scenes for both robot navigation and manipulation. GeFF addresses the challenge of capturing intricate geometry and fine-grained semantics in real time, which is crucial for mobile manipulation tasks. The method uses generative novel view synthesis as a pre-training task and aligns the resulting scene priors with natural language through CLIP feature distillation. This enables open-vocabulary object-level and part-level manipulation, as well as semantics-aware navigation, on a quadrupedal robot equipped with a manipulator. Quantitative and qualitative evaluations show better runtime and storage-accuracy trade-offs than existing point-based and implicit methods. Because GeFF handles diverse scenes and adapts to new objects, it is well suited to mobile manipulation tasks.
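To make the idea of open-vocabulary querying concrete, the following is a minimal sketch of how a language-aligned feature field could be queried. It is not GeFF's implementation: the function names (`cosine_similarity`, `query_points`), the similarity threshold, and the toy 4-dimensional vectors standing in for CLIP embeddings are all assumptions made for illustration. The only idea taken from the summary above is that per-point features live in the same space as text features, so a text query can be matched against 3D points by similarity.

```python
import numpy as np


def cosine_similarity(features, query, eps=1e-8):
    """Cosine similarity between each row of `features` (N, D) and `query` (D,)."""
    feature_norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), eps)
    query_norm = max(np.linalg.norm(query), eps)
    return (features / feature_norms) @ (query / query_norm)


def query_points(points, features, text_embedding, threshold=0.5):
    """Return the 3D points whose features match the text query.

    `points` is (N, 3), `features` is (N, D), `text_embedding` is (D,).
    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    sims = cosine_similarity(features, text_embedding)
    return points[sims > threshold], sims


# Toy stand-ins: in a real system these would come from a feature field
# and a text encoder; here they are hand-written vectors.
text_embedding = np.array([1.0, 0.0, 0.0, 0.0])
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0]])
features = np.array([[2.0, 0.0, 0.0, 0.0],    # same direction as the query
                     [0.0, 1.0, 0.0, 0.0],    # orthogonal to the query
                     [1.0, 1.0, 0.0, 0.0],    # partially aligned
                     [-1.0, 0.0, 0.0, 0.0]])  # opposite direction

matched, sims = query_points(points, features, text_embedding)
goal = matched.mean(axis=0)  # centroid of matching points as a simple target
print(matched)
print(goal)
```

In this sketch the centroid of the matching points serves as a simple navigation or grasp target. A real pipeline would likely need additional filtering and geometric reasoning on top of the similarity scores.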