RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation


5 Jul 2024 | Yuxuan Kuang, Junjie Ye, Haoran Geng, Jiageng Mao, Congyue Deng, Leonidas Guibas, He Wang, Yue Wang
This paper introduces RAM (Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation), a framework for zero-shot robotic manipulation that generalizes across objects, environments, and embodiments. Unlike existing methods that rely on expensive in-domain demonstrations, RAM uses a retrieval-based affordance transfer paradigm to acquire versatile manipulation capabilities from abundant out-of-domain data.

The key components of RAM include:

1. **Affordance Memory Construction**: RAM extracts unified affordance information from diverse sources such as robotic data, human-object interaction (HOI) data, and custom data to construct a comprehensive affordance memory.
2. **Hierarchical Retrieval**: Given a language instruction, RAM hierarchically retrieves the most similar demonstration from the affordance memory and transfers its 2D affordance into a 3D executable affordance in a zero-shot manner.
3. **Affordance Lifting**: A sampling-based method lifts the 2D affordance to 3D, enabling direct execution by robotic systems using grasp generators and motion planners.

Experiments in both simulation and real-world settings demonstrate that RAM consistently outperforms existing methods in various daily tasks. Additionally, RAM shows significant potential for downstream applications such as automatic data collection, one-shot visual imitation, and integration with LLMs/VLMs for long-horizon manipulation tasks.

The key contributions of RAM are:

- A retrieval-based affordance transfer framework for zero-shot robotic manipulation, significantly outperforming prior works.
- A scalable module for extracting unified affordance information from diverse out-of-domain data.
- Enabling a variety of downstream applications, including policy distillation, one-shot visual imitation, and LLM/VLM integration.
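The hierarchical retrieval step described above can be illustrated with a minimal coarse-to-fine sketch. This is not the paper's implementation: the `Demo` record, the token-overlap task matching, and the toy feature vectors are all assumptions made for illustration, standing in for whatever language and visual models a real system would use. The sketch only shows the general pattern of first narrowing the memory by task and then ranking the survivors by visual similarity.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demo:
    """Hypothetical affordance-memory entry (field names are assumptions)."""
    task: str                          # task label, e.g. "open drawer"
    feature: List[float]               # toy visual embedding of the demo image
    contact_px: Tuple[int, int]        # 2D contact point in the demo image
    direction_2d: Tuple[float, float]  # 2D post-contact direction


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in for a language model)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(instruction: str, query_feature: List[float], memory: List[Demo]) -> Demo:
    """Coarse stage: pick the best-matching task. Fine stage: rank by visual similarity."""
    if not memory:
        raise ValueError("affordance memory is empty")
    # Coarse: choose the task label closest to the instruction (sorted for determinism).
    tasks = sorted({d.task for d in memory})
    best_task = max(tasks, key=lambda t: token_overlap(instruction, t))
    # Fine: among demos of that task, choose the most visually similar one.
    candidates = [d for d in memory if d.task == best_task]
    return max(candidates, key=lambda d: cosine(query_feature, d.feature))


memory = [
    Demo("open drawer", [1.0, 0.0], (10, 20), (1.0, 0.0)),
    Demo("open drawer", [0.0, 1.0], (30, 40), (0.0, -1.0)),
    Demo("pick up mug", [0.1, 1.0], (50, 60), (0.0, -1.0)),
]
best = retrieve("open the drawer", [0.1, 0.9], memory)
print(best.task, best.contact_px)
```

In this toy memory the mug demonstration is visually closest to the query, but the coarse stage removes it because its task does not match the instruction. That filtering is the point of retrieving hierarchically rather than ranking the whole memory by appearance alone.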
The paper also discusses related works, experimental setup, baseline methods, and ablation studies, highlighting the effectiveness and versatility of RAM.
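To make the affordance lifting idea concrete, here is a minimal geometric sketch of turning a 2D contact point and 2D direction into a 3D point and 3D direction. It is an illustration under stated assumptions, not the paper's procedure: it assumes a pinhole camera with known intrinsics `(fx, fy, cx, cy)`, a metric depth value at the contact pixel, and a set of candidate 3D directions supplied by the caller (for example, surface normals estimated at points sampled near the contact). The function name `lift_affordance` and its parameters are hypothetical.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at metric depth into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(p, fx, fy, cx, cy):
    """Pinhole projection of a 3D camera-frame point with z > 0 onto the image."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def lift_affordance(contact_px, direction_2d, depth, intrinsics, candidate_dirs, step=0.01):
    """Return (3D contact point, best 3D direction).

    Each candidate and its negation is scored by how well the image projection
    of a small step along it aligns with the 2D direction (cosine similarity).
    """
    fx, fy, cx, cy = intrinsics
    norm2d = math.hypot(direction_2d[0], direction_2d[1])
    if depth <= 0 or norm2d == 0:
        raise ValueError("need positive depth and a non-zero 2D direction")
    target = (direction_2d[0] / norm2d, direction_2d[1] / norm2d)
    p0 = backproject(contact_px[0], contact_px[1], depth, fx, fy, cx, cy)

    best_dir, best_score = None, -2.0
    for cand in candidate_dirs:
        length = math.sqrt(sum(c * c for c in cand))
        if length == 0:
            continue  # skip degenerate candidates
        unit = tuple(c / length for c in cand)
        for d in (unit, tuple(-c for c in unit)):  # normals have no preferred sign
            p1 = tuple(p + step * c for p, c in zip(p0, d))
            u1, v1 = project(p1, fx, fy, cx, cy)
            du, dv = u1 - contact_px[0], v1 - contact_px[1]
            n = math.hypot(du, dv)
            if n < 1e-9:
                continue  # direction is along the viewing ray; no image motion
            score = (du * target[0] + dv * target[1]) / n
            if score > best_score:
                best_dir, best_score = d, score
    return p0, best_dir


intrinsics = (500.0, 500.0, 320.0, 240.0)          # assumed fx, fy, cx, cy
candidates = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
point_3d, dir_3d = lift_affordance((320, 240), (0.0, -1.0), 0.5, intrinsics, candidates)
print(point_3d, dir_3d)
```

Restricting the search to geometry-derived candidates matters because a single image direction is consistent with many 3D directions that differ only in depth. Without such a constraint, the alignment score alone cannot distinguish a motion parallel to the image plane from one pointing mostly toward or away from the camera.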