Retrieval-Augmented Embodied Agents (RAEA) are introduced to enhance robotic agents' ability to perform complex tasks by leveraging external policy memory banks. This system enables robots to access and reuse previously learned strategies, improving their adaptability and effectiveness across environments. RAEA integrates a policy retriever, which identifies relevant strategies in the memory bank based on multi-modal inputs, and a policy generator, which incorporates the retrieved strategies into the learning process to produce effective responses. The framework draws on the large-scale Open X-Embodiment dataset, which contains diverse robotic data spanning multiple embodiments, to enrich RAEA's knowledge base. The policy retriever handles multiple input modalities, including text, audio, images, and point clouds, while the policy generator employs cross-attention to integrate the retrieved policies into the main policy network. Extensive experiments on simulation benchmarks and in real-world environments show that RAEA significantly outperforms traditional methods and improves generalization, particularly in low-data scenarios. The contributions include the RAEA framework itself, a policy retriever capable of handling multi-modal inputs, and a policy generator that improves the model's ability to generalize across diverse situations. Results from both simulations and real-world experiments validate the effectiveness, practicality, and versatility of the approach in robotic applications.
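The abstract does not include an implementation, but the two components it describes, similarity-based retrieval of strategies from a policy memory bank and cross-attention fusion of the retrieved policies into the agent's own representation, can be illustrated with a minimal, hypothetical sketch. The function names, embedding dimensions, and single-head attention here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def retrieve_policies(query, memory_bank, k=2):
    """Return the k policy embeddings most similar to the query.

    query: (d,) embedding of the current multi-modal observation.
    memory_bank: (n, d) matrix of stored policy embeddings.
    """
    # Cosine similarity: normalize both sides, then take dot products.
    q = query / np.linalg.norm(query)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    scores = m @ q
    top_idx = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return memory_bank[top_idx]

def cross_attend(agent_state, retrieved):
    """Fuse retrieved policies into the agent state via single-head
    scaled dot-product cross-attention (state = query, policies = keys/values)."""
    d_k = agent_state.shape[-1]
    logits = (agent_state @ retrieved.T) / np.sqrt(d_k)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ retrieved               # attention-weighted policy summary
```

In this sketch, a query embedding of the current observation selects the top-k entries from the memory bank, and the agent's state attends over them to produce a fused vector that a downstream policy head could consume; a full system would learn these projections rather than attend over raw embeddings.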