Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

26 Apr 2024 | Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
Ag2Manip is a framework that enables robots to learn novel manipulation skills without relying on domain-specific demonstrations. It tackles two central challenges: bridging the domain gap between human and robot embodiments, and improving the precision of robotic manipulation. Its key innovations are an agent-agnostic visual representation, learned from manipulation videos with human-specific details removed to improve generalizability, and an agent-agnostic action representation that abstracts the robot's actions into a universal proxy focused on the crucial interactions between the end-effector and the object. Empirical validation across the simulated FrankaKitchen, ManiSkill, and PartManip benchmarks shows a 325% increase in performance, achieving a 78.7% success rate. Ablation studies confirm that both the visual and the action representations are essential. In real-world experiments, Ag2Manip improves imitation-learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across simulated and physical environments.
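To make the two abstractions concrete, below is a minimal sketch, not the authors' implementation. The mean-color fill standing in for learned agent removal, the `ProxyAction` fields, and all function names are illustrative assumptions rather than Ag2Manip's actual API.

```python
# Illustrative sketch of the two agent-agnostic abstractions described above.
# Assumptions (not from the paper's code): a simple mean-color fill replaces
# whatever learned agent-removal the visual representation uses, and the
# action proxy is reduced to a position plus a binary interaction flag.
from dataclasses import dataclass
import numpy as np


def agent_agnostic_frame(frame: np.ndarray, agent_mask: np.ndarray) -> np.ndarray:
    """Strip agent-specific pixels from a video frame.

    frame: (H, W, 3) uint8 image; agent_mask: (H, W) bool, True where the
    human/robot appears. The masked region is filled with the background's
    mean color so the downstream visual encoder sees only the scene and
    objects, not the embodiment performing the task.
    """
    out = frame.copy()
    background_mean = frame[~agent_mask].mean(axis=0).astype(frame.dtype)
    out[agent_mask] = background_mean
    return out


@dataclass
class ProxyAction:
    """Agent-agnostic action: the end-effector is abstracted into a
    free-floating proxy, so the same action is meaningful regardless of
    which robot (or human) executes it."""
    ee_position: np.ndarray  # (3,) target end-effector position, world frame
    interacting: bool        # whether the proxy is in contact with the object


if __name__ == "__main__":
    frame = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    mask = np.zeros((64, 64), dtype=bool)
    mask[20:40, 20:40] = True  # pretend the agent occupies this region
    clean = agent_agnostic_frame(frame, mask)
    action = ProxyAction(ee_position=np.array([0.4, 0.0, 0.2]), interacting=True)
    print(clean.shape, action)
```

In this reading, the visual branch learns from agent-removed frames like `clean`, while the policy outputs `ProxyAction`-style commands that a per-robot controller then maps to joint motions.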