27 May 2024 | Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
The paper "PoCo: Policy Composition from and for Heterogeneous Robot Learning" addresses the challenge of training general robotic policies from heterogeneous data spanning different modalities (color, depth, tactile, proprioceptive) and domains (simulation, real robots, human videos). The authors propose Policy Composition (PoCo), a flexible approach that combines information across these modalities and domains to learn manipulation skills that generalize at the scene level and the task level. PoCo uses diffusion models to probabilistically compose different data distributions, allowing for modular adaptation at inference time.
The method is evaluated on tool-use tasks in both simulation and real-world experiments, demonstrating robust and dexterous performance under varying scenes and tasks. PoCo outperforms baselines that use a single data source or simple pooling of heterogeneous data, achieving a 20% success rate improvement in both simulation and real-world settings. The paper's contributions are the introduction of PoCo, the development of task-level, behavior-level, and domain-level prediction-time compositions, and the demonstration of scene-level and task-level generalization across different tool-use tasks.
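To make the idea of probabilistically composing distributions more concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: it assumes each "policy" is a simple Gaussian over a 2-D action, and that composition means taking a weighted sum of the individual score functions (gradients of the log-density), which corresponds to sampling from a weighted product of the distributions. The names `gaussian_score`, `compose_scores`, `ascend`, `sim_policy`, and `real_policy`, as well as the weights, are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_score(mu, sigma):
    """Return the score function (gradient of log-density) of N(mu, sigma^2 I)."""
    mu = np.asarray(mu, dtype=float)

    def score(x):
        return -(x - mu) / sigma**2

    return score

def compose_scores(scores, weights):
    """Weighted sum of scores, i.e. the score of prod_i p_i(x)^{w_i} (unnormalized)."""
    def composed(x):
        return sum(w * s(x) for s, w in zip(scores, weights))

    return composed

def ascend(score, x0, step=0.05, n_steps=500):
    """Noise-free gradient ascent on the log-density; converges to the mode here."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score(x)
    return x

# Two hypothetical stand-ins for policies trained on different domains.
sim_policy = gaussian_score([0.0, 0.0], sigma=1.0)
real_policy = gaussian_score([2.0, 4.0], sigma=1.0)

# Weights chosen at inference time; here the "real" policy is trusted 3x more.
composed = compose_scores([sim_policy, real_policy], weights=[1.0, 3.0])

mode = ascend(composed, x0=[5.0, -5.0])
print(mode)  # approximately [1.5, 3.0], the weighted mean of the two modes
```

Because the weights enter only at sampling time, changing them requires no retraining of the individual models, which is the sense in which such a composition is modular. A real diffusion policy would replace the Gaussian scores with learned noise predictions evaluated at each denoising step.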