PoCo: Policy Composition from and for Heterogeneous Robot Learning


27 May 2024 | Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
PoCo is a framework for policy composition in heterogeneous robot learning that combines information across diverse modalities and domains to learn generalized manipulation skills. The method uses diffusion models to compose different data distributions, enabling task-level and domain-level policy composition. Policies are trained on simulation, human, and real-world data and evaluated on tool-use tasks. The composed policies achieve robust and dexterous performance across varying scenes and tasks, outperforming both single-data-source baselines and simple data pooling in simulation and real-world experiments.

The framework composes information across behaviors, tasks, modalities, and domains: policies are combined modularly at prediction time, without retraining, which allows adaptation to new tasks and settings. Probabilistic composition of the learned distributions is what lets information from different domains and modalities be merged, enabling robust generalization to novel scenes and tasks. Compared with representation learning and large-scale data-pooling approaches in robot learning, PoCo requires neither extensive data engineering nor data sharing, since policies are learned modularly on separate domains; it also adapts quickly to out-of-distribution settings given additional data or tasks. A minimal sketch of the two composition modes follows.
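The composition itself can be stated compactly. The sketch below, in PyTorch, assumes each diffusion policy exposes a noise-prediction network `eps(x_t, t, obs)`; the function names, signatures, and weighting scheme are illustrative assumptions, not the paper's exact API. Domain-level composition sums weighted noise predictions, which in score space corresponds (up to normalization) to a product of the per-domain trajectory distributions; task-level composition follows the classifier-free-guidance pattern of an unconditional prediction plus weighted conditional offsets.

```python
# Illustrative sketch of prediction-time composition of diffusion policies.
# Assumes each policy exposes a noise predictor eps(x_t, t, obs[, cond]);
# these names and signatures are hypothetical, not the paper's exact API.
import torch

def compose_domains(policies, weights, x_t, t, obs):
    """Domain-level composition: weighted sum of per-domain noise
    predictions. Summing scores samples (up to normalization) from a
    product of the per-domain distributions, so the composed trajectory
    stays likely under every component policy."""
    eps = torch.zeros_like(x_t)
    for policy, w in zip(policies, weights):
        eps = eps + w * policy.eps(x_t, t, obs)
    return eps

def compose_tasks(policy, task_conds, weights, x_t, t, obs):
    """Task-level composition in the classifier-free-guidance style:
    start from the unconditional prediction and add a weighted offset
    toward each task condition."""
    eps_uncond = policy.eps(x_t, t, obs, cond=None)
    eps = eps_uncond.clone()
    for cond, w in zip(task_conds, weights):
        eps = eps + w * (policy.eps(x_t, t, obs, cond=cond) - eps_uncond)
    return eps
```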
In simulation, on the Fleet-Tools benchmark, PoCo improves diffusion policies in multi-task settings with various tools. In the real world, policies composed across different modalities and domains perform robustly under changes in manipulands, distractor objects, camera viewpoints, and angles. Overall, composition yields success-rate improvements of 20% over baselines in both simulation and the real world.

The contributions of PoCo are: (a) a policy composition framework that uses probabilistic composition of diffusion models to combine information from different domains and modalities; (b) task-level, behavior-level, and domain-level prediction-time composition for constructing complex composite policies without retraining; and (c) scene-level and task-level generalization across simulation and real-world settings on four tool-use tasks, demonstrating robust and dexterous behaviors.

The paper also surveys related work on diffusion models, compositional models in robotics, multi-task and multi-domain imitation learning, and robotic tool use, and it details the implementation, including diffusion models for trajectory generation and policy composition in different settings. Across simulation and real-world experiments, PoCo outperforms the baselines, demonstrating its effectiveness at handling data heterogeneity and task diversity in robot learning. A generic sampler that plugs in either composed predictor is sketched below.
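To make the prediction-time modularity concrete, here is a generic DDPM-style reverse-diffusion sampler that consumes any composed noise predictor. The schedule tensors (`alphas`, `alpha_bars`, `sigmas`) and the update rule are standard DDPM machinery used here for illustration; the paper's exact sampler and noise schedule may differ.

```python
# Generic DDPM reverse-diffusion sampler for action trajectories.
# eps_fn(x_t, t, obs) can be any of the composed predictors above;
# shapes and schedule tensors are illustrative assumptions.
import torch

@torch.no_grad()
def sample_trajectory(eps_fn, obs, horizon, action_dim,
                      alphas, alpha_bars, sigmas):
    """Denoise Gaussian noise into a trajectory of shape
    (1, horizon, action_dim) using the composed noise predictor."""
    x = torch.randn(1, horizon, action_dim)
    for t in reversed(range(len(alphas))):
        eps = eps_fn(x, t, obs)
        # Standard DDPM posterior mean:
        # x_{t-1} = (x_t - (1 - a_t)/sqrt(1 - abar_t) * eps) / sqrt(a_t)
        x = (x - (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 0:
            x = x + sigmas[t] * torch.randn_like(x)  # posterior noise
    return x
```

Swapping `compose_domains` for `compose_tasks` (or chaining both) changes the composite policy at sampling time without any retraining, which is the prediction-time modularity described above.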