THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

28 May 2024 | Wilbert Pumacay*, Ishika Singh*, Jiafei Duan*, Ranjay Krishna, Jesse Thomason, Dieter Fox
The paper introduces THE COLOSSEUM, a comprehensive simulation benchmark designed to evaluate the generalization capabilities of robotic manipulation models. The benchmark includes 20 diverse manipulation tasks and 14 axes of environmental perturbations, such as changes in object color, texture, size, lighting, distractors, physical properties, and camera pose. The authors evaluate five state-of-the-art manipulation models using THE COLOSSEUM and find that their success rates degrade by 30-50% across these perturbations, and by 75% when multiple perturbations are applied simultaneously. They identify changes in distractor objects, target object color, and lighting conditions as the perturbations that degrade model performance the most.
The results are validated in real-world experiments, showing a strong correlation (R² = 0.614) between simulation and real-world performance. The paper also proposes THE COLOSSEUM Challenge to encourage further research and development of generalizable robotic manipulation models. The benchmark and related resources are open-sourced to facilitate reproducibility and future research.
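As a rough illustration of the two kinds of numbers quoted above, the sketch below computes a percentage degradation in success rate and an R² value between paired simulation and real-world scores. It is not the authors' evaluation code. The function names are hypothetical, the degradation is assumed to be measured relative to the unperturbed success rate, R² is assumed to be the squared Pearson correlation of a simple linear fit, and all numbers are made-up placeholders rather than results from the paper.

```python
from typing import Sequence


def relative_degradation(baseline: float, perturbed: float) -> float:
    """Percentage drop in success rate relative to the unperturbed baseline.

    Assumption: degradation is relative, i.e. 100 * (baseline - perturbed) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline success rate must be positive")
    return 100.0 * (baseline - perturbed) / baseline


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two paired samples.

    For a simple linear fit of ys on xs this equals the fit's R^2.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples with at least two points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one sample is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Placeholder success rates (fractions of successful episodes), not paper data.
    print(relative_degradation(baseline=0.8, perturbed=0.4))

    # Placeholder paired scores: the same perturbations in simulation and on a real robot.
    sim_scores = [0.70, 0.55, 0.40, 0.30]
    real_scores = [0.60, 0.50, 0.45, 0.20]
    print(r_squared(sim_scores, real_scores))
```

An R² near 1 would mean that simulation results almost fully predict the real-world ranking of perturbations, while a value near 0 would mean they carry little predictive information.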