6 Jun 2024 | Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
This paper investigates how the choice of observation space affects robot learning, focusing on RGB, RGB-D, and point cloud modalities. The authors introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for a range of encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal that point cloud-based methods consistently outperform their RGB and RGB-D counterparts in success rate and robustness, and that point cloud observations generalize better across varied geometric and visual conditions. The study also highlights the importance of incorporating both color and coordinate information in point cloud methods, supporting the conclusion that 3D point clouds are a valuable observation modality for complex robotic tasks. The paper further explores pre-trained visual representations (PVRs) and the zero-shot generalization capabilities of each observation space: point cloud methods prove more robust to changes in camera view and visual appearance, and PVRs can improve model generalization. The authors additionally examine the sample efficiency of different observation spaces and the impact of design choices on point cloud performance. Overall, the paper provides valuable insights for designing more generalizable and robust robotic models.
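To make the color-plus-coordinate finding concrete, here is a minimal sketch (PyTorch; the class and parameter names are hypothetical, not the paper's actual architecture) of a PointNet-style encoder that consumes 6-channel points, xyz coordinates concatenated with rgb color, and max-pools them into a single global feature that a downstream policy could use:

```python
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """Minimal PointNet-style encoder over (x, y, z, r, g, b) points.

    Illustrative only: each point carries 3 coordinate + 3 color channels,
    reflecting the paper's observation that both kinds of information matter.
    A shared per-point MLP followed by max-pooling yields a global feature.
    """

    def __init__(self, in_channels: int = 6, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 6); nn.Linear acts on the last dim,
        # so the MLP is shared across points. Max-pooling over the point
        # dimension makes the feature permutation-invariant.
        per_point = self.mlp(points)           # (batch, num_points, feat_dim)
        global_feat, _ = per_point.max(dim=1)  # (batch, feat_dim)
        return global_feat

# Usage: encode a batch of 2 dummy clouds with 1024 points each.
encoder = PointCloudEncoder()
obs = torch.randn(2, 1024, 6)  # xyz + rgb per point (random placeholder data)
print(encoder(obs).shape)      # torch.Size([2, 256])
```

Dropping the rgb channels (in_channels=3) would correspond to a coordinates-only variant, which is the kind of ablation the design-choice analysis in the paper speaks to.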