OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

29 Feb 2024 | Peiqi Liu*1, Yaswanth Orru*1, Jay Vakil2, Chris Paxton2, Nur Muhammad Mahi Shafiullah1†, Lerrel Pinto1†
OK-Robot is an open-knowledge robotic system that integrates learned models trained on publicly available data to perform pick-and-drop tasks in real-world environments. The system combines Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for manipulation. OK-Robot achieves a 58.5% success rate in 10 unseen, cluttered home environments and an 82.4% success rate in cleaner, decluttered environments. The paper highlights how much nuanced details matter when combining VLMs with robotic modules, and it offers insights into the challenges of open-vocabulary robotics. The authors also share their code and robot videos to encourage further research in this area.
The system is evaluated through experiments in real-world home environments. These experiments show that pre-trained VLMs and grasping models are effective, and that combining these components in a flexible framework plays a critical role. The paper discusses limitations and future directions, including the need for dynamic semantic memory, improved grasp planning, better interactivity with users, and more robust robot hardware.