31 May 2024 | Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michael, Joss Moore, Ken Oslund, Dushyant Rao, Allen Ren, Baruch Tabanpour, Quan Vuong, Ayzaan Wahid, Ted Xiao, Ying Xu, Vincent Zhuang, Peng Xu, Erik Frey, Ken Caluwaerts, Tingnan Zhang, Brian Ichter, Jonathan Tompson, Leila Takayama, Vincent Vanhoucke, Izhak Shafran, Maja Mataric, Dorsa Sadigh, Nicolas Heess, Kanishka Rao, Nik Stewart, Jie Tan, Carolina Parada
This paper introduces Language Model Predictive Control (LMPC), a framework that improves the teachability of large language models (LLMs) for robot code-writing tasks. The goal is to enable non-experts to teach robots new tasks through natural language feedback, with the LLM adapting to human input more efficiently over time. LMPC combines in-context learning (fast adaptation) with model fine-tuning (slow adaptation), so the LLM learns from human feedback within a session and improves its performance on unseen tasks across sessions.
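The fast/slow adaptation loop can be sketched as follows; `toy_model`, `get_feedback`, and the `finetune` callback are toy stand-ins for the actual LLM, human teachers, and training procedure, used only to show the control flow and not the paper's real interfaces.

```python
def get_feedback(code):
    """Toy stand-in for a human teacher: accepts code once it is 'fixed'."""
    done = "fixed" in code
    return ("looks good" if done else "try again"), done

def toy_model(chat):
    """Toy stand-in LLM: revises its code after one round of feedback."""
    return "code(fixed)" if "try again" in chat[-1] else "code(v0)"

def collect_session(model, task, max_turns=5):
    """Fast adaptation: alternate code generation and human feedback
    within one chat session until the task succeeds or turns run out."""
    chat = [f"task: {task}"]
    for _ in range(max_turns):
        chat.append(model(chat))                 # LLM writes robot code
        feedback, done = get_feedback(chat[-1])  # human corrects or accepts
        chat.append(feedback)
        if done:
            return chat, True
    return chat, False

def lmpc_round(model, tasks, finetune):
    """Slow adaptation: fine-tune on the sessions that ended in success."""
    successes = [chat for chat, ok in
                 (collect_session(model, t) for t in tasks) if ok]
    return finetune(model, successes)
```

In this toy run, the model succeeds on the second turn of each session, so a round over two tasks yields two successful sessions for fine-tuning.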
The key idea is to treat human-robot interactions as a partially observable Markov decision process (POMDP), where human language inputs are observations and robot code outputs are actions. By training the LLM to predict future turns of the interaction, LMPC can apply model predictive control (MPC) at inference time: it rolls out candidate continuations of the chat and selects the one predicted to reach success in the fewest turns. This lets the LLM exploit previous interactions and adapt more efficiently to new tasks.
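The MPC step can be sketched as below, assuming a sampler that predicts how a chat session would continue; `sample_continuation` here is a random toy stand-in for the fine-tuned LLM's prediction, not the paper's actual interface.

```python
import random

def sample_continuation(chat, rng):
    """Toy stand-in: predicts (remaining_turns, ends_in_success) for one
    sampled rollout of the rest of the chat session."""
    return rng.randint(1, 6), rng.random() > 0.3

def mpc_select_action(chat, num_rollouts=16, seed=0):
    """Sample rollouts of the future interaction and keep the one that is
    predicted to reach success in the fewest turns."""
    rng = random.Random(seed)
    best = None  # (predicted_turns, rollout_index)
    for i in range(num_rollouts):
        turns, success = sample_continuation(chat, rng)
        if success and (best is None or turns < best[0]):
            best = (turns, i)
    return best  # None if no rollout is predicted to succeed
```

In the real system the selected rollout's first robot-code action would be executed, and the process repeated after the next round of human feedback.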
Experiments show that LMPC significantly improves the teachability of PaLM 2 on 78 tasks across 5 robot embodiments, reducing the average number of human corrections required from 2.4 to 1.9 and increasing the success rate of teaching unseen tasks by 26.9%. LMPC also produces strong meta-learners: the success rate of in-context learning of new tasks on unseen robot embodiments and APIs improves by 31.5%. Top-user conditioning further enhances performance, since conditioning the model on the most effective teachers lets their successful strategies benefit all users.
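Top-user conditioning can be illustrated with the sketch below; the tier-tag format and the skill cutoff are assumptions made for illustration, not the paper's exact scheme.

```python
def user_tier(success_rate, cutoff=0.8):
    """Bucket a user by historical teaching success rate (cutoff assumed)."""
    return "top" if success_rate >= cutoff else "average"

def build_prompt(chat_turns, tier="top"):
    """Prepend a tier tag to the chat. Training data would carry each
    user's real tier; at inference the prompt always claims a 'top' user,
    so the model imitates the most effective teachers."""
    return "\n".join([f"[user-tier: {tier}]"] + chat_turns)
```

Usage: `build_prompt(["user: move the arm left"])` yields a prompt beginning with `[user-tier: top]`, steering the fine-tuned model toward top-user behavior.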
The framework is evaluated on both simulated and real-world robot embodiments, demonstrating its effectiveness at improving robot teaching success rates. LMPC outperforms retrieval-based baselines, and user studies suggest that the improvements stem from changes in model capability rather than in user teaching proficiency. The approach is limited by the need for substantial computational resources and by the assumption that the base LLM can generate enough successful chat sessions to learn from. Future work could explore multimodal models and more efficient training techniques to further improve performance.