31 May 2024 | Jacky Liang*, Fei Xia*, Wenhao Yu*, Andy Zeng*, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humphlik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, Ken Oslund, Dushyant Rao, Allen Ren, Baruch Tabanpour, Quan Vuong, Ayzaan Wahid, Ted Xiao, Ying Xu, Vincent Zhuang, Peng Xu†, Erik Frey†, Ken Caluwaerts†, Tingnan Zhang†, Brian Ichter†, Jonathan Tompson†, Leila Takayama†, Vincent Vanhoucke†, Izhak Shafran†, Maja Mataric†, Dorsa Sadigh†, Nicolas Heess†, Kanishka Rao†, Nik Stewart†, Jie Tan†, Carolina Parada†
This paper explores improving the teachability of large language models (LLMs) for robot code-writing tasks, enabling non-experts to teach robots new behaviors through natural language. The authors introduce Language Model Predictive Control (LMPC), a framework that fine-tunes LLMs to adapt more effectively to human feedback. LMPC combines in-context learning for fast adaptation with model predictive control (MPC) for slow adaptation, allowing the LLM to remember and learn from past interactions. The key observation is that human-robot interactions can be modeled as a partially observable Markov decision process (POMDP), in which the LLM's task is to predict future interactions from past data. This approach improves performance on unseen tasks, reducing the average number of corrections needed and increasing teaching success rates. Experiments with 5 robot embodiments and 78 tasks show that LMPC significantly enhances teachability, with improvements generalizing to new robot APIs and embodiments. The method also identifies top-performing users and leverages their interactions to drive overall performance gains. The paper discusses limitations and suggests future directions, including the use of multimodal models and active task exploration.
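The fast/slow adaptation split described above can be illustrated with a minimal sketch. This is not the paper's implementation: `generate_code`, `teaching_session`, and `collect_finetuning_data` are hypothetical names, and the LLM call is a stub. The idea it shows is that in-context corrections accumulate within a session (fast adaptation), while only successful sessions are kept as fine-tuning data for the next model round (slow adaptation).

```python
def generate_code(chat_history):
    """Stub for an LLM call that returns robot code given the chat so far."""
    return f"# code attempt after {len(chat_history)} turns"

def teaching_session(task, feedback_turns):
    """Fast adaptation: the human issues natural-language corrections,
    and each retry conditions on the growing chat context.

    `feedback_turns` simulates the human's corrections; an empty string
    means the human accepts the current behavior.
    """
    chat = [("user", task)]
    for feedback in feedback_turns + [""]:
        code = generate_code(chat)
        chat.append(("assistant", code))
        if not feedback:  # human is satisfied: session succeeds
            return chat, True
        chat.append(("user", feedback))
    return chat, False

def collect_finetuning_data(sessions):
    """Slow adaptation: keep only successful sessions as training data,
    so the next fine-tuned model needs fewer corrections on similar tasks."""
    return [chat for chat, success in sessions if success]

# Example: one session needing a correction, one accepted immediately.
s1 = teaching_session("wave hello", ["raise the arm higher"])
s2 = teaching_session("spin in place", [])
dataset = collect_finetuning_data([s1, s2])
```

In this toy loop the session transcript plays the role of the POMDP history: the model's job is to predict a continuation (the next code attempt) that ends the session quickly, and fine-tuning on successful transcripts pushes future sessions toward fewer corrections.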