31 May 2024 | Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michael, Joss Moore, Ken Oslund, Dushyant Rao, Allen Ren, Baruch Tabanpour, Quan Vuong, Ayzaan Wahid, Ted Xiao, Ying Xu, Vincent Zhuang, Peng Xu, Erik Frey, Ken Caluwaerts, Tingnan Zhang, Brian Ichter, Jonathan Tompson, Leila Takayama, Vincent Vanhoucke, Izhak Shafran, Maja Mataric, Dorsa Sadigh, Nicolas Heess, Kanishka Rao, Nik Stewart, Jie Tan, Carolina Parada
This paper introduces Language Model Predictive Control (LMPC), a framework that improves the teachability of large language models (LLMs) for robot code-writing tasks. The goal is to enable non-experts to teach robots new tasks through natural language feedback, with the LLM adapting to human input more efficiently over time. LMPC combines in-context learning (fast adaptation) with model fine-tuning (slow adaptation), so the LLM learns from human feedback within a session and improves its performance on unseen tasks across sessions.
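The fast/slow adaptation loop can be sketched as follows; `toy_model`, `get_feedback`, and the `finetune` callback are toy stand-ins for the actual LLM, human teachers, and training procedure, used only to show the control flow and not the paper's real interfaces.

```python
def get_feedback(code):
    """Toy stand-in for a human teacher: accepts code once it is 'fixed'."""
    done = "fixed" in code
    return ("looks good" if done else "try again"), done

def toy_model(chat):
    """Toy stand-in LLM: revises its code after one round of feedback."""
    return "code(fixed)" if "try again" in chat[-1] else "code(v0)"

def collect_session(model, task, max_turns=5):
    """Fast adaptation: alternate code generation and human feedback
    within one chat session until the task succeeds or turns run out."""
    chat = [f"task: {task}"]
    for _ in range(max_turns):
        chat.append(model(chat))                 # LLM writes robot code
        feedback, done = get_feedback(chat[-1])  # human corrects or accepts
        chat.append(feedback)
        if done:
            return chat, True
    return chat, False

def lmpc_round(model, tasks, finetune):
    """Slow adaptation: fine-tune on the sessions that ended in success."""
    successes = [chat for chat, ok in
                 (collect_session(model, t) for t in tasks) if ok]
    return finetune(model, successes)
```

In this toy run, the model succeeds on the second turn of each session, so a round over two tasks yields two successful sessions for fine-tuning.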
The key idea is to treat human-robot interactions as a partially observable Markov decision process (POMDP), where human language inputs are observations and robot code outputs are actions. By training the LLM to predict future turns of the interaction, LMPC can apply model predictive control (MPC) at inference time: it rolls out candidate continuations of the chat and selects the one predicted to reach success in the fewest turns. This lets the LLM exploit previous interactions and adapt more efficiently to new tasks.
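The MPC step can be sketched as below, assuming a sampler that predicts how a chat session would continue; `sample_continuation` here is a random toy stand-in for the fine-tuned LLM's prediction, not the paper's actual interface.

```python
import random

def sample_continuation(chat, rng):
    """Toy stand-in: predicts (remaining_turns, ends_in_success) for one
    sampled rollout of the rest of the chat session."""
    return rng.randint(1, 6), rng.random() > 0.3

def mpc_select_action(chat, num_rollouts=16, seed=0):
    """Sample rollouts of the future interaction and keep the one that is
    predicted to reach success in the fewest turns."""
    rng = random.Random(seed)
    best = None  # (predicted_turns, rollout_index)
    for i in range(num_rollouts):
        turns, success = sample_continuation(chat, rng)
        if success and (best is None or turns < best[0]):
            best = (turns, i)
    return best  # None if no rollout is predicted to succeed
```

In the real system the selected rollout's first robot-code action would be executed, and the process repeated after the next round of human feedback.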
Experiments show that LMPC significantly improves the teachability of PaLM 2 on 78 tasks across 5 robot embodiments, reducing the average number of human corrections required from 2.4 to 1.9 and increasing the success rate of teaching unseen tasks by 26.9%. LMPC also produces strong meta-learners: the success rate of in-context learning of new tasks on unseen robot embodiments and APIs improves by 31.5%. Top-user conditioning further enhances performance, since conditioning the model on the most effective teachers lets their successful strategies benefit all users.
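Top-user conditioning can be illustrated with the sketch below; the tier-tag format and the skill cutoff are assumptions made for illustration, not the paper's exact scheme.

```python
def user_tier(success_rate, cutoff=0.8):
    """Bucket a user by historical teaching success rate (cutoff assumed)."""
    return "top" if success_rate >= cutoff else "average"

def build_prompt(chat_turns, tier="top"):
    """Prepend a tier tag to the chat. Training data would carry each
    user's real tier; at inference the prompt always claims a 'top' user,
    so the model imitates the most effective teachers."""
    return "\n".join([f"[user-tier: {tier}]"] + chat_turns)
```

Usage: `build_prompt(["user: move the arm left"])` yields a prompt beginning with `[user-tier: top]`, steering the fine-tuned model toward top-user behavior.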
The framework is evaluated on both simulated and real-world robot embodiments, demonstrating its effectiveness at improving robot teaching success rates. LMPC outperforms retrieval-based baselines, and user studies suggest that the improvements stem from changes in model capability rather than in user teaching proficiency. The approach is limited by the need for substantial computational resources and by the assumption that the base LLM can generate enough successful chat sessions to learn from. Future work could explore multimodal models and more efficient training techniques to further improve performance.