2024 | Karthik Mahadevan, Jonathan Chien, Noah Brown, Zhuo Xu, Carolina Parada, Andy Zeng, Leila Takayama, Dorsa Sadigh
This paper presents Generative Expressive Motion (GenEM), a new approach that uses large language models (LLMs) to autonomously generate expressive robot behaviors that are flexible, adaptable, and composable. GenEM leverages the rich social context captured by LLMs and their ability to generate motion from instructions or user preferences. The approach uses few-shot chain-of-thought prompting to translate human language instructions into parameterized control code built on the robot's available and learned skills. Through user studies and simulation experiments, the authors demonstrate that GenEM produces behaviors users find competent and easy to understand.
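To make the prompting step concrete, here is a minimal sketch of what such a few-shot chain-of-thought prompt could look like; the skill names (pan_head, tilt_head, etc.) and the worked example are hypothetical placeholders, not the paper's actual robot API or prompts:

```python
# Hypothetical few-shot chain-of-thought prompt: the worked example shows
# the LLM how to reason from a social instruction to parameterized code.
FEW_SHOT_PROMPT = """\
You control a robot with these skills:
  pan_head(angle_deg), tilt_head(angle_deg), move_base(x_m, y_m), wait(seconds)

Instruction: Acknowledge a person walking by.
Reasoning: A human would briefly turn toward the person and nod.
Code:
def acknowledge(person_angle_deg=30, nod_angle_deg=15):
    pan_head(person_angle_deg)   # face the person
    tilt_head(nod_angle_deg)     # nod down
    tilt_head(0)                 # return upright

Instruction: {instruction}
Reasoning:"""

def build_prompt(instruction: str) -> str:
    """Fill the few-shot template with a new user instruction."""
    return FEW_SHOT_PROMPT.format(instruction=instruction)

print(build_prompt("Show excitement when greeting a visitor."))
```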
GenEM is modular, with several LLMs playing distinct roles. It takes a user's language instruction as input and outputs a robot policy in the form of parameterized code. Iterative human feedback can be used to refine the policy, updating its parameters one step at a time. Instantiating the policy from some initial state produces a trajectory, i.e., one concrete realization of the expressive behavior.
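As a rough illustration (not the authors' code), a policy represented as parameterized code, with a single-step parameter update from feedback, might look like the sketch below; the keyword rules merely stand in for the LLM call GenEM uses to interpret feedback:

```python
from dataclasses import dataclass

@dataclass
class NodPolicy:
    """A toy expressive 'nod' policy with tunable parameters."""
    angle_deg: float = 15.0
    repetitions: int = 1

    def instantiate(self, initial_tilt_deg: float = 0.0) -> list[float]:
        """Roll out one trajectory of head-tilt targets from an initial state."""
        trajectory = [initial_tilt_deg]
        for _ in range(self.repetitions):
            trajectory += [self.angle_deg, 0.0]  # tilt down, return upright
        return trajectory

def apply_feedback(policy: NodPolicy, feedback: str) -> NodPolicy:
    """One-step parameter update; keyword matching stands in for an LLM."""
    if "bigger" in feedback:
        policy.angle_deg += 5.0
    if "twice" in feedback:
        policy.repetitions = 2
    return policy

policy = apply_feedback(NodPolicy(), "nod bigger, and do it twice")
print(policy.instantiate())  # [0.0, 20.0, 0.0, 20.0, 0.0]
```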
The approach proceeds in three steps, sketched in the pipeline below: (1) expressive instruction following, where the input is a language instruction, either a description of a social context or a direct description of an expressive behavior, and an LLM reasons step by step about how a human would express it; (2) translating human expressive motion to robot expressive motion, where an LLM maps that human-centric procedure onto the robot's available skills; and (3) translating robot expressive motion to code, where an LLM turns the step-by-step robot procedure into executable, parameterized code.
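Chaining those three steps together, a hedged sketch of the modular pipeline might look like this; `llm` is a placeholder for any chat-completion API, and the prompt wording is paraphrased rather than taken from the paper:

```python
def llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (wire up your provider here)."""
    raise NotImplementedError

def genem_pipeline(instruction: str, robot_api_doc: str) -> str:
    # (1) Expressive instruction following: how would a human express this?
    human_motion = llm(
        f"Describe, step by step, how a human would expressively "
        f"respond to: {instruction}"
    )
    # (2) Human-to-robot translation: map those steps onto robot skills.
    robot_motion = llm(
        f"Given this robot API:\n{robot_api_doc}\n"
        f"Translate each human step into calls to these skills:\n{human_motion}"
    )
    # (3) Robot motion to code: emit executable, parameterized control code.
    return llm(
        f"Write a parameterized Python function implementing:\n{robot_motion}"
    )
```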
The authors conducted two user studies to assess whether GenEM generates expressive behaviors that people can recognize as intended. In both studies, all comparisons were made against behaviors designed by a professional animator and implemented by a software developer, which the authors term the oracle animator. GenEM behaviors were well received and, in most cases, not rated significantly worse than the oracle animator's.
The authors also ran experiments probing different aspects of GenEM, including ablations that compare their prompting structure and modular calls to different LLMs against an end-to-end approach. GenEM achieved higher success rates than the ablated variant, which failed to produce a single successful run for two of the behaviors. The authors also showed that GenEM can produce modular and composable behaviors, i.e., behaviors that build on top of each other.
The authors conclude that their approach offers a flexible framework for generating adaptable and composable expressive motion through the power of large language models. They hope this inspires future work on expressive behavior generation so that robots can interact with people more effectively.