19 Mar 2024 | Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn
This paper introduces YAY Robot, a system that enables robots to improve their performance on long-horizon tasks through natural language feedback. The system uses a hierarchical policy structure: a high-level policy generates language instructions that guide a low-level policy in executing specific skills.

The high-level policy is trained to predict language instructions from observations and can be fine-tuned on human corrections, improving its ability to recover from errors in both low-level execution and high-level decision-making. The robot adapts to language feedback in real time, and that feedback is folded into an iterative training scheme that further improves the high-level policy.

The system is evaluated on three bimanual manipulation tasks: packing items into a Ziploc bag, preparing trail mix, and cleaning a plate. Incorporating language corrections yields significant gains in task success, with performance increasing from 15% to 50% with real-time corrections and from 15% to 45% after fine-tuning. The system operates autonomously while still improving from verbal corrections whenever they are provided.

Compared against other methods, including flat policies and vision-language models, YAY Robot achieves higher success rates on long-horizon tasks. The paper also discusses the importance of high-quality data and the approach's limitations, notably the need for a performant low-level policy that can act on language corrections. Overall, YAY Robot demonstrates the potential of natural language feedback for improving robot performance on complex tasks.
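The hierarchical structure described above can be sketched as a simple control loop: the high-level policy maps observations to a language instruction, a human can override that instruction with a real-time correction (which is logged for later fine-tuning), and the low-level policy executes conditioned on the instruction. This is a minimal illustrative sketch, not the paper's actual implementation; all class and function names here are hypothetical.

```python
# Hypothetical sketch of a hierarchical language-conditioned control loop
# in the style of YAY Robot. Names and interfaces are illustrative only.

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class HighLevelPolicy:
    """Maps observations to language instructions; corrections are logged
    so the policy can be fine-tuned on them later."""
    predict: Callable[[str], str]
    corrections: List[Tuple[str, str]] = field(default_factory=list)

    def instruct(self, observation: str, human_feedback: Optional[str] = None) -> str:
        if human_feedback is not None:
            # A real-time verbal correction overrides the predicted
            # instruction and is stored as a (observation, label) pair
            # for the iterative fine-tuning scheme.
            self.corrections.append((observation, human_feedback))
            return human_feedback
        return self.predict(observation)


@dataclass
class LowLevelPolicy:
    """Executes a skill conditioned on the current language instruction.
    A real system would output motor commands; here we return a string."""

    def act(self, observation: str, instruction: str) -> str:
        return f"executing '{instruction}' given {observation}"


def control_step(high: HighLevelPolicy, low: LowLevelPolicy,
                 observation: str, feedback: Optional[str] = None) -> str:
    """One step of the hierarchy: instruction selection, then execution."""
    instruction = high.instruct(observation, feedback)
    return low.act(observation, instruction)
```

For example, if the predicted instruction is "open the bag" but the human says "grasp the corner", the correction both redirects the low-level policy immediately and accumulates as fine-tuning data for the high-level policy.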