Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

2024 | Yuwei Zeng, Yao Mu, Lin Shao
This paper presents a method for learning robot reward functions with Large Language Models (LLMs) without human intervention. The approach has two main components. First, the LLM proposes the features and parameterization of the reward function. Second, the parameters are updated through an iterative self-alignment process that uses execution feedback to minimize the ranking inconsistency between the LLM and the learned reward function. The method is validated on 9 tasks across 2 simulation environments. It shows consistent improvements in training efficacy and efficiency while consuming significantly fewer GPT tokens than alternative mutation-based methods. The key contributions are a framework for learning reward functions with LLMs through self-alignment, active parameter adjustment guided by LLM heuristics, and validation on these 9 tasks demonstrating improved performance and token efficiency.
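The self-alignment idea can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several assumptions. The reward is assumed to be linear in LLM-proposed features. `mock_llm_ranking` is a hypothetical stand-in for querying the LLM. The feature names and rollout numbers are invented. The weight update uses a pairwise logistic (Bradley-Terry) loss, chosen here only for illustration because this summary does not specify the paper's update rule. The sketch measures ranking inconsistency as the fraction of LLM-preferred pairs that the learned reward fails to order strictly, then adjusts the weights until the two rankings agree.

```python
import math
from itertools import combinations

# Hypothetical LLM-proposed reward: feature names plus an initial weight guess.
FEATURES = ["progress_to_goal", "motion_smoothness"]
INITIAL_WEIGHTS = [0.0, 1.0]  # deliberately mis-weighted starting point


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights, feats):
    """Linear reward r = w . phi over the proposed features."""
    return sum(w * f for w, f in zip(weights, feats))


def mock_llm_ranking(rollouts):
    """Hypothetical stand-in for the LLM: rank rollout indices best-first.

    Here the 'LLM' simply prefers rollouts with more progress to the goal.
    """
    return sorted(range(len(rollouts)), key=lambda i: -rollouts[i][0])


def preference_pairs(ranking):
    """Turn a best-first ranking into (better, worse) index pairs."""
    return list(combinations(ranking, 2))


def ranking_inconsistency(weights, rollouts, pairs):
    """Fraction of LLM-preferred pairs the learned reward fails to order strictly."""
    bad = sum(1 for i, j in pairs
              if reward(weights, rollouts[i]) <= reward(weights, rollouts[j]))
    return bad / len(pairs)


def update_weights(weights, rollouts, pairs, lr=0.5, steps=200):
    """Gradient descent on a pairwise logistic (Bradley-Terry) loss."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for i, j in pairs:
            diff = [a - b for a, b in zip(rollouts[i], rollouts[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # d/dw of -log(sigmoid(margin)) is -(1 - sigmoid(margin)) * diff
            coef = 1.0 - sigmoid(margin)
            for k, dk in enumerate(diff):
                grad[k] -= coef * dk
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def self_align(weights, rollouts, rounds=5):
    """Outer loop: rank rollouts, measure inconsistency, update until consistent."""
    for _ in range(rounds):
        pairs = preference_pairs(mock_llm_ranking(rollouts))
        if ranking_inconsistency(weights, rollouts, pairs) == 0.0:
            break
        weights = update_weights(weights, rollouts, pairs)
        # Assumption: a full system would retrain the policy and collect fresh
        # execution feedback here; this sketch reuses the same fixed rollouts.
    return weights


# Usage: hypothetical per-rollout feature vectors [progress, smoothness].
rollouts = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
]
pairs = preference_pairs(mock_llm_ranking(rollouts))
before = ranking_inconsistency(INITIAL_WEIGHTS, rollouts, pairs)
learned = self_align(INITIAL_WEIGHTS, rollouts)
after = ranking_inconsistency(learned, rollouts, pairs)
print(before, after)  # 1.0 0.0
```

With the mis-weighted initial guess, the reward ranks every pair opposite to the mock LLM, so the inconsistency starts at 1.0. After alignment, the weight on progress exceeds the weight on smoothness and the rankings fully agree. A pairwise loss is used because the signal being aligned is an ordering, not absolute reward values.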