InterPreT is an interactive framework that enables robots to learn symbolic predicates and operators from language feedback during embodied interaction. The framework leverages Large Language Models (LLMs) like GPT-4 to learn predicates as Python functions, which are iteratively refined based on human feedback. These predicates provide relational abstractions of the environment state, facilitating the learning of symbolic operators that capture action preconditions and effects. By compiling the learned predicates and operators into a Planning Domain Definition Language (PDDL) domain on-the-fly, InterPreT allows effective planning toward arbitrary in-domain goals using a PDDL planner. In both simulated and real-world robot manipulation domains, InterPreT reliably uncovers the key predicates and operators governing the environment dynamics. Although learned from simple training tasks, these predicates and operators exhibit strong generalization to novel tasks with significantly higher complexity. In the most challenging generalization setting, InterPreT attains success rates of 73% in simulation and 40% in the real world, substantially outperforming baseline methods.
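To make the idea of "predicates as Python functions" concrete, here is a minimal sketch of what one learned predicate might look like. The state representation (a dict mapping object names to positions) and the tolerance values are illustrative assumptions, not the paper's exact interface:

```python
# Hypothetical sketch of a learned predicate in the style InterPreT uses:
# a Python function over the environment state. The state format (object
# name -> (x, y, z) position) and thresholds are assumptions for illustration.

def is_on(state, obj_a, obj_b, xy_tol=0.05, z_gap=0.06):
    """True if obj_a rests on top of obj_b (within tolerances)."""
    xa, ya, za = state[obj_a]
    xb, yb, zb = state[obj_b]
    horizontally_aligned = abs(xa - xb) <= xy_tol and abs(ya - yb) <= xy_tol
    directly_above = 0.0 < (za - zb) <= z_gap
    return horizontally_aligned and directly_above

state = {
    "red_block": (0.50, 0.30, 0.10),
    "blue_block": (0.50, 0.30, 0.05),
    "green_block": (0.20, 0.10, 0.05),
}
print(is_on(state, "red_block", "blue_block"))   # True
print(is_on(state, "red_block", "green_block"))  # False
```

Because such a function evaluates to a truth value over object tuples, its outputs can be compiled directly into the relational facts of a PDDL problem description.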
InterPreT learns two types of planning-oriented predicates: goal predicates and action precondition predicates, which indicate task progress and determine action feasibility, respectively. A concise, natural communication protocol is designed so that humans can convey such feedback during interaction. The framework comprises five modules that work together to enable predicate learning: the Reasoner, Coder, Corrector, Operator Learner, and Goal Translator. The Reasoner identifies new predicates and extracts task-relevant information from language feedback. The Coder generates Python functions to ground the new predicates, while the Corrector iteratively refines existing predicate functions so that their predictions align with the extracted predicate labels. The Operator Learner induces operators from interaction data based on the learned predicates, and the Goal Translator maps language goal specifications into symbolic goals to enable planning.
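The paper does not spell out the Operator Learner's algorithm here, but a simple propositional sketch conveys the idea: given transitions recorded as (predicates before, predicates after) for one action, preconditions can be estimated as facts true in every successful pre-state, and effects as facts consistently added or deleted. The `learn_operator` function and the `pick(red_block)` transitions below are hypothetical illustrations, not the paper's implementation:

```python
# Simplified sketch of operator induction from interaction data.
# Each transition is (predicates_before, predicates_after) for one action.

def learn_operator(transitions):
    pre_sets = [before for before, _ in transitions]
    # Preconditions: predicates true in every state where the action succeeded.
    preconditions = set.intersection(*pre_sets)
    # Effects: predicates consistently added or deleted by the action.
    add_effects = set.intersection(*[after - before for before, after in transitions])
    del_effects = set.intersection(*[before - after for before, after in transitions])
    return preconditions, add_effects, del_effects

# Two observed executions of a hypothetical "pick(red_block)" action:
transitions = [
    ({"handempty", "clear(red_block)", "on_table(red_block)"},
     {"holding(red_block)", "clear(red_block)"}),
    ({"handempty", "clear(red_block)", "on(red_block, blue_block)"},
     {"holding(red_block)", "clear(red_block)", "clear(blue_block)"}),
]
pre, add, delete = learn_operator(transitions)
print(sorted(pre))     # ['clear(red_block)', 'handempty']
print(sorted(add))     # ['holding(red_block)']
print(sorted(delete))  # ['handempty']
```

A real operator learner would additionally lift these grounded facts into parameterized schemas (e.g. `pick(?x)`), which is what makes the resulting PDDL operators reusable across objects.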
In experiments, InterPreT is evaluated on a suite of simulated and real-world robot manipulation domains. The results show that InterPreT learns valid predicates and operators that capture the essential regularities governing each domain. These learned predicates and operators allow the robot to solve challenging unseen tasks requiring combinatorial generalization, with a 73% success rate in simulation, outperforming all baselines by a large margin. InterPreT also handles real-world uncertainty and complexity effectively, reaching a 40% success rate in the most challenging real-world setting. The framework generalizes to tasks involving more objects and novel goals, and it shows potential for reusing previously learned predicates to improve learning efficiency and planning performance in complex domains. These results support the hypothesis that human-like planning proficiency benefits from interactive learning from rich language input, akin to infant development. However, the framework has limitations: it assumes the underlying domain can be well modeled at a symbolic level, and the learned operators are deterministic, so they may not capture uncertainty in state transitions.