March 11-14, 2024 | Simon Holk, Daniel Marta, Iolanda Leite
PREDILECT is a framework that uses zero-shot language-based reasoning to improve preference-based reinforcement learning (RL). It leverages large language models (LLMs) to extract additional information from human preferences and the text explanations people provide alongside them, enabling more accurate reward function learning. By combining preferences with these explanations, PREDILECT reduces the amount of human feedback required and increases the granularity of the reward model. Concretely, the LLM maps each text explanation to relevant features, which are then used to highlight the key state-action pairs that should inform the reward function. This helps mitigate causal confusion and better aligns robot behavior with human preferences.
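To make the mechanism concrete, below is a minimal sketch (in PyTorch, not the authors' code) of how text-derived emphasis could enter a standard Bradley-Terry preference loss. The per-step weights stand in for the state-action pairs the LLM highlights from the explanation; the reward model architecture and the weighting scheme are illustrative assumptions.

```python
# Minimal sketch: preference learning over trajectory segments, where
# optional LLM-derived per-step weights emphasize highlighted state-action
# pairs. This is an illustration of the idea, not PREDILECT's implementation.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Per-step reward for each (state, action) pair in a segment.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, pref, w_a=None, w_b=None):
    """Bradley-Terry loss over two trajectory segments.

    seg_*: (obs, act) tensors of shape (T, obs_dim) / (T, act_dim)
    pref : 1.0 if segment A is preferred, 0.0 if segment B is preferred
    w_*  : optional per-step weights (T,), a stand-in for the state-action
           pairs the LLM highlights from the human's text explanation.
    """
    r_a = reward_model(*seg_a)
    r_b = reward_model(*seg_b)
    if w_a is not None:
        r_a = r_a * w_a  # emphasize the steps the explanation points to
    if w_b is not None:
        r_b = r_b * w_b
    logits = torch.stack([r_a.sum(), r_b.sum()])
    target = torch.tensor([pref, 1.0 - pref])
    return -(target * torch.log_softmax(logits, dim=0)).sum()
```

When a query comes with a preference but no explanation, the weights simply default to uniform and the loss reduces to ordinary preference-based reward learning.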
In simulated and real-world experiments, PREDILECT demonstrated superior performance compared to traditional preference-based RL. In simulated environments, it achieved faster convergence with fewer queries, and in a social navigation scenario, it produced policies that better aligned with human preferences. The LLM's ability to extract relevant features from text descriptions was validated, showing high accuracy in identifying features, sentiments, and magnitudes. However, there were instances where the LLM misinterpreted features or introduced false positives, highlighting the need for careful prompt design.
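The extraction step itself can be pictured as a zero-shot prompt-and-parse loop. The sketch below is illustrative only: `query_llm`, the feature list, the prompt wording, and the JSON schema are assumptions, not the paper's actual prompt or interface.

```python
# Hedged sketch of zero-shot feature extraction: the LLM maps a free-text
# explanation to (feature, sentiment, magnitude) annotations, which are then
# filtered against the known feature set to guard against false positives.
import json

FEATURES = ["distance_to_human", "speed", "path_smoothness"]  # example feature set

PROMPT_TEMPLATE = (
    "Given the robot features {features}, read the user's explanation and "
    "return a JSON list of objects with keys 'feature', 'sentiment' "
    "('positive' or 'negative'), and 'magnitude' (a number from 0.0 to 1.0).\n"
    "Explanation: \"{text}\""
)

def extract_feature_annotations(text: str, query_llm) -> list[dict]:
    prompt = PROMPT_TEMPLATE.format(features=FEATURES, text=text)
    raw = query_llm(prompt)  # zero-shot call to any LLM backend; no fine-tuning assumed
    try:
        annotations = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output: fall back to preference-only learning
    # Keep only features that can be mapped back to state-action pairs,
    # discarding hallucinated or misinterpreted entries.
    return [a for a in annotations if a.get("feature") in FEATURES]
```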
The framework's integration of textual explanations allows for more nuanced policy learning, focusing on specific objectives rather than generic preferences. This approach is particularly beneficial in social navigation tasks, where safety and human interaction are critical. Overall, PREDILECT shows promise in improving human-robot interaction by leveraging natural language to refine reward functions and enhance policy learning.