17 Feb 2023 | Paul F Christiano, Jan Leike, Tom B Brown, Miljan Martic, Shane Legg, Dario Amodei
This paper presents a method for deep reinforcement learning (RL) that learns complex goals from human preferences without requiring a predefined reward function. Instead of hand-specifying a reward, a reward function is inferred from human comparisons of trajectory segments: the agent interacts with the environment, humans compare short video clips of its behavior and indicate which they prefer, and this feedback is used to train a reward model against which the agent's policy is then optimized. The method is demonstrated on two domains, Atari games and simulated robotics tasks, where it learns complex behaviors from a relatively small amount of human feedback, substantially reducing the cost of human oversight. Because no explicit reward function is needed, the approach applies to tasks where one is difficult to define, and it can also elicit novel behaviors that would be hard to encode as a reward. The paper additionally discusses related work and reports ablation studies evaluating the contribution of each component. Overall, the approach represents a significant step toward training complex deep RL agents from human preferences with minimal human input.
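To make the core mechanism concrete, below is a minimal sketch of how a reward model can be fit to pairwise segment preferences, using a Bradley-Terry style loss in which the probability of preferring one segment is a logistic function of the difference in predicted segment returns. The class and function names, the placeholder MLP architecture, and the tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an observation-action pair to a scalar reward estimate.
    (Placeholder MLP architecture, not the paper's exact network.)"""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, pref):
    """Preference loss over a batch of trajectory-segment pairs.

    seg_a, seg_b: tuples (obs, act) with shapes (batch, T, obs_dim) and
                  (batch, T, act_dim) for two segments of length T.
    pref: tensor of shape (batch,), 1.0 if the human preferred segment A,
          0.0 if segment B (0.5 can encode "no preference").
    """
    # Predicted return of each segment = sum of per-step predicted rewards.
    ret_a = reward_model(*seg_a).sum(dim=1)
    ret_b = reward_model(*seg_b).sum(dim=1)
    # P(A preferred) is modeled as a logistic function of the return difference.
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, pref)
```

In a setup like this, the learned per-step reward would stand in for the environment reward inside an ordinary deep RL training loop, with new human comparisons collected periodically to keep refining the reward model as the policy improves.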