17 Feb 2023 | Paul F Christiano, Jan Leike, Tom B Brown, Miljan Martic, Shane Legg, Dario Amodei
This paper explores an approach to reinforcement learning (RL) that uses human preferences to define complex goals. The authors learn a reward function from human feedback and then train an agent to optimize that learned reward. The approach is particularly useful for tasks where a well-specified reward function is difficult or impossible to design, such as controlling a robot with many degrees of freedom or performing complex actions like backflips. The paper demonstrates that the method can solve complex RL tasks, including Atari games and simulated robot locomotion, while requiring human feedback on less than 1% of the agent's interactions with the environment. The authors also show that the algorithm can learn novel behaviors with about an hour of human feedback. The paper includes experimental results and ablation studies to validate the approach.
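
The idea of learning a reward function from preferences can be illustrated with a minimal sketch. It assumes that feedback arrives as pairwise comparisons between short trajectory segments and that preferences follow a Bradley-Terry (logistic) model over summed predicted rewards. The linear reward model, the function names, and the hyperparameters below are illustrative assumptions, not the authors' implementation, which the summary does not describe in detail.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(weights, segment):
    """Sum of predicted per-step rewards over a segment.

    Assumption: a linear reward model r(s) = w . s, where each step of the
    segment is a list of features. A real system would likely use a neural net.
    """
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    return _sigmoid(segment_return(weights, seg_a) - segment_return(weights, seg_b))


def train_reward_model(comparisons, dim, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy loss.

    comparisons: list of (seg_a, seg_b, label), where label is 1.0 if the
    human preferred seg_a and 0.0 if the human preferred seg_b.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            for i in range(dim):
                # Difference in summed feature i between the two segments.
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                # Gradient of the cross-entropy loss is (p - label) * feat_diff.
                weights[i] -= lr * (p - label) * feat_diff
    return weights


# Hypothetical toy data: the "human" prefers segments rich in feature 0.
good = [[1.0, 0.0]] * 3
bad = [[0.0, 1.0]] * 3
toy_comparisons = [(good, bad, 1.0), (bad, good, 0.0)]
learned = train_reward_model(toy_comparisons, dim=2)
print("learned weights:", learned)
print("P(good preferred):", preference_prob(learned, good, bad))
```

In a full pipeline, the learned reward would stand in for the environment's reward signal while a standard RL algorithm trains the agent, and new comparisons would be collected as the agent's behavior changes.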