23 Feb 2018 | Marcin Andrychowicz*, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel†, Wojciech Zaremba†
The paper introduces a novel technique called *Hindsight Experience Replay (HER)*, which enables sample-efficient learning from sparse and binary rewards. HER can be combined with any off-policy reinforcement learning (RL) algorithm and is particularly useful for tasks with multiple goals. The key idea is to replay each episode with different goals, allowing the algorithm to learn from both successful and unsuccessful outcomes. The authors demonstrate the effectiveness of HER on a robotic arm manipulation task, showing that policies trained with HER can successfully complete tasks using only binary rewards indicating task completion. The approach is based on training universal policies that take both the current state and a goal state as input. The paper also includes experiments on a physical robot, confirming the practical applicability of HER.
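To make the goal-relabeling idea concrete, here is a minimal Python sketch of HER's "future" replay strategy. The episode format, the `reward_fn` signature, and the tolerance `eps` are illustrative assumptions, not the paper's exact implementation.

```python
import random

import numpy as np


def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling with the 'future' strategy: for each transition,
    sample k goals achieved later in the same episode and store the
    transition again with the substituted goal and a recomputed reward.

    `episode` is a list of dicts with keys: obs, action, next_obs,
    achieved_goal, goal (hypothetical layout; adapt to your environment).
    `reward_fn(achieved_goal, goal)` returns the sparse binary reward.
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Keep the original transition with its original goal.
        relabeled.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Replay with goals actually achieved later in the episode.
        future_steps = episode[t:]
        for _ in range(k):
            new_goal = random.choice(future_steps)["achieved_goal"]
            relabeled.append({
                **tr,
                "goal": new_goal,
                "reward": reward_fn(tr["achieved_goal"], new_goal),
            })
    return relabeled


def sparse_reward(achieved, goal, eps=0.05):
    # Binary reward in the paper's convention: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else -1.0
```

The relabeled transitions are then pushed into the replay buffer of any off-policy learner (the paper uses DDPG), with the goal concatenated to the observation so the policy and value function are goal-conditioned universal approximators.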