29 Jan 2019 | Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine
The paper introduces Soft Actor-Critic (SAC), an off-policy maximum entropy deep reinforcement learning algorithm designed to address the challenges of high sample complexity and hyperparameter sensitivity in model-free deep reinforcement learning. SAC combines the benefits of entropy maximization and stability, aiming to maximize both expected return and entropy. The authors extend SAC with modifications that improve training efficiency and hyperparameter robustness, such as a constrained formulation for automatic temperature tuning. Empirical evaluations on benchmark tasks and real-world robotic tasks, including quadrupedal locomotion and dexterous hand manipulation, demonstrate SAC's superior sample efficiency and asymptotic performance compared to prior methods.
The method is shown to be stable across different random seeds and to generalize well to unseen terrains and obstacles in real-world experiments, making it a promising candidate for real-world robotics applications.
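The constrained formulation mentioned above can be sketched as a dual-gradient update on the temperature: the temperature loss is J(α) = E[−α(log π(a|s) + H̄)], where H̄ is a target entropy (commonly set to the negative action dimensionality). A minimal NumPy sketch, with hypothetical function names (`alpha_loss`, `update_log_alpha` are not from the paper) and a plain SGD step standing in for the Adam optimizer the authors use:

```python
import numpy as np

def alpha_loss(log_alpha, log_probs, target_entropy):
    """Temperature loss J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)],
    estimated over a batch of log-probabilities of sampled actions."""
    alpha = np.exp(log_alpha)  # parameterize alpha = exp(log_alpha) > 0
    return float(np.mean(-alpha * (log_probs + target_entropy)))

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=1e-3):
    """One SGD step on log_alpha. The gradient of J w.r.t. log_alpha is
    -alpha * mean(log_probs + target_entropy)."""
    alpha = np.exp(log_alpha)
    grad = -alpha * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad

# Illustration: a 6-dim action space gives target_entropy = -6.
# When the policy's entropy estimate (-mean(log_probs)) exceeds the target,
# the update shrinks alpha, weakening the entropy bonus; when entropy is
# below target, alpha grows to encourage more exploration.
log_alpha = 0.0
log_probs = np.array([-2.0, -3.5, -4.0])  # log pi(a|s) for sampled actions
log_alpha = update_log_alpha(log_alpha, log_probs, target_entropy=-6.0)
```

This adjusts the return-versus-entropy trade-off automatically, removing the temperature hyperparameter that earlier SAC variants required tuning per task.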