Reinforcement Learning with Deep Energy-Based Policies

2017 | Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine
The paper introduces a method for learning expressive energy-based policies over continuous states and actions, something previously feasible only in tabular domains. The method, called soft Q-learning, expresses the optimal policy as a Boltzmann distribution and uses amortized Stein variational gradient descent to train a stochastic sampling network that draws approximate samples from this distribution. The benefits of the algorithm include improved exploration and compositionality, which allows skills to be transferred between tasks. The paper also draws a connection to actor-critic methods, showing that they can be viewed as performing approximate inference on the corresponding energy-based model. Experiments with simulated swimming and walking robots show that the method is effective in multi-modal reward landscapes and provides a good initialization for learning new skills.
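To make the Boltzmann-policy idea concrete, here is a minimal sketch that assumes a small discrete action set, so the normalizer is a plain sum. The paper itself targets continuous actions, where this sum becomes an integral and samples come from a learned sampling network, so the code below is an illustration and not the paper's algorithm. The function names, the temperature parameter `alpha`, and the Q-values are all hypothetical.

```python
import math

def soft_value(q_values, alpha=1.0):
    """Soft value V = alpha * log(sum_a exp(Q(a) / alpha)), computed stably."""
    m = max(q / alpha for q in q_values)  # subtract the max to avoid overflow
    return alpha * (m + math.log(sum(math.exp(q / alpha - m) for q in q_values)))

def boltzmann_policy(q_values, alpha=1.0):
    """Energy-based policy pi(a) = exp((Q(a) - V) / alpha), with -Q as the energy."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

# Hypothetical Q-values: two equally good actions and one poor one.
q = [1.0, 1.0, -2.0]
pi = boltzmann_policy(q, alpha=0.5)
print(pi)
```

Under these assumptions the two equally good actions receive equal probability instead of one being picked arbitrarily, which is the multi-modal behavior the summary refers to. A larger `alpha` moves the policy toward uniform, and a smaller one moves it toward the greedy choice.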