This article presents a general class of associative reinforcement learning algorithms for connectionist networks with stochastic units, called REINFORCE algorithms. These algorithms adjust weights in the direction of the gradient of expected reinforcement, without explicitly computing gradient estimates. Examples of such algorithms are given, some of which are related to existing algorithms, while others are novel. The article also discusses how these algorithms can be integrated with backpropagation. It concludes with a discussion of additional issues, including the limiting behavior of these algorithms and their potential for development into more powerful reinforcement learning algorithms.
The article begins by introducing the framework of reinforcement learning, which encompasses a wide range of problems, from function optimization to learning control. It emphasizes the need to integrate various techniques for effective reinforcement learning in realistic environments. The focus is on algorithms for associative tasks with immediate reinforcement, where the reinforcement is determined by the most recent input-output pair. While delayed reinforcement tasks are important, they are often addressed by combining immediate-reinforcement learners with adaptive predictors.
The article discusses the use of stochastic semilinear units, which are common in connectionist networks. Each such unit computes a weighted sum of its inputs, passes it through a differentiable squashing function, and uses the result as the parameter of a probability distribution from which its output is drawn. The article also introduces Bernoulli semilinear units, a special case with binary outputs in which the squashed value serves as the probability that the unit outputs 1.
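As a concrete illustration, a Bernoulli semilinear unit with a logistic squashing function can be sketched as follows (the function and variable names here are illustrative, not taken from the article):

```python
import math
import random

def bernoulli_semilinear_unit(weights, inputs, rng=random):
    """Sketch of a Bernoulli semilinear unit: squash the weighted sum
    of the inputs into a probability, then draw a binary output."""
    s = sum(w * x for w, x in zip(weights, inputs))  # weighted sum of inputs
    p = 1.0 / (1.0 + math.exp(-s))                   # logistic squashing function
    y = 1 if rng.random() < p else 0                 # stochastic binary output
    return y, p

# Example: a unit with two inputs plus a bias input clamped to 1.
y, p = bernoulli_semilinear_unit([0.5, -0.3, 0.1], [1.0, 2.0, 1.0])
```

The deterministic part (weighted sum and squashing) is exactly what a conventional semilinear unit computes; only the final draw makes the unit stochastic.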
The article then presents the expected reinforcement performance criterion used to evaluate reinforcement learning networks. It introduces the REINFORCE algorithms, which adjust each weight in proportion to the reinforcement signal (offset by a baseline) times that weight's characteristic eligibility, the partial derivative of the log probability of the unit's output with respect to the weight. The article proves that the average update vector in weight space lies in a direction for which expected reinforcement is increasing; in particular, the expected weight change is proportional to the gradient of expected reinforcement.
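For a Bernoulli-logistic unit, the characteristic eligibility of weight w_j works out to (y - p) * x_j, which gives a one-line REINFORCE update. The sketch below assumes that form (names are mine, not the article's):

```python
def reinforce_update(weights, inputs, y, p, r, alpha=0.1, baseline=0.0):
    """One REINFORCE step for a single Bernoulli-logistic unit.

    The characteristic eligibility of w_j is d ln g / d w_j = (y - p) * x_j,
    so the update is  delta_w_j = alpha * (r - baseline) * (y - p) * x_j.
    """
    return [w + alpha * (r - baseline) * (y - p) * x
            for w, x in zip(weights, inputs)]
```

Note that the update uses only locally available quantities (the unit's input, output, firing probability, and the broadcast reinforcement), with no explicit gradient estimation.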
The article also discusses episodic REINFORCE algorithms, intended for tasks with a temporal credit-assignment component. These algorithms accumulate each weight's characteristic eligibility over the time steps of an episode and make a single weight change at the end, based on the reinforcement received for the episode as a whole.
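The episodic variant can be sketched for a single Bernoulli-logistic unit as follows (again an illustrative sketch, with hypothetical names):

```python
def episodic_reinforce(weights, episode, r, alpha=0.1, baseline=0.0):
    """Episodic REINFORCE sketch for a single Bernoulli-logistic unit.

    episode: list of (inputs, y, p) triples, one per time step.
    The eligibility is summed over the whole episode, and the weights
    are updated once using the reinforcement r received at the end.
    """
    elig = [0.0] * len(weights)
    for inputs, y, p in episode:
        for j, x in enumerate(inputs):
            elig[j] += (y - p) * x      # accumulate d ln g / d w_j
    return [w + alpha * (r - baseline) * e for w, e in zip(weights, elig)]
```

Because only the summed eligibility and the final reinforcement are needed, the per-step rewards never have to be stored individually.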
The article further explores the compatibility of REINFORCE algorithms with backpropagation, showing how backpropagation through the deterministic parts of a network can compute the partial derivatives these algorithms require. It discusses networks in which the stochastic behavior is isolated in random number generators, with the remaining units deterministic and differentiable.
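One way to picture this combination is a Gaussian output unit whose mean is computed deterministically from the weights: the eligibility at the mean is (y - mu) / sigma**2, and the chain rule (the role backpropagation plays) carries it back to each weight. The sketch below uses a single linear layer as a stand-in for a deterministic subnetwork; all names are illustrative:

```python
import random

def gaussian_reinforce_step(weights, inputs, sigma, r, alpha=0.1,
                            baseline=0.0, rng=random):
    """Sketch of REINFORCE combined with chain-rule (backpropagation-style)
    credit assignment. Only the final Gaussian draw is random; the mean mu
    is a deterministic, differentiable function of the weights.

    For a Gaussian unit, d ln g / d mu = (y - mu) / sigma**2, and the
    chain rule gives d ln g / d w_j = ((y - mu) / sigma**2) * x_j here.
    """
    mu = sum(w * x for w, x in zip(weights, inputs))  # deterministic forward pass
    y = rng.gauss(mu, sigma)                          # stochastic output draw
    dlng_dmu = (y - mu) / sigma ** 2                  # eligibility at the mean
    new_w = [w + alpha * (r - baseline) * dlng_dmu * x  # chain rule back to w_j
             for w, x in zip(weights, inputs)]
    return y, new_w
```

In a deeper network, the factor d mu / d w_j would be supplied by an ordinary backpropagation pass through the deterministic units.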
The article concludes with a discussion of algorithm performance and other issues, including convergence properties and the role of reinforcement baselines. It highlights the importance of understanding the behavior of these algorithms and the potential for developing more powerful reinforcement learning algorithms.
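One common choice of reinforcement baseline (an assumption for illustration here, not a prescription from the article) is a running average of past reinforcement, so that r - baseline measures whether the latest outcome was better or worse than usual:

```python
def update_baseline(baseline, r, gamma=0.9):
    """Exponentially weighted moving average of past reinforcement,
    offered as one illustrative baseline choice (not the article's
    prescribed method). Subtracting it from r centers the reinforcement
    signal without biasing the expected direction of the update."""
    return gamma * baseline + (1.0 - gamma) * r
```

Because the baseline does not depend on the current output, subtracting it leaves the expected update direction unchanged while it can substantially affect the variance of the weight changes.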