This article presents a general class of associative reinforcement learning algorithms for connectionist networks with stochastic units, called REINFORCE algorithms. These algorithms adjust weights in the direction of the gradient of expected reinforcement, without explicitly computing gradient estimates. Examples of such algorithms are given, some of which are related to existing algorithms, while others are novel. The article also discusses how these algorithms can be integrated with backpropagation. It concludes with a discussion of additional issues, including the limiting behavior of these algorithms and their potential for development into more powerful reinforcement learning algorithms.
The article begins by introducing the framework of reinforcement learning, which encompasses a wide range of problems, from function optimization to learning control. It emphasizes the need to integrate various techniques for effective reinforcement learning in realistic environments. The focus is on algorithms for associative tasks with immediate reinforcement, where the reinforcement is determined by the most recent input-output pair. While delayed reinforcement tasks are important, they are often addressed by combining immediate-reinforcement learners with adaptive predictors.
The article discusses the use of stochastic semilinear units, which are common in connectionist networks. Each such unit computes a weighted sum of its inputs, passes it through a differentiable squashing function, and uses the result as the parameter of a probability distribution from which its output is drawn. The article also introduces Bernoulli semilinear units, a special case with binary outputs in which the squashed value serves as the probability that the unit outputs 1.
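As a concrete illustration, a Bernoulli semilinear unit with a logistic squashing function can be sketched as follows (the function and variable names here are illustrative, not taken from the article):

```python
import math
import random

def bernoulli_semilinear_unit(weights, inputs, rng=random):
    """Sketch of a Bernoulli semilinear unit: squash the weighted sum
    of the inputs into a probability, then draw a binary output."""
    s = sum(w * x for w, x in zip(weights, inputs))  # weighted sum of inputs
    p = 1.0 / (1.0 + math.exp(-s))                   # logistic squashing function
    y = 1 if rng.random() < p else 0                 # stochastic binary output
    return y, p

# Example: a unit with two inputs plus a bias input clamped to 1.
y, p = bernoulli_semilinear_unit([0.5, -0.3, 0.1], [1.0, 2.0, 1.0])
```

The deterministic part (weighted sum and squashing) is exactly what a conventional semilinear unit computes; only the final draw makes the unit stochastic.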
The article then presents the expected reinforcement performance criterion used to evaluate reinforcement learning networks. It introduces the REINFORCE algorithms, which adjust each weight in proportion to the reinforcement signal (offset by a baseline) times that weight's characteristic eligibility, the partial derivative of the log probability of the unit's output with respect to the weight. The article proves that the average update vector in weight space lies in a direction for which expected reinforcement is increasing; in particular, the expected weight change is proportional to the gradient of expected reinforcement.
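For a Bernoulli-logistic unit, the characteristic eligibility of weight w_j works out to (y - p) * x_j, which gives a one-line REINFORCE update. The sketch below assumes that form (names are mine, not the article's):

```python
def reinforce_update(weights, inputs, y, p, r, alpha=0.1, baseline=0.0):
    """One REINFORCE step for a single Bernoulli-logistic unit.

    The characteristic eligibility of w_j is d ln g / d w_j = (y - p) * x_j,
    so the update is  delta_w_j = alpha * (r - baseline) * (y - p) * x_j.
    """
    return [w + alpha * (r - baseline) * (y - p) * x
            for w, x in zip(weights, inputs)]
```

Note that the update uses only locally available quantities (the unit's input, output, firing probability, and the broadcast reinforcement), with no explicit gradient estimation.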
The article also discusses episodic REINFORCE algorithms, intended for tasks with a temporal credit-assignment component. These algorithms accumulate each weight's characteristic eligibility over the time steps of an episode and make a single weight change at the end, based on the reinforcement received for the episode as a whole.
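The episodic variant can be sketched for a single Bernoulli-logistic unit as follows (again an illustrative sketch, with hypothetical names):

```python
def episodic_reinforce(weights, episode, r, alpha=0.1, baseline=0.0):
    """Episodic REINFORCE sketch for a single Bernoulli-logistic unit.

    episode: list of (inputs, y, p) triples, one per time step.
    The eligibility is summed over the whole episode, and the weights
    are updated once using the reinforcement r received at the end.
    """
    elig = [0.0] * len(weights)
    for inputs, y, p in episode:
        for j, x in enumerate(inputs):
            elig[j] += (y - p) * x      # accumulate d ln g / d w_j
    return [w + alpha * (r - baseline) * e for w, e in zip(weights, elig)]
```

Because only the summed eligibility and the final reinforcement are needed, the per-step rewards never have to be stored individually.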
The article further explores the compatibility of REINFORCE algorithms with backpropagation, showing how backpropagation through the deterministic parts of a network can compute the partial derivatives these algorithms require. It discusses networks in which the stochastic behavior is isolated in random number generators, with the remaining units deterministic and differentiable.
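One way to picture this combination is a Gaussian output unit whose mean is computed deterministically from the weights: the eligibility at the mean is (y - mu) / sigma**2, and the chain rule (the role backpropagation plays) carries it back to each weight. The sketch below uses a single linear layer as a stand-in for a deterministic subnetwork; all names are illustrative:

```python
import random

def gaussian_reinforce_step(weights, inputs, sigma, r, alpha=0.1,
                            baseline=0.0, rng=random):
    """Sketch of REINFORCE combined with chain-rule (backpropagation-style)
    credit assignment. Only the final Gaussian draw is random; the mean mu
    is a deterministic, differentiable function of the weights.

    For a Gaussian unit, d ln g / d mu = (y - mu) / sigma**2, and the
    chain rule gives d ln g / d w_j = ((y - mu) / sigma**2) * x_j here.
    """
    mu = sum(w * x for w, x in zip(weights, inputs))  # deterministic forward pass
    y = rng.gauss(mu, sigma)                          # stochastic output draw
    dlng_dmu = (y - mu) / sigma ** 2                  # eligibility at the mean
    new_w = [w + alpha * (r - baseline) * dlng_dmu * x  # chain rule back to w_j
             for w, x in zip(weights, inputs)]
    return y, new_w
```

In a deeper network, the factor d mu / d w_j would be supplied by an ordinary backpropagation pass through the deterministic units.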
The article concludes with a discussion of algorithm performance and other issues, including convergence properties and the role of reinforcement baselines. It highlights the importance of understanding the behavior of these algorithms and the potential for developing more powerful reinforcement learning algorithms.
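One common choice of reinforcement baseline (an assumption for illustration here, not a prescription from the article) is a running average of past reinforcement, so that r - baseline measures whether the latest outcome was better or worse than usual:

```python
def update_baseline(baseline, r, gamma=0.9):
    """Exponentially weighted moving average of past reinforcement,
    offered as one illustrative baseline choice (not the article's
    prescribed method). Subtracting it from r centers the reinforcement
    signal without biasing the expected direction of the update."""
    return gamma * baseline + (1.0 - gamma) * r
```

Because the baseline does not depend on the current output, subtracting it leaves the expected update direction unchanged while it can substantially affect the variance of the weight changes.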