Natural Actor-Critic

2005 | Jan Peters, Sethu Vijayakumar, and Stefan Schaal
This paper introduces the Natural Actor-Critic (NAC) algorithm, a model-free reinforcement learning approach that updates policy parameters with natural policy gradients and estimates value functions by linear regression. The natural policy gradient is derived from the Fisher information metric, which makes policy improvements independent of the coordinate frame of the policy representation and typically more efficient than vanilla policy gradients.

The algorithm has two main components: the actor, which updates the policy parameters along the natural gradient, and the critic, which estimates the value function by linear regression using a compatible function approximation, enabling the algorithm to learn control on complex tasks such as robotic arm manipulation. Following the natural gradient ensures convergence to a local optimum of the expected return. NAC is shown to generalize several existing reinforcement learning methods, including the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning, and it extends to episodic tasks, where the value function is likewise estimated by linear regression.

Empirical evaluations demonstrate the effectiveness of NAC relative to previous methods: it outperforms existing approaches on tasks such as cart-pole balancing and motor primitive learning, and it is applied to a robotic task in which an arm learns to hit a ball off a tee (T-ball) with a baseball bat, showing improved performance over other methods. The paper concludes that NAC provides a robust and efficient framework for reinforcement learning, with strong convergence guarantees and the ability to handle complex tasks.
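The actor update described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: it uses a hypothetical one-parameter Gaussian policy and a toy quadratic reward, estimates the vanilla gradient and the Fisher information from samples, and preconditions the gradient by the inverse Fisher matrix (here a scalar).

```python
import numpy as np

# Minimal sketch of a natural-gradient actor step (assumption: a toy
# one-parameter Gaussian policy a ~ N(theta * s, sigma^2) and a toy
# reward whose optimum is a = 2 s; not the paper's experimental setup).

rng = np.random.default_rng(0)
SIGMA = 0.5

def grad_log_pi(theta, s, a):
    """Score function d/dtheta log pi(a|s) for the Gaussian policy."""
    return (a - theta * s) * s / SIGMA**2

def natural_gradient_step(theta, states, alpha=0.1, n=500):
    """One actor update: g = E[grad log pi * r], F = E[(grad log pi)^2],
    then theta <- theta + alpha * F^{-1} g (F is 1x1 for scalar theta)."""
    g, F = 0.0, 0.0
    for _ in range(n):
        s = states[rng.integers(len(states))]
        a = theta * s + SIGMA * rng.standard_normal()  # sample action
        r = -(a - 2.0 * s) ** 2        # toy reward, maximized at a = 2 s
        glp = grad_log_pi(theta, s, a)
        g += glp * r / n               # vanilla policy-gradient estimate
        F += glp * glp / n             # Fisher information estimate
    return theta + alpha * g / (F + 1e-8)

theta = 0.0
states = np.linspace(0.5, 1.5, 5)
for _ in range(200):
    theta = natural_gradient_step(theta, states)
print(round(theta, 1))  # converges toward the optimal theta = 2.0
```

Dividing by the Fisher information rescales the step so it is invariant to how the policy is parameterized, which is what makes the update covariant.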
The algorithm is shown to be covariant, meaning that policy improvements are independent of the coordinate frame of the policy representation, making it applicable to a wide range of reinforcement learning problems.
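The episodic variant's critic can also be sketched briefly. The idea is to regress each episode's return on the summed score-function features by least squares; the fitted feature weights then serve as the natural-gradient estimate. Everything below is an illustrative toy (a hypothetical one-parameter Gaussian policy and a single-state reward), not the paper's tasks.

```python
import numpy as np

# Sketch of the episodic linear-regression critic (assumption: a toy
# one-parameter Gaussian policy a ~ N(theta, sigma^2) with reward
# peaking at a = 1; names and the environment are illustrative).

rng = np.random.default_rng(1)
SIGMA = 0.5

def run_episode(theta, horizon=5):
    """Roll out one episode; return summed features and the return."""
    phi, ret = 0.0, 0.0
    for _ in range(horizon):
        a = theta + SIGMA * rng.standard_normal()
        ret += -(a - 1.0) ** 2             # reward, maximized at a = 1
        phi += (a - theta) / SIGMA**2      # d/dtheta log pi(a)
    return phi, ret

def episodic_nac_gradient(theta, n_episodes=2000):
    """Least-squares fit: return_e ~ phi_e * w + J.
    The fitted w is the natural-gradient estimate; J absorbs the
    baseline (average return)."""
    X = np.ones((n_episodes, 2))
    y = np.empty(n_episodes)
    for e in range(n_episodes):
        X[e, 0], y[e] = run_episode(theta)
    (w, J), *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

theta = 0.0
for _ in range(50):
    theta += 0.2 * episodic_nac_gradient(theta)
print(round(theta, 1))  # converges toward the optimal theta = 1.0
```

Because the regression normal equations divide by the empirical second moment of the score features, the fitted weights are already scaled by an (approximate) inverse Fisher matrix, so no separate matrix inversion is needed in the actor step.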