Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
Submitted 9/95; published 5/96
This paper provides a comprehensive survey of reinforcement learning from a computer science perspective, aimed at researchers familiar with machine learning. It covers the historical development of the field and a broad range of current research. Reinforcement learning involves an agent learning through trial-and-error interactions with a dynamic environment, focusing on the trade-offs between exploration and exploitation, the use of Markov decision theory, learning from delayed rewards, empirical modeling, generalization, and handling hidden states. The paper discusses various algorithms and techniques, including value iteration, policy iteration, adaptive heuristic critics, TD(λ), and Q-learning, and explores their practical applications and limitations. It concludes with an assessment of the current state of reinforcement learning and future directions.
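To make the abstract's description of trial-and-error learning concrete, the following is a minimal sketch of a tabular Q-learning agent of the kind the survey discusses. It is illustrative only: the environment interface (env.reset, env.step, env.actions) and the values of the learning rate, discount factor, and exploration rate are assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Sketch of tabular Q-learning; env interface and hyperparameters are assumed."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated discounted return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice: the exploration/exploitation trade-off noted in the abstract.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference backup toward the best estimated next value,
            # which is how delayed rewards propagate back to earlier states.
            best_next = max(Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The returned table maps state-action pairs to value estimates; acting greedily with respect to it yields the learned policy.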