Reinforcement Learning: A Survey

1996 | Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
This paper surveys reinforcement learning from a computer science perspective, written to be accessible to researchers familiar with machine learning. It summarizes both the historical basis of the field and a broad selection of current work. Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The paper discusses central issues, including the trade-off between exploration and exploitation, Markov decision theory, learning from delayed reinforcement, constructing empirical models of the environment, generalization, and hidden state. It concludes with a survey of implemented systems and an assessment of the practical utility of current methods.

Reinforcement learning has roots in cybernetics, statistics, psychology, neuroscience, and computer science, and in the last five to ten years it has attracted rapidly growing interest within machine learning and AI. Its promise is appealing: programming agents by reward and punishment without needing to specify how the task is to be achieved. Substantial computational challenges remain, however. Because it is impossible to mention all of the important work in the field, the survey does not claim to be an exhaustive account.

Reinforcement learning, as treated here, is a class of problems rather than a set of techniques. It bears a strong family resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." Two main strategies exist for solving reinforcement-learning problems: (1) searching the space of behaviors for one that performs well in the environment, and (2) using statistical techniques and dynamic programming to estimate the utility of taking actions in particular states of the world. The paper focuses on the second approach because it takes advantage of the special structure of reinforcement-learning problems.
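The exploration-exploitation trade-off highlighted above can be illustrated with a minimal ε-greedy action-selection sketch. This is an illustrative example, not a method from the paper itself; the function name and the bandit-style value list are assumptions for the sake of the example:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Select an action index from a list of estimated action values.

    With probability epsilon, explore: pick a uniformly random action.
    Otherwise, exploit: pick the action with the highest current estimate.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Setting `epsilon=0` recovers pure exploitation of current estimates, while `epsilon=1` is pure random exploration; intermediate values trade off the two.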
The paper discusses the trade-off between exploration and exploitation, learning from delayed reinforcement, model-free algorithms such as the adaptive heuristic critic, TD(λ), and Q-learning, generalization, hidden state, and applications, and it closes with a discussion of open problems and speculation about future directions.
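Of the model-free algorithms named above, Q-learning is the easiest to sketch. The following is a minimal illustration of the standard tabular Q-learning update rule, Q(s,a) ← Q(s,a) + α(r + γ·maxₐ′ Q(s′,a′) − Q(s,a)); the dict-of-dicts table and parameter names are illustrative assumptions, not code from the survey:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place.

    Q       -- dict mapping state -> dict mapping action -> value estimate
    s, a    -- the state visited and action taken
    r       -- the immediate reward received
    s_next  -- the resulting next state
    alpha   -- learning rate; gamma -- discount factor
    """
    # Value of the best action available in the next state (0 if none known).
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    # Move Q(s, a) toward the one-step bootstrapped target.
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```

Because the update bootstraps from max over next-state values rather than from the action actually taken next, Q-learning learns the optimal value function regardless of the exploration policy being followed.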