This book provides a comprehensive overview of algorithms for reinforcement learning (RL), focusing on those that build on the theory of dynamic programming. The goal is to develop efficient learning algorithms and to understand their merits and limitations. RL is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. Unlike in supervised learning, the learner receives only partial feedback on its decisions, and those decisions may have long-term effects, so time plays a special role. The book discusses a range of algorithms, including temporal difference learning, Monte-Carlo methods, function approximation, and stochastic gradient methods, and it covers their theoretical properties and limitations as well as their applications in areas such as artificial intelligence, operations research, and control engineering.
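To make the setting concrete, the following is a minimal sketch of the agent-environment interaction loop and the discounted return it seeks to maximize. The `env` interface (`reset`/`step`), the `policy` callable, and the hyperparameter values are illustrative assumptions, not part of the book's text.

```python
def run_episode(env, policy, gamma=0.99, max_steps=1000):
    """Roll out one episode and return the discounted sum of rewards.

    `env` and `policy` are assumed objects: `env.reset()` yields an initial
    state, `env.step(action)` yields (next_state, reward, done), and
    `policy(state)` picks an action.
    """
    state = env.reset()
    episode_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # only the chosen action's
        state, reward, done = env.step(action)  # outcome is observed (partial feedback)
        episode_return += discount * reward     # long-term objective: discounted return
        discount *= gamma
        if done:
            break
    return episode_return
```

The discount factor gamma trades off immediate against future rewards, which is one way the long-term objective mentioned above is formalized.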
The book is structured into three parts. The first part introduces the necessary background, including Markov Decision Processes (MDPs), value functions, and dynamic programming algorithms. The second part focuses on value prediction problems, discussing algorithms such as TD(λ), gradient temporal difference learning, and least-squares methods. The third part is devoted to control, covering online learning in bandits, active learning, direct methods such as Q-learning, and actor-critic methods. The book also includes a section on further exploration that lists topics for further study.
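As an illustration of the direct, value-based approach to control mentioned above, here is a minimal sketch of tabular Q-learning. The environment interface (`reset`, `step`, `actions`), the epsilon-greedy exploration scheme, and the hyperparameter values are illustrative assumptions rather than the book's presentation.

```python
import random
from collections import defaultdict


def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn action values for a finite MDP with the tabular Q-learning update."""
    Q = defaultdict(float)  # maps (state, action) pairs to value estimates

    def greedy_action(state):
        # Best action under the current estimates (env.actions is an assumed helper).
        return max(env.actions(state), key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration over the current value estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = greedy_action(state)
            next_state, reward, done = env.step(action)
            # Q-learning update: bootstrap from the best action in the next state.
            best_next = 0.0 if done else Q[(next_state, greedy_action(next_state))]
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The update bootstraps from the maximum estimated value in the next state, which is what distinguishes this kind of direct method from the prediction algorithms, such as TD(λ), covered in the second part.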
The book is intended for advanced undergraduate and graduate students, as well as researchers and practitioners in the field of RL. It assumes familiarity with linear algebra, calculus, and probability theory. The book aims to provide a good understanding of the key concepts and algorithms in RL, as well as the theoretical foundations and practical applications of these methods. The author also acknowledges the contributions of various individuals who helped in the development and review of the book.