January 3, 2018 | Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller
The *DeepMind Control Suite* is a collection of continuous control tasks designed to serve as performance benchmarks for reinforcement learning agents. The tasks are standardized and use interpretable rewards, making them easy to use and modify. Written in Python and powered by the MuJoCo physics engine, the suite is publicly available on GitHub, together with a video summary of all tasks.

The suite focuses exclusively on continuous control, separating observations with similar units and offering a unified reward structure that yields interpretable learning curves and aggregated performance measures. It emphasizes high-quality, well-documented code and spans a wide range of domains, from simple pendulums to more complex walkers and manipulators. Python APIs are provided for both high- and low-level interaction.

The paper discusses the structure and design of the suite, the verification process, and the relevant reinforcement learning concepts. It details the implementation, including the environment class, the suite module, and the MuJoCo Python interface. Benchmarking results are presented for the A3C, DDPG, and D4PG algorithms, comparing their performance across tasks and input modalities (state features and raw pixels).

The suite aims to be a starting point for designing and comparing reinforcement learning algorithms for physics-based control, offering a wide range of tasks and robust performance measures. Future work includes adding more complex tasks, improving visualization tools, and supporting additional platforms.
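The environment interface described above can be sketched with a minimal stand-in environment. This is not the actual dm_control API (the real suite is loaded via `dm_control.suite.load(domain_name, task_name)` and requires MuJoCo); `ToyPointMass` and its dynamics are hypothetical, but the reset/step loop, dictionary observations, and bounded per-step rewards mirror the structure the suite exposes:

```python
import random
from dataclasses import dataclass

@dataclass
class TimeStep:
    """Stand-in for the result of reset()/step(): reward, observation, end flag."""
    reward: float
    observation: dict
    last: bool

class ToyPointMass:
    """Hypothetical 1-D point mass whose reward peaks at position 0."""

    def __init__(self, horizon=100):
        self.horizon = horizon

    def reset(self):
        self.pos, self.vel, self.t = 1.0, 0.0, 0
        return TimeStep(0.0, {"position": self.pos, "velocity": self.vel}, False)

    def step(self, action):
        # Clip the control signal to [-1, 1], as continuous-control
        # benchmarks typically bound their action spaces.
        a = max(-1.0, min(1.0, action))
        self.vel += 0.1 * a
        self.pos += 0.1 * self.vel
        self.t += 1
        # A bounded, interpretable reward in [0, 1], in the spirit of the
        # suite's unified reward structure.
        reward = max(0.0, 1.0 - abs(self.pos))
        done = self.t >= self.horizon
        return TimeStep(reward, {"position": self.pos, "velocity": self.vel}, done)

def run_episode(env, policy):
    """Run one episode and return the undiscounted return."""
    ts, total = env.reset(), 0.0
    while not ts.last:
        ts = env.step(policy(ts.observation))
        total += ts.reward
    return total

random.seed(0)
ret = run_episode(ToyPointMass(), lambda obs: random.uniform(-1.0, 1.0))
```

Because every per-step reward lies in [0, 1], the episode return is bounded by the horizon, which is what makes learning curves directly comparable across tasks.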