Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Craftax is a fast, complex benchmark for open-ended reinforcement learning (RL). It is written in JAX and runs up to 250 times faster than the original Crafter benchmark.

Craftax-Classic is a reimplementation of Crafter in JAX, allowing for much faster RL training. A simple PPO agent solves it in under an hour on a single GPU, reaching 90% of the optimal reward. The main Craftax benchmark is a more challenging extension of Crafter that incorporates mechanics inspired by NetHack, offering a more complex and open-ended environment. It adds multiple floors, combat mechanics, new creatures, potions and enchantments, attributes, and a boss floor, with a difficulty pitched between Crafter and NetHack.

Craftax requires deep exploration, long-term planning, memory, and the ability to adapt to novel situations. It is designed to test exploration, continual learning, and long-term reasoning while remaining usable with limited computational resources. Two benchmark settings are provided. Craftax-1B allows 1 billion environment interactions and tests exploration, continual learning, and long-term planning. Craftax-1M tests sample efficiency and allows rapid iteration on methods.

The environment provides both pixel-based and symbolic observations; the symbolic observations are significantly faster to run. Because Craftax is implemented in JAX, it can be compiled and parallelized efficiently, which makes it suitable for large-scale experiments. It conforms to the Gymnax interface for easy integration with existing frameworks. It has a discrete action space and a reward structure based on achievements, with penalties for damage and rewards for recovery.

The paper evaluates global and episodic exploration methods, as well as unsupervised environment design, on Craftax and finds that none of them significantly improves performance. Existing RL methods fail to make meaningful progress on the benchmark, which highlights the need for new approaches to open-ended learning. The paper concludes that Craftax provides a meaningful challenge for future RL research while allowing experimentation with limited computational resources.
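
As an illustration of the reward structure described above (achievements, plus a penalty for damage and a reward for recovery), here is a minimal sketch in plain Python. It is not the Craftax implementation. The coefficients (1.0 per newly unlocked achievement, 0.1 per health point gained or lost), the `StepInfo` record, and the achievement names are assumptions chosen for illustration; the summary does not state them.

```python
from dataclasses import dataclass

# Assumed coefficients; the summary only says that achievements are rewarded,
# damage is penalized, and recovery is rewarded.
ACHIEVEMENT_REWARD = 1.0
HEALTH_COEFF = 0.1


@dataclass(frozen=True)
class StepInfo:
    """Hypothetical per-step record; the field names are illustrative only."""
    achievements: frozenset  # achievements unlocked so far in this episode
    health: int              # current health points


def step_reward(prev: StepInfo, curr: StepInfo) -> float:
    """Reward for one transition from `prev` to `curr`."""
    # Each achievement counts only the first time it is unlocked in an episode.
    newly_unlocked = curr.achievements - prev.achievements
    # Negative when the agent took damage, positive when it recovered health.
    health_delta = curr.health - prev.health
    return ACHIEVEMENT_REWARD * len(newly_unlocked) + HEALTH_COEFF * health_delta


# Example transition: one new achievement unlocked while losing 2 health points.
before = StepInfo(frozenset({"collect_wood"}), health=9)
after = StepInfo(frozenset({"collect_wood", "place_table"}), health=7)
reward = step_reward(before, after)
```

Under these assumed coefficients the achievement term dominates, so the health term acts only as light shaping: the agent is nudged to stay alive without being paid to idle.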


2024 | Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster