6 Mar 2024 | Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal
The paper "Stop Regressing: Training Value Functions via Classification for Scalable Deep RL" explores the use of classification losses, particularly categorical cross-entropy, to train value functions in deep reinforcement learning (RL). Traditional RL methods often rely on regression losses like mean squared error (MSE) to train value functions, which have struggled to scale effectively to large networks such as high-capacity Transformers. The authors investigate whether using classification losses can improve the scalability and performance of value-based RL methods.
Key findings include:
1. **Performance Improvements**: Training value functions with categorical cross-entropy significantly enhances performance and scalability in various domains, including Atari games, robotic manipulation, chess, and language agents.
2. **Scalability**: The approach scales well to large networks, outperforming regression-based training as parameter counts grow.
3. **Robustness**: Categorical cross-entropy losses are more robust to noisy targets and non-stationary environments, which are common challenges in RL.
4. **Representation Learning**: Training with categorical representations and distributional targets leads to better learned representations that support downstream tasks.
The paper also provides detailed experimental results and ablation studies to demonstrate the effectiveness of the proposed approach. The authors conclude that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL, making it a promising direction for future research.