6 Mar 2024 | Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal
This paper investigates whether replacing regression with a classification loss, specifically categorical cross-entropy, can improve the scalability and performance of value-based deep reinforcement learning (RL). The authors demonstrate that training value functions with categorical cross-entropy yields significant gains across diverse domains: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-Transformer, playing chess without search, and a language-agent Wordle task with high-capacity Transformers. They show that the cross-entropy loss mitigates issues inherent to value-based RL, such as noisy targets and non-stationarity.
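Concretely, the recipe replaces the critic's scalar regression head with a categorical one: the network outputs logits over a fixed grid of value bins, the scalar Q-value is recovered as an expectation over bin centers, and training minimizes cross-entropy against a categorical encoding of the TD target. Below is a minimal NumPy sketch of this idea; the bin count, value range, and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Illustrative bin layout; the count and value range are assumptions,
# not the paper's settings.
NUM_BINS, V_MIN, V_MAX = 51, -10.0, 10.0
bin_edges = np.linspace(V_MIN, V_MAX, NUM_BINS + 1)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

def softmax(logits):
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def q_value_from_logits(logits):
    """Recover a scalar Q-value as the expectation of the predicted
    categorical distribution over bin centers."""
    return softmax(logits) @ bin_centers

def cross_entropy_loss(logits, target_probs):
    """Train the critic as a classifier: cross-entropy between the predicted
    distribution and a categorical encoding of the scalar TD target,
    replacing the usual mean-squared error on the scalar itself."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target_probs * log_probs).sum(axis=-1)
```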
The paper also compares methods for constructing categorical distributions from scalar targets: Two-Hot, the Histogram Loss with Gaussian smoothing (HL-Gauss), and Categorical Distributional RL (CDRL). HL-Gauss, which spreads probability mass across neighboring histogram bins, outperforms the alternatives in both performance and stability. Further analysis attributes the benefits of categorical cross-entropy to robustness against noisy targets and to the network using its capacity more effectively to fit non-stationary targets.
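To make the comparison concrete, here is a sketch of Two-Hot and HL-Gauss target construction under the same assumed bin layout as the sketch above; the sigma-to-bin-width ratio of 0.75 follows the value the paper reports working well, while the names and shapes are illustrative.

```python
import numpy as np
from math import erf, sqrt

# Same illustrative bin layout as the sketch above.
NUM_BINS, V_MIN, V_MAX = 51, -10.0, 10.0
bin_edges = np.linspace(V_MIN, V_MAX, NUM_BINS + 1)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_hot_target(y):
    """Two-Hot: split all probability mass between the two bin centers
    bracketing y, so the distribution's expectation reproduces y exactly."""
    y = float(np.clip(y, bin_centers[0], bin_centers[-1]))
    idx = int(np.clip(np.searchsorted(bin_centers, y) - 1, 0, NUM_BINS - 2))
    upper_w = (y - bin_centers[idx]) / (bin_centers[idx + 1] - bin_centers[idx])
    probs = np.zeros(NUM_BINS)
    probs[idx], probs[idx + 1] = 1.0 - upper_w, upper_w
    return probs

def hl_gauss_target(y, sigma=None):
    """HL-Gauss: smear the scalar target with a Gaussian and integrate its
    density over each bin, spreading mass to neighboring bins."""
    if sigma is None:
        # Sigma proportional to bin width; 0.75 is the ratio the paper reports.
        sigma = 0.75 * (bin_edges[1] - bin_edges[0])
    cdf = np.array([norm_cdf((edge - y) / sigma) for edge in bin_edges])
    probs = cdf[1:] - cdf[:-1]
    return probs / probs.sum()  # renormalize mass clipped outside the support
```

Because HL-Gauss spreads mass over several bins, it behaves like label smoothing that respects the ordinal structure of the value range, which is consistent with the robustness to noisy targets described above.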
The study evaluates classification losses in both online and offline RL settings, showing that HL-Gauss consistently outperforms the mean-squared error (MSE) regression loss across tasks, with the largest gains appearing in large-scale networks. The paper concludes that treating regression as classification is a simple yet effective change that substantially improves the scalability and performance of value-based deep RL, positioning categorical cross-entropy as a key ingredient for more effective deep RL algorithms.