2024 | Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
This paper explores the use of Mixture-of-Experts (MoE) modules in deep reinforcement learning (RL) to improve parameter scalability. The authors demonstrate that incorporating Soft MoE, a fully differentiable variant of MoE, into value-based RL networks substantially improves performance across a range of training regimes and model sizes, providing strong empirical groundwork for developing scaling laws in RL. Soft MoE outperforms hard-gated MoEs such as Top1-MoE in both parameter scalability and final performance, and the gap widens as the number of experts grows. The gains persist even at high replay ratios, suggesting that Soft MoE makes RL networks more parameter-efficient rather than merely larger. The paper also examines how design choices, such as the number of experts, the tokenization scheme, and the encoder architecture, affect MoE-based RL models, and argues that MoEs induce structured sparsity in the network, which contributes to improved performance and training stability. Further experiments show that MoEs are beneficial in offline RL and low-data training regimes, and that they reduce the number of dormant neurons in the network. The authors conclude that MoEs can improve the scalability and efficiency of deep RL networks, and that further research into alternative architectures and training methods is a promising path toward more robust and effective RL systems.
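To make the architecture concrete, below is a minimal PyTorch sketch of a Soft MoE layer (following Puigcerver et al., 2023, the variant the paper builds on) applied to tokenized encoder features. The layer sizes, the expert MLP shape, and the per-spatial-position tokenization in the usage snippet are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal Soft MoE layer sketch: every token is softly dispatched to every
    slot, each expert processes its slots, and outputs are softly combined back.
    Dimensions and expert shapes here are assumptions for illustration."""

    def __init__(self, dim, num_experts=4, slots_per_expert=1, hidden=256):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        # One learnable embedding per (expert, slot) pair.
        self.slot_embed = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert))
        # Each expert is a small MLP (hypothetical shape).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, tokens):  # tokens: (batch, n_tokens, dim)
        logits = torch.einsum("bnd,ds->bns", tokens, self.slot_embed)
        dispatch = logits.softmax(dim=1)  # over tokens: each slot is a mix of all tokens
        combine = logits.softmax(dim=2)   # over slots: each token is a mix of all slots
        slots = torch.einsum("bns,bnd->bsd", dispatch, tokens)  # (batch, e*s, dim)
        # Route each expert's slots through that expert.
        slots = slots.view(tokens.size(0), self.num_experts, self.slots_per_expert, -1)
        outs = torch.stack(
            [self.experts[i](slots[:, i]) for i in range(self.num_experts)], dim=1
        ).flatten(1, 2)  # back to (batch, e*s, dim)
        return torch.einsum("bns,bsd->bnd", combine, outs)  # (batch, n_tokens, dim)

# Usage sketch: tokenize conv features one token per spatial position
# (in the spirit of the paper's PerConv tokenization), then apply the MoE layer.
feats = torch.randn(32, 64, 11, 11)        # (batch, channels, H, W) from a CNN encoder
tokens = feats.flatten(2).transpose(1, 2)  # (batch, H*W tokens, 64-dim each)
out = SoftMoE(dim=64, num_experts=4)(tokens)  # (batch, H*W, 64); a value head would follow
```

Because dispatch and combine are plain softmaxes rather than a discrete top-k choice, every parameter receives gradient on every step; this full differentiability is what distinguishes Soft MoE from hard-gated variants like Top1-MoE in the paper's comparisons.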