26 Jun 2024 | Johan Obando-Ceron*, Ghada Sokar*, Timon Willi*, Clare Lyle, Jesse Farebrother, Jakob Foerster, Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
This paper explores the impact of incorporating Mixture-of-Experts (MoE) modules, in particular Soft MoEs, into value-based deep reinforcement learning (RL) networks. The authors demonstrate that Soft MoEs significantly improve the performance of a variety of deep RL agents, and that the improvement scales with the number of experts. The study focuses on DQN and Rainbow, two widely used value-based agents, and evaluates them on the Arcade Learning Environment (ALE). The results show that while naively increasing the parameter count of the baseline networks often hurts performance, Soft MoEs yield substantial gains, especially as the number of experts grows. The paper also investigates design choices such as tokenization schemes, encoder architectures, and game selection, and offers insight into the mechanisms underlying the improvement. The findings suggest that Soft MoEs can improve the scalability and efficiency of deep RL networks, making them more robust to parameter scaling. The authors conclude by discussing future directions, including alternative architectural designs and the potential benefits of MoEs in offline RL and low-data training regimes.
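To make the architectural idea concrete, below is a minimal sketch of a Soft MoE layer of the kind the summary describes, where the dense penultimate layer of a value network would be swapped for a mixture of small expert MLPs operating on tokens from the convolutional encoder. This is an illustrative PyTorch-style example, not the paper's implementation: the class name `SoftMoE`, the token shapes, and the expert MLP sizes are all assumptions made for the sake of the sketch.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Illustrative Soft MoE layer: input tokens are softly dispatched to
    expert "slots", each expert processes its slots, and the slot outputs
    are softly combined back into per-token outputs. Shapes and sizes here
    are assumptions for demonstration, not the paper's exact configuration."""

    def __init__(self, dim, num_experts, slots_per_expert=1, hidden_dim=None):
        super().__init__()
        hidden_dim = hidden_dim or 4 * dim
        self.num_experts = num_experts
        # One learnable embedding per (expert, slot) pair.
        self.slot_embeds = nn.Parameter(torch.randn(num_experts * slots_per_expert, dim))
        # Each expert is a small MLP applied to the slots assigned to it.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, num_tokens, dim), e.g. tokens built from the conv encoder output.
        logits = torch.einsum("bnd,sd->bns", x, self.slot_embeds)
        dispatch = logits.softmax(dim=1)   # per slot: weights over input tokens
        combine = logits.softmax(dim=2)    # per token: weights over slots
        # Each slot's input is a weighted average of the tokens: (batch, num_slots, dim).
        slots = torch.einsum("bns,bnd->bsd", dispatch, x)
        # Route each expert's chunk of slots through that expert.
        slot_chunks = slots.chunk(self.num_experts, dim=1)
        expert_out = torch.cat([e(c) for e, c in zip(self.experts, slot_chunks)], dim=1)
        # Each output token is a convex combination of the slot outputs.
        return torch.einsum("bns,bsd->bnd", combine, expert_out)


# Hypothetical usage: 16 tokens of width 512 (a plausible shape after tokenizing
# a DQN-style encoder's feature map), routed through 8 experts.
moe = SoftMoE(dim=512, num_experts=8)
tokens = torch.randn(32, 16, 512)
print(moe(tokens).shape)  # torch.Size([32, 16, 512])
```

Because every token contributes to every slot with soft weights, the layer stays fully differentiable and avoids the discrete routing of hard MoEs, which is one reason Soft MoEs are attractive as a drop-in replacement inside value-based agents; scaling the number of experts then grows parameters without changing the per-token output shape.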