14 Apr 2024 | Chenyu Zhang, Han Wang, Aritra Mitra, James Anderson
The paper introduces FedSARSA, a novel on-policy federated reinforcement learning (FRL) algorithm designed to address two open problems in FRL: environmental heterogeneity across agents and the lack of non-asymptotic (finite-time) performance guarantees. The authors tackle the complexities that commonly arise in FRL: Markovian sampling, linear function approximation, multiple local updates, and continuous state-action spaces. FedSARSA exploits collaboration among agents to achieve linear speedups in convergence and improved exploration. The paper provides a comprehensive finite-time error analysis, establishing that FedSARSA converges to a near-optimal policy for all agents, with the gap from optimality proportional to the level of environmental heterogeneity. It further shows that FedSARSA achieves a linear speedup in the number of agents with both fixed and adaptive step sizes.
The theoretical results are validated through numerical simulations, showing the robustness of FedSARSA under varying levels of environmental heterogeneity.
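To make the federated on-policy structure concrete, here is a minimal sketch under stated assumptions. Each agent runs several local SARSA(0) updates with linear function approximation along its own ongoing (Markovian) trajectory, and a server then averages the agents' parameter vectors. Everything beyond that outline is hypothetical and chosen for illustration: the toy chain environment `ToyEnv`, the `slip` parameter used to model environmental heterogeneity, the softmax policy, the one-hot features, and all hyperparameters. The paper's actual policy improvement operator, any projection steps, and its step-size schedules are not reproduced here.

```python
import math
import random

# Hypothetical toy problem sizes and discount factor (not from the paper).
N_STATES, N_ACTIONS = 4, 2
GAMMA = 0.9


def features(s, a):
    """One-hot feature vector phi(s, a): a tabular special case of linear FA."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax_policy(theta, s, temp, rng):
    """Sample an action from a softmax over Q(s, .) = phi(s, .)^T theta."""
    qs = [dot(features(s, a), theta) for a in range(N_ACTIONS)]
    m = max(qs)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temp) for q in qs]
    return rng.choices(range(N_ACTIONS), weights=weights)[0]


class ToyEnv:
    """Hypothetical chain MDP; `slip` perturbs transitions to model heterogeneity."""

    def __init__(self, slip, rng):
        self.slip, self.rng, self.state = slip, rng, 0

    def step(self, a):
        # Action 1 tries to move right, action 0 left; with prob `slip` the move flips.
        move = 1 if a == 1 else -1
        if self.rng.random() < self.slip:
            move = -move
        self.state = min(max(self.state + move, 0), N_STATES - 1)
        reward = 1.0 if self.state == N_STATES - 1 else 0.0
        return self.state, reward


def fedsarsa_sketch(slips, rounds=50, local_steps=10, alpha=0.1, temp=0.5, seed=0):
    """Hypothetical FedSARSA-style loop: local on-policy SARSA, then averaging."""
    rng = random.Random(seed)
    dim = N_STATES * N_ACTIONS
    theta_global = [0.0] * dim
    envs = [ToyEnv(slip, rng) for slip in slips]  # one environment per agent
    # Each agent keeps its own ongoing trajectory across rounds (Markovian sampling).
    actions = [softmax_policy(theta_global, env.state, temp, rng) for env in envs]
    for _ in range(rounds):
        local_thetas = []
        for i, env in enumerate(envs):
            theta = list(theta_global)  # start local updates from the server model
            s, a = env.state, actions[i]
            for _ in range(local_steps):
                s_next, r = env.step(a)
                a_next = softmax_policy(theta, s_next, temp, rng)  # on-policy action
                phi = features(s, a)
                td = r + GAMMA * dot(features(s_next, a_next), theta) - dot(phi, theta)
                theta = [t + alpha * td * p for t, p in zip(theta, phi)]
                s, a = s_next, a_next
            actions[i] = a  # resume from this state-action pair next round
            local_thetas.append(theta)
        # Server step: average the agents' parameter vectors.
        theta_global = [sum(col) / len(local_thetas) for col in zip(*local_thetas)]
    return theta_global
```

For example, `fedsarsa_sketch([0.0, 0.1, 0.2])` runs three agents whose environments differ only in their slip probability; widening the spread of `slip` values is one simple way to increase heterogeneity in this toy setting. With one-hot features the linear model reduces to a table, which keeps the sketch short while preserving the update rule's linear form.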