8 Sep 2021 | Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan
This paper addresses the challenges of federated learning (FL), a distributed machine learning paradigm in which many clients collaborate with a central server to learn a model without sharing their raw data. Standard FL methods such as Federated Averaging (FEDAVG) are often hard to tune and exhibit suboptimal convergence, especially in the presence of heterogeneous data. To address these issues, the authors propose federated versions of adaptive optimization methods, including ADAGRAD, ADAM, and YOGI, and analyze their convergence in nonconvex settings with heterogeneous data, highlighting the interplay between client heterogeneity and communication efficiency. The main contributions are a general framework for federated optimization that separates the server optimizer from the client optimizer, novel adaptive federated optimization methods built on that framework, and comprehensive benchmarks for comparing federated optimization algorithms. Extensive empirical evaluations on a variety of tasks, together with the theoretical analysis, show that adaptive server optimizers can yield faster convergence and significantly better performance, particularly in cross-device settings.
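To make the server/client optimizer split concrete, here is a minimal, runnable sketch of one round of a FedAdam-style method on a toy quadratic objective: each client runs a few steps of local SGD (the client optimizer), and the server treats the averaged model delta as a pseudo-gradient fed into an Adam-style update. All names (client_update, server_round, tau, local_grad, the toy targets) are illustrative assumptions, not the paper's actual API or experimental setup.

```python
import numpy as np

def local_grad(x, target):
    """Toy per-client gradient: each client pulls x toward its own target."""
    return x - target

def client_update(x, target, client_lr=0.1, local_steps=5):
    """A few steps of local SGD (the client optimizer); return the model delta."""
    y = x.copy()
    for _ in range(local_steps):
        y -= client_lr * local_grad(y, target)
    return y - x

def server_round(x, m, v, client_targets, server_lr=0.1,
                 beta1=0.9, beta2=0.99, tau=1e-3):
    """One round: average client deltas, then apply an Adam-style server step."""
    delta = np.mean([client_update(x, t) for t in client_targets], axis=0)
    m = beta1 * m + (1 - beta1) * delta          # first moment of the pseudo-gradient
    v = beta2 * v + (1 - beta2) * delta ** 2     # second moment (Adam-style variant)
    x = x + server_lr * m / (np.sqrt(v) + tau)   # delta already points downhill, so '+'
    return x, m, v

# Heterogeneous clients: each holds a different target (non-IID data).
targets = [np.array([1.0, 0.0]), np.array([0.0, 3.0]), np.array([2.0, 2.0])]
x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(50):
    x, m, v = server_round(x, m, v, targets)
print(x)  # approaches the mean of the client targets, roughly [1.0, 1.67]
```

Swapping the server step here for plain SGD recovers FEDAVG as a special case, which is the sense in which the framework generalizes standard federated averaging; the paper's YOGI variant differs only in how the second-moment accumulator v is updated.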