February 26, 2021 | Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett
This paper presents Byzantine-robust distributed learning algorithms that achieve optimal statistical performance in the presence of Byzantine failures. The authors develop two robust distributed gradient descent algorithms: one based on coordinate-wise median and the other on coordinate-wise trimmed mean. These algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. A median-based algorithm with only one communication round is also proposed, which achieves the same optimal error rate for strongly convex quadratic losses.
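As a rough illustration of the two aggregation rules, the NumPy sketch below shows one master-side update that replaces the usual averaging of worker gradients with a coordinate-wise median or a coordinate-wise trimmed mean; the function names, the learning rate, and the trimming level beta are illustrative choices, not taken from the paper.

```python
import numpy as np

def coordinate_wise_median(grads):
    """Median of each coordinate across the m reported gradients, (m, d) -> (d,)."""
    return np.median(grads, axis=0)

def coordinate_wise_trimmed_mean(grads, beta):
    """In each coordinate, drop the beta*m largest and beta*m smallest
    values, then average what remains, (m, d) -> (d,)."""
    m = grads.shape[0]
    k = int(beta * m)
    sorted_grads = np.sort(grads, axis=0)  # sort each coordinate independently
    return sorted_grads[k:m - k].mean(axis=0)

def robust_gd_step(w, worker_grads, lr=0.1, rule="median", beta=0.1):
    """One master update (sketch): aggregate the reported gradients robustly,
    then take an ordinary gradient descent step."""
    if rule == "median":
        g = coordinate_wise_median(worker_grads)
    else:
        g = coordinate_wise_trimmed_mean(worker_grads, beta)
    return w - lr * g
```

For the trimmed mean, the trimming level beta is typically chosen to be at least the fraction of Byzantine machines (and below 1/2), so that corrupted values can be discarded in every coordinate.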
The paper addresses the challenge of maintaining statistical performance in distributed learning systems in which some computing units may behave abnormally or even adversarially, i.e., exhibit Byzantine failures. The authors analyze statistical error rates for three types of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. They show that the error rates depend on the fraction of Byzantine machines, the number of machines, the number of data points per machine, and the dimensionality of the parameter space.
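To make this dependence concrete, the strongly convex rates can be summarized roughly as in the display below, where m is the number of worker machines, n the number of data points per machine, and α the fraction of Byzantine machines; the dependence on the dimension d and on logarithmic factors is suppressed, so this should be read as a sketch of the paper's summary rather than an exact restatement.

```latex
% Rough form of the statistical error rates for strongly convex losses
% (d-dependence and logarithmic factors suppressed).
\[
\text{median GD:}\quad
\tilde{O}\!\left(\frac{\alpha}{\sqrt{n}} + \frac{1}{\sqrt{nm}} + \frac{1}{n}\right),
\qquad
\text{trimmed-mean GD:}\quad
\tilde{O}\!\left(\frac{\alpha}{\sqrt{n}} + \frac{1}{\sqrt{nm}}\right),
\]
\[
\text{against a lower bound of order}\quad
\Omega\!\left(\frac{\alpha}{\sqrt{n}} + \frac{1}{\sqrt{nm}}\right).
\]
```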
The authors also consider communication efficiency, aiming to minimize both the number of communication rounds and the amount of data communicated per round. They work in a setting where, in each round, every machine (worker or master) may only communicate a vector of size O(d), where d is the dimension of the parameter to be learned. This setting allows for efficient communication while maintaining statistical performance.
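As a rough illustration of a one-round protocol that fits this O(d)-per-machine budget, the sketch below assumes a simple least-squares setting in which each worker can solve its local empirical risk minimization exactly; the function names and the least-squares choice are illustrative, not taken from the paper.

```python
import numpy as np

def local_erm_least_squares(X, y):
    """Worker-side step (illustrative least-squares ERM): solve the local
    problem exactly and report only the d-dimensional solution vector."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def one_round_median(local_solutions):
    """Master-side step: a single aggregation of the m reported parameter
    vectors by coordinate-wise median, (m, d) -> (d,)."""
    return np.median(np.stack(local_solutions), axis=0)
```

Each worker communicates once and sends only a d-dimensional vector, so the protocol uses a single round and O(d) communication per machine.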
The paper provides theoretical guarantees for the performance of the proposed algorithms, showing that they achieve optimal statistical error rates under certain assumptions. The authors also conduct experiments to validate the effectiveness of the median and trimmed mean operations in defending against Byzantine failures. The results show that these operations significantly improve test accuracy in adversarial settings.