Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

13 Sep 2019 | Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown
This paper investigates the impact of non-identical data distributions on federated visual classification. Federated Learning (FL) allows training models on decentralized data while preserving privacy, but data distributions across devices may differ significantly, which affects model performance. The authors propose a method for synthesizing client datasets with a controllable degree of non-identicalness and use it to evaluate the Federated Averaging (FedAvg) algorithm. They show that performance degrades as client distributions become more non-identical, and they propose server momentum as a mitigation strategy.

The experiments are conducted on the CIFAR-10 dataset, which contains 60,000 images from 10 classes. Synthetic non-identical client data is generated with a Dirichlet distribution, where the concentration parameter α controls the level of identicalness: large α yields near-identical client label distributions, while small α yields highly skewed ones. Classification accuracy degrades accordingly, dropping from 76.9% to 30.1% in the most skewed settings. A sketch of this kind of partitioning appears below.
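The following is a minimal sketch of Dirichlet-based client partitioning, not the paper's exact procedure: here one Dirichlet draw per class splits that class's examples across clients, a common way to realize this kind of skew. The function name `dirichlet_partition` and its signature are our assumptions for illustration.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split example indices among clients with per-class proportions
    drawn from Dir(alpha). Smaller alpha -> more skewed (non-identical)
    clients; larger alpha -> near-identical class distributions."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # One Dirichlet draw per class decides how its examples
        # are divided across the clients.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: split 60,000 CIFAR-10-style labels across 100 clients.
labels = np.repeat(np.arange(10), 6000)
clients = dirichlet_partition(labels, num_clients=100, alpha=0.1)
```

With α = 0.1 most clients end up dominated by a few classes; with α = 100 every client's label histogram is close to the global one.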
The study also examines the effect of server momentum on FedAvg. Adding momentum to the server's weight updates improves performance, especially in non-identical data scenarios, and the resulting FedAvgM (FedAvg with server momentum) achieves accuracy close to centralized learning in many cases. A sensitivity analysis shows that the choice of learning rate and momentum parameter significantly affects performance; for non-identical data, the authors suggest a lower effective learning rate and higher momentum to prevent divergence. The study highlights the importance of considering data distribution characteristics when designing federated learning systems. The sketch below illustrates the server-side update rule.
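Here is a minimal sketch of the server-side FedAvgM step, assuming flat parameter vectors and client deltas defined as (server weights − client weights after local training); the function name, variable names, and the exact placement of the server learning rate are our assumptions, not the paper's code.

```python
import numpy as np

def fedavgm_round(server_weights, client_deltas, client_sizes,
                  velocity, server_lr=1.0, momentum=0.9):
    """One FedAvgM server step: average the client deltas weighted by
    local dataset size, then apply them with heavy-ball momentum:
    v <- momentum * v + avg_delta;  w <- w - server_lr * v."""
    total = float(sum(client_sizes))
    # Size-weighted average of deltas (server_weights - client_weights).
    avg_delta = sum((n / total) * d
                    for n, d in zip(client_sizes, client_deltas))
    velocity = momentum * velocity + avg_delta
    server_weights = server_weights - server_lr * velocity
    return server_weights, velocity

# Toy usage with flat parameter vectors:
w = np.zeros(4)
v = np.zeros(4)
deltas = [np.full(4, 0.1), np.full(4, 0.3)]
w, v = fedavgm_round(w, deltas, client_sizes=[600, 200], velocity=v)
```

With momentum = 0 and server_lr = 1.0 this reduces to plain FedAvg, i.e. the size-weighted average of the client models; the momentum term is what smooths the noisy round-to-round updates that arise under skewed client distributions.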