Federated learning enables edge devices to collaboratively train a shared model without exchanging their raw data, offering clear privacy and security benefits. However, when the local data is non-IID, model accuracy degrades significantly: the paper shows that test accuracy drops by up to 55% for neural networks trained on highly skewed non-IID data. This reduction is attributed to weight divergence between the local and global models, which can be quantified by the Earth Mover's Distance (EMD) between each client's data distribution and the population distribution; the paper demonstrates that EMD is a good metric for estimating FedAvg accuracy under non-IID data.

To address the problem, the authors propose a data-sharing strategy: a small, globally shared subset of data is distributed to the clients, together with a warm-up model trained on that shared data. Sharing data reduces each client's EMD to the population distribution and thereby raises test accuracy; on CIFAR-10, sharing only 5% of the data improves accuracy by up to 30% for extreme non-IID partitions. The strategy trades off accuracy against centralization, and the paper concludes that federated learning with non-IID data requires careful handling of the data distribution to maintain model performance.
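As context for the EMD metric: for discrete class labels, the distance between a client's label distribution and the population's reduces to an L1 distance between two probability vectors. A minimal sketch (the helper names and the L1 reduction are assumptions of this sketch, not the paper's code):

```python
import numpy as np

K = 10  # number of classes (e.g. CIFAR-10)

def class_dist(labels: np.ndarray) -> np.ndarray:
    """Empirical label distribution of a dataset."""
    counts = np.bincount(labels, minlength=K)
    return counts / counts.sum()

def emd(client_labels: np.ndarray, population_labels: np.ndarray) -> float:
    """EMD between a client's label distribution and the population's.
    For discrete class labels this reduces to the L1 distance between
    the two probability vectors (an assumption of this sketch)."""
    return float(np.abs(class_dist(client_labels)
                        - class_dist(population_labels)).sum())

population = np.repeat(np.arange(K), 100)   # uniform over 10 classes
iid_client = np.repeat(np.arange(K), 10)    # same class mix as the population
skewed_client = np.zeros(100, dtype=int)    # holds a single class

print(emd(iid_client, population))     # 0.0: no divergence
print(emd(skewed_client, population))  # close to the maximum 2*(1 - 1/K) = 1.8
```

A fully skewed client sits near the maximum EMD of 2(1 - 1/K), which is the regime where the paper reports the largest accuracy loss.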
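The mechanism behind the data-sharing strategy can be illustrated numerically: appending a slice of the globally shared set to a skewed client pulls its label distribution toward the population, lowering EMD. A minimal sketch (the variable names and the uniform shared set `shared_G` are assumptions, not the paper's setup; the warm-up model step is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # number of classes

def emd(labels: np.ndarray, pop_labels: np.ndarray) -> float:
    """L1 form of EMD between two empirical label distributions
    (an assumption of this sketch)."""
    p = np.bincount(labels, minlength=K) / len(labels)
    q = np.bincount(pop_labels, minlength=K) / len(pop_labels)
    return float(np.abs(p - q).sum())

population = rng.integers(0, K, size=10_000)  # roughly uniform labels
shared_G = rng.integers(0, K, size=500)       # globally shared subset G

# Extreme non-IID client: all samples from a single class.
client = np.zeros(1_000, dtype=int)

# Data-sharing step (sketch): the client appends its slice of G.
augmented = np.concatenate([client, shared_G])

# Sharing shrinks the client's EMD to the population, which the paper
# links to smaller weight divergence and higher FedAvg test accuracy.
print(emd(client, population), emd(augmented, population))
```

The augmented client's EMD is strictly smaller than the original's, matching the paper's observation that a modest shared subset yields a large accuracy gain for extreme non-IID partitions.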