This paper explores the challenges of, and solutions for, federated learning when local data is non-IID (not independent and identically distributed). Federated learning allows edge devices to collaboratively train a shared model while keeping the training data on the devices themselves, which offers benefits in privacy, security, and regulatory compliance. The study focuses on the statistical challenge posed by non-IID data: the accuracy of federated learning can drop significantly, by up to 55% for neural networks trained on highly skewed non-IID data. This reduction is attributed to weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. To address this, the paper proposes a strategy to improve training on non-IID data by creating a small subset of data that is globally shared among all edge devices.
Experiments show that this approach can increase accuracy by about 30% for the CIFAR-10 dataset with only 5% globally shared data. The paper also provides a detailed mathematical analysis and experimental validation to support the proposed solution.
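
To make the EMD measure and the data-sharing idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that, for unordered class labels, the EMD can be computed as the sum of absolute differences between the per-class probabilities on a device and in the population. The function names (`class_distribution`, `label_emd`, `add_shared_data`) and the toy label lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_distribution(labels, num_classes):
    """Return the empirical probability of each class in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def label_emd(device_labels, population_labels, num_classes):
    """Distance between a device's class distribution and the population's.

    Assumption: the distance is the sum over classes of the absolute
    difference in class probability (0 for identical distributions).
    """
    p_dev = class_distribution(device_labels, num_classes)
    p_pop = class_distribution(population_labels, num_classes)
    return sum(abs(a - b) for a, b in zip(p_dev, p_pop))


def add_shared_data(device_labels, shared_labels):
    """Mix a globally shared subset into a device's local labels."""
    return list(device_labels) + list(shared_labels)


# Toy setup (hypothetical): 10 classes, uniform population.
NUM_CLASSES = 10
population = [c for c in range(NUM_CLASSES) for _ in range(100)]

# A highly skewed device that holds only class 0.
skewed_device = [0] * 100

# A small, class-balanced shared subset (5 samples per class).
shared = [c for c in range(NUM_CLASSES) for _ in range(5)]

emd_before = label_emd(skewed_device, population, NUM_CLASSES)
emd_after = label_emd(add_shared_data(skewed_device, shared), population, NUM_CLASSES)

print(f"EMD before sharing: {emd_before:.3f}")
print(f"EMD after sharing:  {emd_after:.3f}")
```

In this toy setup, mixing the shared subset into the skewed device's data moves its class distribution closer to the population distribution, so the distance decreases. The sketch operates on labels only; it does not train a model and says nothing about the accuracy figures reported in the paper.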