27 Mar 2021 | Tao Lin*, Lingjing Kong*, Sebastian U. Stich, Martin Jaggi.
This paper proposes a novel approach called ensemble distillation for robust model fusion in federated learning (FL). The goal is to improve the aggregation of models from heterogeneous clients, which is a key challenge in FL due to the diversity of client models and data. Traditional methods like federated averaging (FEDAVG) directly average model parameters, which is only feasible when all models have the same structure and size. In contrast, ensemble distillation leverages unlabeled data or artificially generated examples to train a central classifier on the outputs of client models, enabling flexible aggregation across diverse models.
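To make the contrast concrete, the snippet below sketches the parameter-averaging step that FEDAVG-style methods perform. The helper name and the use of PyTorch state dicts are illustrative assumptions rather than the authors' code, and the sketch only works when every client shares the same architecture.

```python
import torch

# Hypothetical sketch of FedAvg-style parameter averaging (not the paper's code).
# It assumes all client models share one architecture, so their state dicts
# have identical keys and tensor shapes.
def fedavg_aggregate(client_state_dicts, client_weights):
    """Weighted average of client parameters; weights usually reflect
    each client's local dataset size."""
    total = float(sum(client_weights))
    averaged = {}
    for key in client_state_dicts[0]:
        averaged[key] = sum(
            sd[key].float() * (w / total)
            for sd, w in zip(client_state_dicts, client_weights)
        )
    return averaged
```

Ensemble distillation sidesteps this requirement because it aggregates model outputs rather than parameters.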
The proposed method, FedDF, uses knowledge distillation on the server to train the central model, distilling knowledge from the ensemble of client models. This keeps privacy risks low and reduces communication costs while allowing flexible aggregation of heterogeneous models. Extensive experiments on computer vision and natural language processing datasets (e.g., CIFAR-10, ImageNet, AG News, SST-2) show that FedDF trains the server model faster, i.e., in fewer communication rounds, than existing FL techniques.
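A minimal sketch of this server-side distillation step, in the spirit of FedDF, is shown below. The function and argument names (server_model, client_models, unlabeled_loader, temperature) are assumptions for illustration, not the reference implementation; the idea it captures is that the averaged client logits on unlabeled or generated data act as the teacher, and the server model is trained to match them with a KL-divergence loss.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch, not the authors' implementation. Assumes
# `unlabeled_loader` yields batches of input tensors (no labels needed).
def distill_server_model(server_model, client_models, unlabeled_loader,
                         steps=500, temperature=1.0, lr=1e-3):
    optimizer = torch.optim.Adam(server_model.parameters(), lr=lr)
    data_iter = iter(unlabeled_loader)
    for _ in range(steps):
        try:
            x = next(data_iter)
        except StopIteration:
            data_iter = iter(unlabeled_loader)
            x = next(data_iter)
        with torch.no_grad():
            # Ensemble teacher: mean of the clients' logits on the distillation batch.
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(dim=0)
        student_logits = server_model(x)
        # Soften both distributions with a temperature and match them via KL divergence.
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * (temperature ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return server_model
```

Because only logits are exchanged with the server-side ensemble, the client models need not share a common architecture.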
FedDF is robust to different neural architectures and can handle heterogeneous client models that vary in size, numerical precision, or structure. It also demonstrates superior performance under non-i.i.d. data distributions, where traditional methods often struggle. The method remains effective even when the distillation data consist of unlabeled samples from other domains or synthetic examples from a pre-trained generator, which eases the reliance on a real in-domain unlabeled dataset.
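For the non-i.i.d. setting, a common way to simulate heterogeneous clients in this line of work is to split each class across clients with proportions drawn from a Dirichlet distribution. The sketch below is one such hypothetical partitioning helper; a smaller `alpha` yields more skewed local label distributions.

```python
import numpy as np

# Illustrative non-i.i.d. partitioning helper (an assumption for exposition,
# not the paper's exact data pipeline).
def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        # Per-client share of this class, sampled from a Dirichlet distribution.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, split in enumerate(np.split(cls_idx, cuts)):
            client_indices[client_id].extend(split.tolist())
    return client_indices
```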
The paper also discusses the privacy-preserving aspects of FedDF, showing that it can be combined with existing protection mechanisms like differential privacy to enhance client security. Additionally, FedDF is shown to be effective in low-bit quantized models and heterogeneous systems with different neural architectures, demonstrating its versatility and robustness in various FL scenarios. Overall, FedDF provides a more efficient and robust solution for federated learning in scenarios with diverse client models and data distributions.
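As one illustration of how such a protection mechanism might be layered on top, the sketch below clips and noises a client update before upload, in the style of differential-privacy mechanisms for FL. The clipping norm and noise multiplier are illustrative assumptions; this is not the paper's exact protocol.

```python
import torch

# Hypothetical client-side helper: clip the update and add Gaussian noise
# before sending it to the server (a common DP-style mechanism, shown here
# only as an assumed example of what FedDF could be combined with).
def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0):
    flat = torch.cat([p.flatten() for p in update.values()])
    scale = min(1.0, clip_norm / (flat.norm().item() + 1e-12))
    noisy = {}
    for key, param in update.items():
        clipped = param * scale
        noise = torch.randn_like(clipped) * noise_multiplier * clip_norm
        noisy[key] = clipped + noise
    return noisy
```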