27 Mar 2021 | Tao Lin*, Lingjing Kong*, Sebastian U. Stich, Martin Jaggi.
This paper introduces ensemble distillation for robust model fusion in federated learning (FL). Traditional FL methods, such as federated averaging (FedAvg), require all client models to share the same structure and size, which is restrictive in many practical scenarios. To address this, the authors propose FedDF, an ensemble-distillation technique that trains the central (server) model on unlabeled data, using the outputs of the client models as teachers. This allows flexible aggregation of heterogeneous client models, differing in size, numerical precision, or structure, while mitigating privacy risks and communication costs. Extensive experiments on CV and NLP datasets (CIFAR-10/100, ImageNet, AG News, SST2) show that the server model can be trained much faster, with fewer communication rounds, than with existing FL techniques. The paper also discusses the limitations of FedAvg and provides insights into when FedDF outperforms FedAvg, highlighting the importance of model initialization and the impact of different optimization schemes.
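To make the fusion step concrete, below is a minimal PyTorch-style sketch of server-side ensemble distillation in the spirit of FedDF: the received client models act as an ensemble teacher on an unlabeled transfer set, and the server model is fitted to their averaged predictions via a KL-divergence loss. Function names, hyperparameters, and the assumption that the loader yields raw input batches are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def distill_server_model(server_model, client_models, unlabeled_loader,
                         steps=500, lr=1e-3, temperature=1.0):
    """Sketch of one ensemble-distillation phase on the server.

    Assumptions (hypothetical, for illustration): `unlabeled_loader` yields
    plain input tensors, and the teacher is formed by averaging client logits.
    """
    optimizer = torch.optim.Adam(server_model.parameters(), lr=lr)
    for m in client_models:
        m.eval()
    server_model.train()

    data_iter = iter(unlabeled_loader)
    for _ in range(steps):
        try:
            x = next(data_iter)
        except StopIteration:  # restart the unlabeled stream when exhausted
            data_iter = iter(unlabeled_loader)
            x = next(data_iter)

        with torch.no_grad():
            # Ensemble teacher: average client logits, then soften with softmax.
            avg_logits = torch.stack([m(x) for m in client_models]).mean(dim=0)
            teacher_probs = F.softmax(avg_logits / temperature, dim=1)

        student_log_probs = F.log_softmax(server_model(x) / temperature, dim=1)
        # Fit the server (student) to the ensemble teacher's distribution.
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return server_model
```

Because the teacher signal is just model outputs on unlabeled inputs, the client models being distilled can differ in architecture or precision; only their output dimensionality (the label space) needs to match the server model's.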