29 April 2024 | Haiao Li¹,² · Lina Ge¹,²,³ · Lei Tian¹,²
This survey examines data security and privacy preservation in federated learning (FL) for the edge Internet of Things (edge-IoT). With the rapid development of the Smart Internet of Things (SIoT), the amount of data generated is increasing exponentially, making traditional machine learning inadequate for training complex models. FL, a new paradigm for training statistical models in distributed edge networks, addresses the integration and training problems posed by massive, heterogeneous data and enhances the security of private data. Edge computing (EC) processes data at the edge layer, close to the data sources, which provides low-latency processing, high-bandwidth communication, and a stable network environment, and thus reduces the pressure of processing massive data on a single node in the cloud center. Combining EC and FL can further optimize computing, communication, and data security for edge-IoT. This review covers the development status and basic principles of FL and addresses security attacks and privacy leakage problems in edge-IoT. It examines relevant work on cryptographic techniques (e.g., secure multi-party computation, homomorphic encryption, secret sharing), perturbation schemes (e.g., differential privacy), and adversarial training. Finally, it discusses challenges and future research directions for the integration of EC and FL.
FL is a paradigm for training statistical learning models on distributed edge networks, proposed by Google in 2016. It is a mainstream solution to the problems of heavy communication overhead, data privacy and security, and heterogeneous data fusion. FL enables a global model to be trained without access to local source data. Each local device encrypts its local model parameter updates and uploads them to a central aggregation server, which then uses the federated averaging algorithm to obtain a global model parameter update.
Each local device can then download and decrypt the global model parameters and use them for local ML model training. FL has been successfully applied in medical, industrial, and agricultural fields. It enables models to be trained on edge devices, making EC an appropriate environment for using FL. FL alleviates problems such as high communication costs, privacy and data security needs, and heterogeneous data isolation in edge-IoT networks. However, processes such as uploading and downloading model update parameters, training iterations, and other processes still expose the FL environment to potential risks.
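The server-side aggregation step described above can be illustrated with a minimal Python sketch. It assumes that each device uploads a flat list of parameters together with the number of local samples it trained on, and that the server weights each upload by that sample count, which is the usual formulation of federated averaging. The function name `federated_average` and its arguments are hypothetical and chosen only for illustration; the encryption and decryption of updates mentioned in the text are deliberately left out.

```python
from typing import List, Sequence


def federated_average(
    client_updates: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Each client's vector is weighted by its share of the total number of
    training samples (an assumption of this sketch, not stated in the text).
    """
    if not client_updates or len(client_updates) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0])
    if any(len(update) != dim for update in client_updates):
        raise ValueError("all client updates must have the same length")

    global_params = [0.0] * dim
    for update, n_samples in zip(client_updates, client_sizes):
        weight = n_samples / total  # this client's share of all samples
        for i, value in enumerate(update):
            global_params[i] += weight * value
    return global_params


if __name__ == "__main__":
    # Two hypothetical devices: the second holds three times as much data,
    # so its parameters count three times as much in the global model.
    updates = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [1, 3]
    print(federated_average(updates, sizes))
```

In a full round, each device would download the returned vector, continue local training from it, and upload a new update. Only parameters reach the server in this scheme, never the raw local data.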