2024 | A. Anil Sinaci, Mert Gencturk, Celia Alvarez-Romero, Gokce Banu Laleci Erturkmen, Alicia Martinez-Garcia, María José Escalona-Cuaresma, Carlos Luis Parra-Calderon
This paper introduces a privacy-preserving federated machine learning (ML) architecture built on FAIR health data, enabling collaborative model-building among health data owners without sharing their datasets. The architecture uses an agent-based system, where each organization hosts a Federated ML Agent within its secure network, connected to a cloud-based Federated ML Manager. The system was validated by five healthcare organizations, which transformed their datasets into Health Level 7 Fast Healthcare Interoperability Resources (FHIR) using a FAIRification workflow. The federated ML model was trained on these FAIR datasets, and the resulting model achieved an accuracy rate of 87% in predicting 30-day readmission risk for chronic obstructive pulmonary disease (COPD) patients.
The architecture includes a Federated ML Manager that orchestrates the federated ML process, a Federated ML Agent that enables health institutions to perform ML operations on FAIR health data, and a browser-based GUI for user interaction. The system allows data scientists to design and execute federated ML algorithms, including feature definition, dataset creation, algorithm selection, and model training. The model training process involves local model training, model validation, and global model aggregation. The proposed algorithm generates a global predictive model in two communication rounds by weighting each local model according to its performance.
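The two-round, performance-weighted aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify the local learner, the performance metric, or the weighting formula, so the toy threshold classifier, the use of accuracy, the normalization of mean accuracies into weights, and every name below (`Agent`, `train_local`, `federated_train`) are assumptions made purely for illustration. Only models and scores cross the organizational boundary in this sketch; the data stays with each agent.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]      # (one numeric feature, binary label); hypothetical data shape
Model = Callable[[float], int]  # a model maps a feature value to a predicted label


def train_local(data: Sequence[Sample]) -> Model:
    """Toy local learner (assumption): threshold halfway between the class means."""
    pos = mean(x for x, y in data if y == 1)
    neg = mean(x for x, y in data if y == 0)
    threshold = (pos + neg) / 2
    if pos >= neg:
        return lambda x: int(x >= threshold)
    return lambda x: int(x < threshold)


def accuracy(model: Model, data: Sequence[Sample]) -> float:
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


@dataclass
class Agent:
    """Stands in for a Federated ML Agent; its data never leaves this object."""
    train: List[Sample]
    valid: List[Sample]

    def fit(self) -> Model:
        return train_local(self.train)

    def validate(self, models: Sequence[Model]) -> List[float]:
        return [accuracy(m, self.valid) for m in models]


def federated_train(agents: Sequence[Agent]) -> Tuple[Model, List[float]]:
    """Stands in for the Federated ML Manager orchestrating two rounds."""
    # Round 1: every agent trains locally and shares only its model.
    models = [agent.fit() for agent in agents]

    # Round 2: every agent scores all models on its own validation data
    # and shares only the resulting accuracies.
    scores = [agent.validate(models) for agent in agents]

    # Assumed weighting rule: mean accuracy per model, normalized to sum to 1.
    mean_scores = [mean(column) for column in zip(*scores)]
    total = sum(mean_scores)
    if total == 0:
        weights = [1 / len(models)] * len(models)
    else:
        weights = [s / total for s in mean_scores]

    def global_model(x: float) -> int:
        # Weighted vote of the local models.
        vote = sum(w * m(x) for w, m in zip(weights, models))
        return int(vote >= 0.5)

    return global_model, weights


# Synthetic data for two hypothetical organizations.
agents = [
    Agent(train=[(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
          valid=[(0.5, 0), (9.5, 1)]),
    Agent(train=[(2.0, 0), (3.0, 0), (7.0, 1), (10.0, 1)],
          valid=[(3.0, 0), (6.0, 1)]),
]
global_model, weights = federated_train(agents)
print(weights, global_model(9.0), global_model(1.0))
```

A weighted vote is only one way to combine local models; averaging model parameters across sites would be another, and the summary does not say which one the paper uses.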
The study demonstrates the practical application of privacy-preserving federated ML among five distinct healthcare entities, highlighting the value of FAIR health data in machine learning when utilized in a federated manner that ensures privacy protection without sharing data. The solution effectively leverages FAIR datasets from multiple healthcare organizations for federated ML while safeguarding sensitive health datasets, meeting legislative privacy and security requirements. The system was deployed on top of FAIRified health data from five distinct healthcare and health research organizations across Europe, and an experimental evaluation was conducted in real-life settings. The results show that the proposed solution can successfully utilize FAIR datasets from multiple health organizations for ML processes while preserving privacy within a trusted environment. The methodology adheres to FAIR principles and uses the HL7 FHIR standard, creating new prospects for secondary use, such as federated ML. This approach assists healthcare and health research organizations in safely leveraging datasets from other entities to build more accurate models for classification problems.