Federated Learning for Decentralized Artificial Intelligence in Melanoma Diagnostics

March 2024 | Sarah Hagenmüller, MSc; Max Schmitt, MSc; Eva Krieghoff-Henning, PhD; Achim Hekler, MSc; Roman C. Maron, MSc; Christoph Wies, MSc; Jochen S. Utikal, MD; Friedegund Meier, MD; Sarah Hobelsberger, MD; Frank F. Gellrich, MD; Mildred Sergon, MD; Axel Hauschild, MD; Lars E. French, MD; Lucie Heinzerling, MD; Justin G. Schlager, MD; Kamran Ghoreschi, MD; Max Schlaak, MD; Franz J. Hilke, PhD; Gabriela Poch, MD; Sören Korsing, MD; Carola Berking, MD; Markus V. Heppt, MD; Michael Erdmann, MD; Sebastian Haferkamp, MD; Konstantin Drexl, MD; Dirk Schadendorf, MD; Wiebke Sondermann, MD; Matthias Goebeler, MD; Bastian Schilling, MD; Jakob N. Kather, MD; Stefan Fröhling, MD; Titus J. Brinker, MD
This study evaluates the diagnostic performance of federated learning (FL) for melanoma-nevus classification using histopathological whole-slide images (WSIs) from six German university hospitals. The goal was to determine whether FL, a privacy-preserving approach, could match the diagnostic performance of classical centralized and ensemble learning methods. The study included 1025 WSIs from 923 patients: 388 invasive melanomas and 637 nevi. The FL model was trained on data from five hospitals and tested on a holdout dataset from the same five hospitals and on an external dataset from a sixth hospital. On the holdout test dataset, the classical centralized model significantly outperformed the FL model (AUROC 0.9024 vs 0.8579); on the external test dataset, the FL model performed better than the centralized model (AUROC 0.9126 vs 0.9045). The ensemble approach outperformed the FL model on both datasets, with AUROCs of 0.8867 (holdout) and 0.9227 (external). By these figures, the ensemble was the best of the three approaches on the external dataset, while the centralized model was the best on the holdout dataset. The study highlights that FL can achieve at least comparable performance to centralized learning while promoting collaboration across institutions and countries. However, FL may be less effective for in-distribution classification, as the holdout results show. The findings suggest that FL generalizes well, particularly in out-of-distribution scenarios, and that it may be extended to other image classification tasks in digital cancer histopathology.
The study underscores the importance of data diversity and the need for further research to explore the effectiveness of FL in melanoma diagnostics using prospective data.
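To make the idea of federated training more concrete, the following is a minimal sketch of one common aggregation rule, federated averaging, in which a server combines locally trained model parameters without ever seeing the underlying slides. The summary above does not state which aggregation rule the study used, so this is an illustration under that assumption, not the authors' method. The function name `federated_average`, the `Weights` type, the parameter names, and the toy hospital values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its training-sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each parameter value across all clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example (made-up values): two hospitals holding 100 and 300 slides.
hospital_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
hospital_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

In this sketch only the parameter values leave each hospital, which is what makes the approach privacy-preserving in principle. A real setup would repeat this step over many training rounds and operate on framework tensors rather than Python lists.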