2024 | Vasileios C. Pezoulas, Dimitrios I. Zaridis, Eugenia Mylona, Christos Androutsos, Kosmas Apostolidis, Nikolaos S. Tachos, Dimitrios I. Fotiadis
This review explores the application and efficacy of synthetic data methods in healthcare, focusing on tabular, imaging, radiomics, time-series, and omics data. The study systematically searched PubMed and Scopus databases, categorizing methods into statistical, probabilistic, machine learning, and deep learning. Deep learning-based generators were used in 72.6% of studies, with Python being the most popular implementation language (75.3%). Synthetic data is crucial for reducing costs and time in clinical trials, enhancing AI model performance in personalized medicine, ensuring fair treatment recommendations, and accessing high-quality, representative datasets without exposing sensitive patient information. The review highlights the importance of maintaining data fidelity and privacy, with deep learning methods being particularly effective in generating realistic synthetic data. Open-source repositories are provided to accelerate research in this field.