COMPREHENSIVE EXPLORATION OF SYNTHETIC DATA GENERATION: A SURVEY

1 Feb 2024 | André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster
This paper provides a comprehensive survey of Synthetic Data Generation (SDG) models, addressing the challenges posed by limited training data in Machine Learning (ML) and Deep Learning (DL). The authors analyze 417 SDG models from the last decade, categorizing them into 20 distinct types and 42 subtypes. They identify common attributes, trends, and advancements, highlighting the shift from simpler probabilistic models to more complex neural network-based approaches. Computer vision is the most prominent application, with Generative Adversarial Networks (GANs) and diffusion models being the top-performing generative models. The paper also discusses the dominance of GANs in privacy-preserving data generation and the use of Recurrent Neural Networks (RNNs) for sequential data. The authors also discuss open challenges, such as the lack of standardized evaluation metrics and datasets, and emphasize the need for future research to address them. The work serves as a valuable resource for researchers and practitioners, providing a detailed guide for selecting appropriate SDG models based on specific tasks and requirements.