1 Feb 2024 | André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster
This paper presents a comprehensive survey of 417 synthetic data generation (SDG) models over the past decade, providing an in-depth overview of model types, functionality, and improvements. The study identifies common attributes and classifies the models, revealing trends in performance and complexity. Neural network-based approaches dominate, with GANs and diffusion models leading in computer vision, while RNNs excel in sequential data generation. However, privacy-preserving data generation relies on models like Markov chains and advanced GANs. The survey highlights challenges such as the lack of standardized evaluation metrics and datasets, making comparisons difficult. It also emphasizes the need for future research to address training and computational costs. The work serves as a guide for selecting appropriate SDG models and identifies key areas for further exploration. The paper also provides a detailed overview of various generative models, including Gaussian Mixture Models, Kernel Density Estimators, Markov Chain Models, Bayesian Networks, Genetic Algorithms, Boltzmann Machines, and Autoencoders, along with their applications and improvements. The survey aims to provide a comprehensive understanding of the diverse landscape of SDG models and their potential for future research.