28 Oct 2019 | Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni
The paper "Modeling Tabular Data using Conditional GAN" addresses the challenge of generating realistic synthetic tabular data, which is difficult because tables mix discrete and continuous columns, continuous columns often follow multi-modal distributions, and categorical columns are frequently imbalanced. The authors propose CTGAN (Conditional Tabular GAN), which combines three techniques: mode-specific normalization for continuous columns, a conditional generator, and training-by-sampling to cope with imbalanced discrete columns. The paper also presents a benchmarking system with 7 simulated and 8 real datasets, comparing CTGAN with Bayesian network baselines and other deep learning methods. CTGAN outperforms these methods on most of the real datasets, demonstrating its effectiveness in modeling complex tabular data distributions. The paper concludes by discussing the unique properties of tabular data and the advantages of using GANs over traditional statistical models.
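As a rough illustration of mode-specific normalization, the sketch below encodes one continuous value as a scalar offset within a sampled mode plus a one-hot mode indicator. It assumes the mixture weights, means, and standard deviations have already been fitted (the paper estimates them with a variational Gaussian mixture model). The function names, the toy parameters, and the nearest-mode fallback are illustrative assumptions, not the authors' code.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mode_specific_normalize(value, weights, means, stds, rng):
    """Encode one continuous value as (alpha, beta).

    alpha: value normalized within the sampled mode, (value - mean) / (4 * std)
    beta:  one-hot list marking which mode was sampled
    The mixture parameters are assumed to be fitted beforehand.
    """
    # Weighted density of the value under each mode.
    densities = [w * gaussian_pdf(value, m, s)
                 for w, m, s in zip(weights, means, stds)]
    total = sum(densities)

    if total > 0.0:
        # Sample a mode in proportion to its weighted density.
        k = rng.choices(range(len(densities)), weights=densities, k=1)[0]
    else:
        # Assumption of this sketch: if every density underflows to zero,
        # fall back to the mode whose mean is closest to the value.
        k = min(range(len(means)), key=lambda i: abs(value - means[i]))

    alpha = (value - means[k]) / (4.0 * stds[k])
    beta = [1 if i == k else 0 for i in range(len(means))]
    return alpha, beta


# Toy two-mode column: modes centred at 0 and 10 (hypothetical parameters).
rng = random.Random(0)
alpha, beta = mode_specific_normalize(
    10.5, weights=[0.5, 0.5], means=[0.0, 10.0], stds=[1.0, 1.0], rng=rng
)
print(alpha, beta)
```

Because the mode is sampled rather than chosen by maximum density, a value lying between two modes can be encoded under either one on different calls; the one-hot indicator records which mode was used so the value can be recovered as alpha * 4 * std + mean.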