2024 | Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun
This paper introduces a method for stabilizing self-consuming generative model training by incorporating self-correction functions. As synthetic data becomes more prevalent, training generative models on a mix of synthetic and real data can lead to instability or collapse in self-consuming loops. The authors propose a theoretical framework showing that an idealized correction function, which maps data points toward higher likelihood under the true data distribution, makes self-consuming loops exponentially more stable. They also propose practical self-correction functions that use expert knowledge, such as physics simulators, to correct synthetic data automatically and at scale.
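To see why a correction step can yield exponential rather than merely additive gains in stability, here is a back-of-the-envelope sketch of the intuition (a hedged illustration, not the paper's theorem or notation): suppose the correction contracts each generation's deviation $d_t$ from the true distribution by a factor $1-\gamma$ before a bounded amount of fresh estimation error $\varepsilon$ is introduced. Then

$$ d_{t+1} \le (1-\gamma)\,d_t + \varepsilon \quad\Longrightarrow\quad d_t \le (1-\gamma)^t d_0 + \frac{\varepsilon}{\gamma}, $$

so deviations decay geometrically and remain bounded, whereas without correction ($\gamma = 0$) the same recursion only gives $d_t \le d_0 + t\varepsilon$, which grows with every generation.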
Empirical results on human motion synthesis show that self-correcting self-consuming loops avoid model collapse even when the ratio of synthetic to real training data reaches 100%. The theoretical analysis further demonstrates that self-correction reduces variance and improves stability during training. The authors also show that self-correction applies to a range of tasks, including image generation and human motion synthesis, and that it leads to better model performance and stability.
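The loop structure is easy to prototype in a toy setting. The sketch below is an illustration under simplifying assumptions, not the authors' setup or code: a one-dimensional Gaussian model is repeatedly refit to its own samples (a fully synthetic self-consuming loop), and the correction step, whose role in the paper is played by expert knowledge such as a physics simulator, simply nudges samples toward known true moments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth data distribution; in the paper an expert such as a physics
# simulator plays this role, here we simply assume the true moments are known.
TRUE_MEAN, TRUE_STD = 0.0, 1.0


def fit_gaussian(data):
    """A minimal 'generative model': the sample mean and std of its training set."""
    return float(np.mean(data)), float(np.std(data))


def correct(samples, strength):
    """Toy self-correction: move each synthetic sample part of the way toward a
    re-standardized version of itself that matches the true moments.
    strength=0 means no correction; strength=1 is an idealized correction."""
    idealized = (samples - samples.mean()) / (samples.std() + 1e-12) * TRUE_STD + TRUE_MEAN
    return (1.0 - strength) * samples + strength * idealized


def self_consuming_loop(generations=100, n_samples=200, strength=0.0):
    """Each generation is trained only on (corrected) samples from the previous
    model -- a deliberately extreme, fully synthetic self-consuming loop."""
    mean, std = fit_gaussian(rng.normal(TRUE_MEAN, TRUE_STD, n_samples))
    for _ in range(generations):
        synthetic = rng.normal(mean, std, n_samples)   # sample the current model
        synthetic = correct(synthetic, strength)       # self-correction step
        mean, std = fit_gaussian(synthetic)            # retrain on synthetic data
    return mean, std


for s in (0.0, 0.25, 1.0):
    mean, std = self_consuming_loop(strength=s)
    print(f"correction strength {s:.2f}: fitted mean {mean:+.3f}, fitted std {std:.3f}")
```

With strength 0 the fitted moments typically drift as estimation error compounds across generations; with the idealized correction (strength 1) they stay pinned to the true values.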
The paper presents a theoretical analysis of self-correction in generative model training, showing that it leads to exponentially more stable training with smaller variance, and validates these findings empirically on both synthetic and real data. The authors conclude that self-correction is a promising approach for stabilizing self-consuming generative model training and that further research is needed to explore its application to other tasks.