10 Jun 2024 | Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun
The paper addresses the issue of training generative models on a mix of human- and machine-generated data, which can lead to "self-consuming loops" that cause training instability or collapse. The authors propose stabilizing these loops by introducing a self-correction function that maps synthetic data to samples that are more likely under the true data distribution. Their theoretical analysis shows that this correction function makes the training process exponentially more stable and reduces unwanted variance. Empirical validation on the human motion synthesis task shows that the method avoids model collapse even when a high ratio of synthetic to real data is used. The paper also discusses a practical implementation of the self-correction function using a physics simulator and provides a detailed experimental setup and results.
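To make the structure of such a self-consuming loop concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the "model" is just a fitted 1D Gaussian, and the `correct` function stands in for the expert projector (a physics simulator in the paper) by shrinking synthetic samples toward the true mode. All names (`fit`, `generate`, `correct`, `self_consuming_loop`, `TRUE_MU`, `synth_ratio`) are hypothetical, chosen only to show where the correction step slots into iterative retraining on mixed real and synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "true" data distribution: a standard Gaussian.
TRUE_MU, TRUE_SIGMA = 0.0, 1.0

def fit(samples):
    """'Train' the toy model: estimate mean and std from the data."""
    return samples.mean(), samples.std()

def generate(mu, sigma, n):
    """Draw synthetic samples from the current model."""
    return rng.normal(mu, sigma, n)

def correct(x, strength=0.5):
    """Toy self-correction: pull synthetic samples toward the true mode.

    In the paper this role is played by an expert projector such as a
    physics simulator; here it is a simple shrinkage toward TRUE_MU.
    """
    return (1 - strength) * x + strength * TRUE_MU

def self_consuming_loop(generations=20, n_real=200, synth_ratio=1.0,
                        use_correction=True):
    """Iteratively retrain on real data mixed with (corrected) synthetic data."""
    real = rng.normal(TRUE_MU, TRUE_SIGMA, n_real)
    mu, sigma = fit(real)
    for _ in range(generations):
        synth = generate(mu, sigma, int(synth_ratio * n_real))
        if use_correction:
            synth = correct(synth)
        mu, sigma = fit(np.concatenate([real, synth]))
    return mu, sigma

print("with correction:   ", self_consuming_loop(use_correction=True))
print("without correction:", self_consuming_loop(use_correction=False))
```

The sketch only illustrates the loop's shape: each generation trains on the original real data plus synthetic data from the previous model, with the correction applied before retraining. The paper's actual contribution is the theoretical guarantee about this loop's stability and its instantiation for human motion synthesis, neither of which this toy example reproduces.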