Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

12 Jun 2024 | Damien Ferbach, Quentin Bertrand, Avisek Joey Bose, Gauthier Gidel
This paper investigates the impact of data curation on the training of self-consuming generative models, i.e., models that are iteratively retrained on their own outputs. The authors show that when users curate the synthetic data, the retraining loop acts as an implicit preference optimization mechanism. They prove that when data is curated according to a reward model, the expected reward of the iteratively retrained model is maximized; moreover, the variance of the reward collapses and the process converges to the regions of maximum reward.

The study also analyzes the stability of the retraining loop when real data is injected at each step. The theoretical results show that, in this setting, the KL divergence between the model distribution and the optimal distribution remains bounded while the expected reward still increases.

Experiments on synthetic datasets and CIFAR-10 support the theory and show that the procedure amplifies the biases of the reward model. The paper underscores the importance of understanding how curation affects the training of generative models and the risk of bias amplification in self-consuming loops. The results have implications for the development of large-scale generative models and the ethical considerations of their deployment.
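The curated self-consuming loop summarized above can be illustrated with a toy one-dimensional sketch. Everything here is an illustrative assumption rather than the paper's actual setup: the "generative model" is a Gaussian refit to its training data, the reward is simply r(x) = x, and curation keeps the best of every k samples (a best-of-k filter standing in for user preferences), with a fraction of real data injected each round for stability.

```python
import random
import statistics

def reward(x):
    # Toy reward: larger x is preferred (stands in for a learned reward model).
    return x

def sample_model(mu, sigma, n):
    # "Generate" n samples from the current model, here a 1-D Gaussian.
    return [random.gauss(mu, sigma) for _ in range(n)]

def curate_best_of_k(samples, k):
    # Split samples into batches of k and keep the highest-reward sample per
    # batch, mimicking a user picking a favourite among k generations.
    kept = []
    for i in range(0, len(samples) - k + 1, k):
        batch = samples[i:i + k]
        kept.append(max(batch, key=reward))
    return kept

def retrain_loop(steps=10, k=4, n=4000, real_frac=0.2,
                 real_mu=0.0, real_sigma=1.0):
    # Start the model at the real-data distribution.
    mu, sigma = real_mu, real_sigma
    mean_rewards = []
    for _ in range(steps):
        synthetic = curate_best_of_k(sample_model(mu, sigma, n), k)
        # Inject a fraction of real data each step (the stabilizing mechanism).
        real = sample_model(real_mu, real_sigma, int(real_frac * len(synthetic)))
        data = synthetic + real
        # "Retraining" a Gaussian amounts to refitting its mean and std.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        mean_rewards.append(mu)  # with r(x) = x, the mean is the mean reward
    return mean_rewards

random.seed(0)
traj = retrain_loop()
print(traj)
```

In this sketch the mean reward rises above the real-data mean after a single curated round and keeps increasing across iterations, a qualitative illustration of the claim that curation implicitly optimizes the reward; with `real_frac = 0`, the same loop instead drifts and its variance collapses.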