Would Deep Generative Models Amplify Bias in Future Models?

4 Apr 2024 | Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima
The paper investigates the impact of deep generative models, specifically Stable Diffusion, on potential social biases in future computer vision models. It asks whether using generated images as training data would create a detrimental feedback loop of bias amplification. The study simulates this scenario by gradually replacing original images in the COCO and CC3M datasets with images generated by Stable Diffusion. The modified datasets are then used to train OpenCLIP and image captioning models, which are evaluated for both quality and bias. The key finding is that introducing generated images during training does not consistently amplify bias; in several tasks, bias is instead mitigated. The paper examines factors behind these phenomena, such as artifacts in image generation (e.g., blurry faces) and pre-existing biases in the original datasets. It concludes with recommendations for handling biased generated images in future model training, emphasizing the importance of bias-filtering preprocessing during data collection. Overall, the research highlights the complex dynamics between deep generative models and existing datasets, offering insight into the potential consequences of training future models on synthetic data.
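To make the simulation protocol concrete, the sketch below shows one plausible way to mix real and generated images at a given replacement ratio and sweep across ratios. It is a minimal Python sketch under stated assumptions, not the authors' code: `generate_image` (assumed here to condition on the original caption, as Stable Diffusion prompting would suggest) and `train_and_evaluate` (standing in for OpenCLIP or captioning training plus the paper's quality and bias metrics) are hypothetical placeholders.

```python
import random

def build_mixed_dataset(entries, replacement_ratio, generate_image, seed=0):
    """Replace a fraction of original images with generated ones.

    entries: list of (caption, image_path) pairs, e.g. from COCO or CC3M.
    replacement_ratio: fraction in [0, 1] of images to regenerate.
    generate_image: hypothetical callable mapping a caption to a
        synthetic image path (e.g. a Stable Diffusion sample).
    """
    rng = random.Random(seed)
    n_replace = int(len(entries) * replacement_ratio)
    replace_idx = set(rng.sample(range(len(entries)), n_replace))
    mixed = []
    for i, (caption, image_path) in enumerate(entries):
        if i in replace_idx:
            # Swap in a generated image conditioned on the original caption.
            mixed.append((caption, generate_image(caption)))
        else:
            mixed.append((caption, image_path))
    return mixed

def run_simulation(entries, generate_image, train_and_evaluate):
    """Sweep replacement ratios, retraining and measuring at each step.

    train_and_evaluate is a hypothetical placeholder for training an
    OpenCLIP or captioning model on the mixed data and returning its
    quality and bias scores.
    """
    results = {}
    for ratio in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
        mixed = build_mixed_dataset(entries, ratio, generate_image)
        results[ratio] = train_and_evaluate(mixed)
    return results
```

Comparing the bias scores in `results` across ratios is the kind of analysis that would reveal whether generated data amplifies or mitigates bias as its share of the training set grows.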