Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias

February 5, 2024 | Sierra Wyllie, Ilia Shumailov, and Nicolas Papernot
The article discusses model-induced distribution shifts (MIDS), which occur when previous model outputs influence subsequent model training, producing biased and unfair data ecosystems. MIDS appear in generative models as model collapse and in supervised models as performative prediction or unfairness feedback loops. The authors show how MIDS can amplify existing biases, degrade performance, and harm the representation of minoritized groups, even when the initial dataset is unbiased. They introduce a framework for tracking these shifts across multiple generations of models and demonstrate how they lead to significant fairness and representational harms.

The study also explores algorithmic reparation (AR), a framework that uses machine learning to redress historical discrimination through deliberate, fairness-focused interventions. The authors simulate AR with a method called STratified AR (STAR), which uses intersectional sampling to build more representative training data. Their experiments show that STAR mitigates the negative effects of MIDS, improving fairness and reducing disparities in model outputs.

The research emphasizes the importance of understanding and addressing MIDS so that machine learning systems do not perpetuate or amplify existing biases. The authors present empirical results across several datasets, including CelebA and FairFace, documenting the detrimental effects of MIDS and the effectiveness of AR interventions. They conclude that while MIDS can cause significant fairness issues, intentional interventions such as AR offer a promising route toward equity and justice in data ecosystems.
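To make the two central ideas concrete, the sketch below is a minimal, hypothetical illustration, not the authors' pipeline: each model generation is retrained on the previous generation's predicted labels (the sequential dependence that defines a MIDS), and an optional stratified resampling step over (group, label) cells stands in for a STAR-style intersectional intervention. All function names, parameters, and the data-generating process are assumptions made for illustration.

```python
# Toy simulation of a MIDS feedback loop with an optional stratified
# (intersectional) resampling intervention. Illustrative only; not the
# authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_seed_data(n=5000):
    """Seed ('real') data: one feature, a binary group attribute, and a label
    whose positive rate differs across groups only through the feature."""
    group = rng.integers(0, 2, size=n)                # protected attribute
    x = rng.normal(loc=group * 0.5, scale=1.0, size=n)
    p = 1 / (1 + np.exp(-(x - 0.25)))                 # label depends on x only
    y = (rng.random(n) < p).astype(int)
    X = np.column_stack([x, group])
    return X, y, group

def stratified_resample(X, y, group, n):
    """Assumed STAR-like step: sample so every (group, label) cell
    contributes equally to the next generation's training set."""
    chosen = []
    cells = [(g, c) for g in (0, 1) for c in (0, 1)]
    per_cell = n // len(cells)
    for g, c in cells:
        idx = np.where((group == g) & (y == c))[0]
        chosen.append(rng.choice(idx, size=per_cell, replace=True))
    chosen = np.concatenate(chosen)
    return X[chosen], y[chosen]

def run_generations(n_gens=5, intervene=False):
    X, y, group = make_seed_data()
    for gen in range(n_gens):
        if intervene:
            X_train, y_train = stratified_resample(X, y, group, len(y))
        else:
            X_train, y_train = X, y
        model = LogisticRegression().fit(X_train, y_train)
        # The next generation's labels are this model's outputs: the
        # sequential dependence that characterizes a MIDS.
        y = model.predict(X)
        r0, r1 = y[group == 0].mean(), y[group == 1].mean()
        print(f"gen {gen}: pos rate g0={r0:.3f}, g1={r1:.3f}, gap={abs(r0 - r1):.3f}")

run_generations(intervene=False)   # unfairness feedback loop
run_generations(intervene=True)    # with the stratified intervention
```

Whether the group gap actually widens in this toy depends on its parameters; the sketch is meant to show the structure of the feedback loop and where a reparative sampling intervention would sit, not to reproduce the paper's results.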