7 Mar 2024 | Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai
The paper "CLIP the Bias: How Useful is Data Balancing in Multimodal Learning?" by Ibrahim Alabdulmohsin et al. from Google DeepMind explores the effectiveness of data balancing in mitigating biases in contrastive language-image pretraining (CLIP). The authors identify that CLIP models can absorb societal stereotypes and propose a novel algorithm called Multi-Modal Moment Matching (M4) to reduce both representation and association biases in multimodal data. They conduct a comprehensive analysis, considering various factors such as model, representation, and data size. Key findings include:
1. **Proxy Variables**: Adding proxy variables helps mitigate representation biases but can hurt association biases.
2. **Fine-Tuning**: Fine-tuning on balanced data effectively reduces representation biases but is less effective for association biases.
3. **Model Quality**: Data balancing has a mixed impact on model performance, improving classification but hurting retrieval.
4. **Improvements**: Improving data quality and model architecture, for example by using SigLIP and filtering out low-similarity image-text pairs, can mitigate the negative impacts of data balancing (a small filtering sketch appears at the end of this summary).
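To make the balancing idea concrete, here is a minimal, hedged sketch in the spirit of M4: choose per-example weights that stay close to uniform while pulling first-order statistics (representation: how often each sensitive group appears) and second-order statistics (association: co-occurrence of groups with other concepts) toward chosen targets. This is not the paper's exact formulation; the penalized projected-gradient solver, the hyperparameters, and the toy data below are illustrative assumptions.

```python
# Minimal sketch of data balancing in the spirit of M4 (Multi-Modal Moment
# Matching). NOT the paper's exact algorithm: it reweights examples so that
# representation statistics (weighted frequency of each sensitive group) and
# association statistics (group-concept co-occurrence) move toward chosen
# targets, while staying close to uniform weights.
import numpy as np


def balance_weights(S, C, rep_targets, lam=50.0, lr=5e-3, steps=3000):
    """Compute nonnegative per-example weights (mean ~1) by penalized
    projected gradient descent.

    S: (n, k) binary matrix; S[i, j] = 1 if example i depicts sensitive group j.
    C: (n, m) binary matrix; C[i, l] = 1 if example i depicts concept l.
    rep_targets: (k,) desired weighted frequency of each sensitive group.
    """
    n, _ = S.shape
    w = np.ones(n)
    for _ in range(steps):
        wsum = w.sum()
        rep = S.T @ w / wsum                    # weighted group frequencies
        conc = C.T @ w / wsum                   # weighted concept frequencies
        cooc = (S * w[:, None]).T @ C / wsum    # weighted group-concept co-occurrence
        assoc_gap = cooc - np.outer(rep, conc)  # deviation from independence

        # Approximate gradients of the squared moment gaps, treating the
        # normalizer and the marginals as locally constant for a simple update.
        g_rep = S @ (rep - rep_targets)
        g_assoc = ((S @ assoc_gap) * C).sum(axis=1)
        grad = (w - 1.0) + lam * (g_rep + g_assoc) * (n / wsum)

        w = np.maximum(w - lr * grad, 0.0)      # projected step: keep w >= 0
    return w * (n / w.sum())                    # rescale weights to mean 1


# Toy usage: one imbalanced binary attribute (70/30) and five unrelated concepts.
rng = np.random.default_rng(0)
n = 1000
g = rng.random(n) < 0.7
S = np.column_stack([g, ~g]).astype(float)      # one-hot sensitive groups
C = (rng.random((n, 5)) < 0.2).astype(float)    # binary concept annotations
w = balance_weights(S, C, rep_targets=np.array([0.5, 0.5]))
print("weighted group frequencies:", np.round(S.T @ w / w.sum(), 3))
```

The penalty formulation is used here only to keep the sketch dependency-free; a constrained solver would be a natural alternative under the same moment-matching idea.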
The authors conclude with recommendations for improving the efficacy of data balancing in multimodal systems, emphasizing the need for a combination of interventions and the importance of training on balanced data from the outset.
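As a small companion to finding 4 above, the sketch below shows one way to filter out low-similarity image-text pairs, assuming paired embeddings from a pretrained two-tower (CLIP-style) encoder are already available. The `filter_low_similarity` helper and the 0.2 threshold are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of a data-quality filter: drop image-text pairs whose
# embeddings are dissimilar under a pretrained two-tower encoder. The
# threshold and the random "embeddings" below are illustrative only.
import numpy as np


def filter_low_similarity(img_emb, txt_emb, threshold=0.2):
    """Return indices of pairs whose cosine similarity exceeds `threshold`.

    img_emb, txt_emb: (n, d) arrays of paired image and text embeddings.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = (img * txt).sum(axis=1)  # per-pair cosine similarity
    return np.where(sims > threshold)[0]


# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
keep = filter_low_similarity(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
print("kept pair indices:", keep)
```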