The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models

The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models

June 3-6, 2024 | Abeba Birhane, Sepehr Dehdashtian, Vinay Uday Prabhu, and Vishnu Boddeti
The paper investigates the impact of dataset scaling on racial classification in multimodal models, focusing on the LAION400-M and LAION-2B datasets. It evaluates 14 Vision Transformer-based VLMs using the Chicago Face Dataset (CFD) to measure racial and gender bias. The study finds that as training data increases, the probability of a pre-trained CLIP model misclassifying human images as non-human classes like chimpanzee, gorilla, and orangutan decreases, but the probability of misclassifying images as human offensive classes like criminal increases. For larger ViT-L models, the probability of predicting Black and Latino men as criminal increases by 65% and 69%, respectively, when the dataset is scaled from 400M to 2B samples. Conversely, for smaller ViT-B models, the probability decreases by 20% and 47%. The study highlights the risks of dataset scaling in exacerbating racial bias and dehumanization, particularly against Black individuals. It emphasizes the need for rigorous dataset curation, audit, and management to mitigate these biases. The findings underscore the importance of transparency, accountability, and ethical considerations in AI development. The paper calls for open access to datasets and models to enable independent audits and regulatory frameworks. It also discusses the limitations of the CFD dataset, including its binary gender and racial categories, which may not fully represent the diversity of real-world identities. The study concludes that dataset scaling can significantly influence model behavior, and that addressing these biases is crucial for fair and equitable AI systems.The paper investigates the impact of dataset scaling on racial classification in multimodal models, focusing on the LAION400-M and LAION-2B datasets. It evaluates 14 Vision Transformer-based VLMs using the Chicago Face Dataset (CFD) to measure racial and gender bias. The study finds that as training data increases, the probability of a pre-trained CLIP model misclassifying human images as non-human classes like chimpanzee, gorilla, and orangutan decreases, but the probability of misclassifying images as human offensive classes like criminal increases. For larger ViT-L models, the probability of predicting Black and Latino men as criminal increases by 65% and 69%, respectively, when the dataset is scaled from 400M to 2B samples. Conversely, for smaller ViT-B models, the probability decreases by 20% and 47%. The study highlights the risks of dataset scaling in exacerbating racial bias and dehumanization, particularly against Black individuals. It emphasizes the need for rigorous dataset curation, audit, and management to mitigate these biases. The findings underscore the importance of transparency, accountability, and ethical considerations in AI development. The paper calls for open access to datasets and models to enable independent audits and regulatory frameworks. It also discusses the limitations of the CFD dataset, including its binary gender and racial categories, which may not fully represent the diversity of real-world identities. The study concludes that dataset scaling can significantly influence model behavior, and that addressing these biases is crucial for fair and equitable AI systems.
Reach us at info@study.space