FairCLIP: Harnessing Fairness in Vision-Language Learning

5 Apr 2024 | Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang
Fairness is a critical concern in deep learning, especially in healthcare, where models influence diagnoses and treatment decisions. Although fairness has been studied in vision-only domains, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets. To address this, the authors introduce the first fair vision-language medical dataset, Harvard-FairVLMed, which provides detailed demographic attributes, ground-truth labels, and clinical notes to facilitate fairness analysis of VL foundation models. Using this dataset, they conduct a comprehensive fairness analysis of two widely used VL models, CLIP and BLIP2, pre-trained on both natural and medical domains, across four protected attributes. The results reveal significant biases in all VL models, with Asian, Male, Non-Hispanic, and Spanish being the preferred subgroups across race, gender, ethnicity, and language, respectively.

To alleviate these biases, the authors propose FairCLIP, an optimal-transport-based approach that achieves a favorable trade-off between performance and fairness by reducing the Sinkhorn distance between the overall sample distribution and the distributions corresponding to each demographic group. Harvard-FairVLMed is the first VL dataset of its kind and has the potential to catalyze the development of ethically aware and clinically effective machine learning models. The dataset and code are available at https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k.
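To make the optimal-transport objective more concrete, the sketch below shows one way a Sinkhorn-distance fairness regularizer of this kind could sit on top of a CLIP-style training loop. It is a minimal illustration under stated assumptions, not the authors' implementation: the choice to match distributions of per-sample image-text similarity scores, the geomloss blur setting, and names such as `fairness_regularizer` and `lambda_fair` are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a Sinkhorn-distance fairness
# regularizer added on top of the standard CLIP contrastive loss.
# Assumption: the distributions being matched are per-sample image-text
# cosine similarities; geomloss provides the Sinkhorn divergence.
import torch
import torch.nn.functional as F
from geomloss import SamplesLoss  # pip install geomloss

sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)

def fairness_regularizer(image_feats, text_feats, group_ids):
    """Sum of Sinkhorn distances between the batch-level distribution of
    image-text similarity scores and each demographic group's distribution."""
    # Cosine similarity between each image and its paired clinical note.
    sims = F.cosine_similarity(image_feats, text_feats, dim=-1).unsqueeze(1)  # (N, 1)
    penalty = sims.new_zeros(())
    for g in group_ids.unique():
        group_sims = sims[group_ids == g]   # similarities for one subgroup
        if group_sims.shape[0] > 1:         # need more than a single point
            penalty = penalty + sinkhorn(sims, group_sims)
    return penalty

# Hypothetical overall objective, with lambda_fair an illustrative trade-off weight:
# loss = clip_contrastive_loss + lambda_fair * fairness_regularizer(img_f, txt_f, attrs)
```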
The authors also conduct extensive experiments and analyses comparing CLIP and FairCLIP across different architectures and protected attributes. Their results show that FairCLIP significantly improves the fairness metrics (DPD, DEOdds) and ES-AUC scores across demographic subgroups. They additionally perform ablation studies to evaluate the impact of clinical-note summarization, vision-only vs. multimodal features, and natural vs. medical vision encoders on model fairness. These indicate that medical pre-training enhances the performance-fairness trade-off across all attributes except language, and that FairCLIP consistently outperforms CLIP in terms of both fairness and performance. The study highlights the importance of fairness in medical VL models and provides a framework for future research in this area.
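For readers unfamiliar with the reported metrics, the sketch below illustrates how DPD, DEOdds, and an equity-scaled AUC can be computed from predictions and protected attributes. The fairlearn calls are standard library functions; the ES-AUC formula shown (overall AUC discounted by summed per-group AUC deviations) follows common equity-scaled formulations and should be treated as an assumption rather than this paper's verbatim definition.

```python
# Minimal sketch of the evaluation metrics mentioned above. The fairlearn
# functions are standard; the equity-scaled AUC formula is an assumption
# borrowed from related equity-scaled metrics, not a quote of this paper.
import numpy as np
from sklearn.metrics import roc_auc_score
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference

def equity_scaled_auc(y_true, y_score, groups):
    """Overall AUC discounted by the summed per-group deviations from it.
    Inputs are 1-D numpy arrays; every group must contain both classes."""
    overall = roc_auc_score(y_true, y_score)
    deviation = sum(
        abs(overall - roc_auc_score(y_true[groups == g], y_score[groups == g]))
        for g in np.unique(groups)
    )
    return overall / (1.0 + deviation)

# y_pred holds hard binary disease labels thresholded from y_score.
# dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=groups)
# deodds = equalized_odds_difference(y_true, y_pred, sensitive_features=groups)
```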