May 2024 | Sepehr Dehdashtian, Lan Wang, Vishnu Naresh Boddeti
FairerCLIP is a method for reducing bias in the zero-shot predictions of CLIP, a large pre-trained vision-language model. It addresses two main sources of bias: spurious correlations and intrinsic dependencies in the data. By operating in reproducing kernel Hilbert spaces (RKHSs), FairerCLIP gains flexibility, efficiency, and performance improvements over existing debiasing methods.

It uses a non-parametric measure of statistical dependence to make the image and text representations independent of sensitive attributes while preserving their ability to predict target attributes. Training proceeds by alternating optimization in which each subproblem admits a closed-form solver, leading to faster training than gradient-based baselines. FairerCLIP is effective both when ground-truth labels are available and when they are not.

Evaluated on benchmark datasets exhibiting spurious correlations and intrinsic dependencies, FairerCLIP improves fairness metrics, narrows the performance gap between demographic groups, and matches or exceeds baseline accuracy. It is also computationally efficient, making it suitable for large-scale applications. Overall, FairerCLIP offers a robust way to debias CLIP's zero-shot predictions, improving both fairness and accuracy.
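To make the idea of a non-parametric dependence measure in an RKHS concrete, here is a minimal sketch using the Hilbert-Schmidt Independence Criterion (HSIC), a standard kernel dependence estimator. This is an illustration of the general technique, not necessarily the exact estimator FairerCLIP uses; the function names and the RBF kernel choice are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def hsic(K, L):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2,
    # where H = I - (1/n) * 11^T is the centering matrix.
    # Near zero when the two variables are independent.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A debiasing objective can penalize `hsic(K_features, K_sensitive)` while rewarding dependence on the target labels: the penalty is large when representations carry information about the sensitive attribute and shrinks toward zero as they become independent of it.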
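The "closed-form solvers" can be illustrated as follows: when both the dependence on the target and the dependence on the sensitive attribute are expressed through centered kernel matrices, the representation coefficients that trade one off against the other are the top eigenvectors of a generalized eigenvalue problem. The sketch below shows that generic ingredient; the function name `debias_directions`, the trade-off weight `tau`, and the regularizer `eps` are assumptions for the example, not FairerCLIP's exact objective.

```python
import numpy as np

def debias_directions(K, Ly, Ls, tau=1.0, eps=1e-6, r=10):
    # Find coefficients Theta so that Z = K @ Theta is dependent on the
    # target labels (kernel Ly) but not the sensitive attribute (kernel Ls).
    # Closed form: top-r generalized eigenvectors of (A, B).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    A = K @ H @ Ly @ H @ K - tau * K @ H @ Ls @ H @ K   # reward minus penalty
    B = K @ K + eps * np.eye(n)                         # regularized, positive definite
    # Whiten by B^{-1/2} so a standard symmetric eigensolver applies.
    d, U = np.linalg.eigh(B)
    B_inv_half = U @ np.diag(1.0 / np.sqrt(d)) @ U.T
    w, V = np.linalg.eigh(B_inv_half @ A @ B_inv_half)
    return B_inv_half @ V[:, -r:]                       # top-r eigenvectors
```

Because each update is a single eigendecomposition rather than an iterative gradient loop, alternating between solves of this kind converges in far fewer steps than backpropagation-based debiasing, which is the source of the training-speed advantage the summary describes.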