The limits of fair medical imaging AI in real-world generalization

28 June 2024 | Yuzhe Yang, Haoran Zhang, Judy W. Gichoya, Dina Katabi, Marzyeh Ghassemi
The study investigates how demographic shortcuts in medical imaging AI models affect fairness across subpopulations. Using six global chest X-ray datasets, the authors examine how strongly AI models encode demographic attributes and how that encoding relates to fairness gaps between groups. They find that models use demographic shortcuts, leading to biased predictions, across radiology, dermatology, and ophthalmology. While algorithmic methods can mitigate these shortcuts to create 'locally optimal' models within the training data, these models often fail to perform well in new, out-of-distribution (OOD) settings. Surprisingly, models with less demographic encoding are more 'globally optimal,' showing better fairness in new test environments. The study emphasizes the need for best practices in medical imaging models to maintain performance and fairness beyond their initial training contexts, highlighting critical considerations for AI clinical deployments across diverse populations and sites.
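To make the notion of a fairness gap concrete, here is a minimal sketch of one common way to measure it: the largest difference in false-positive rate between demographic subgroups. The summary does not state which metric the authors use, so the choice of false-positive rate, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), computed over the negative examples only."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        raise ValueError("no negative examples in this group")
    return sum(negatives) / len(negatives)


def fpr_gap(y_true, y_pred, groups):
    """Return (largest FPR difference between subgroups, per-group FPRs)."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    rates = {g: false_positive_rate(t, p) for g, (t, p) in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: binary labels, binary predictions, and a
# demographic group label per example (not taken from the study).
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

gap, rates = fpr_gap(y_true, y_pred, groups)
print(rates)  # {'A': 0.25, 'B': 0.5}
print(gap)    # 0.25
```

Running the same function once on a held-out split of the training data and once on data from a different site would give an in-distribution gap and an OOD gap, which is the kind of comparison the summary describes.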