CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

24 Mar 2024 | Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou
This paper proposes CARZero, an approach to radiology zero-shot classification that uses cross-attention alignment to capture the complex relationships between medical images and reports. The method introduces a similarity representation (SimR) generated through cross-attention, which is then projected into a similarity matrix for cross-modal alignment. CARZero also incorporates a large language model (LLM)-based prompt alignment strategy that standardizes diagnostic expressions during both training and inference, avoiding the difficulties of manual prompt design.

The method is evaluated on five official chest radiograph diagnostic test sets, including the PadChest dataset with its long-tail distribution of 192 diseases. CARZero achieves state-of-the-art performance, with an AUC of 0.810 on PadChest and a zero-shot performance score of 0.811 on ChestXray14, surpassing the SOTA performance of methods fine-tuned on 1% of the data. The authors attribute this to the new image-text alignment strategy, which addresses the complex relationship between medical images and reports.

The pipeline has four steps. Image and text encoders extract features; cross-attention alignment turns those features into SimR; a linear layer projects SimR into a similarity matrix; and the InfoNCE loss is used for optimization. The prompt alignment strategy standardizes diagnostic expressions, which improves zero-shot inference performance.

CARZero is evaluated on multiple datasets, including MIMIC-CXR, Open-I, PadChest, ChestXray14, and CheXpert. The results show superior zero-shot classification performance, particularly in diagnosing rare diseases.
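The alignment steps described above (cross-attention to produce SimR, a linear projection to a similarity matrix, and InfoNCE optimization) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses single-head attention without learned query/key/value projections, random features in place of real encoder outputs, a text-global-to-image-patch direction only, and a temperature of 0.07. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_attention_simr(queries, memory):
    """Attend from every query to every memory bank.

    queries: (B_q, D) global features of one modality (here: reports)
    memory:  (B_m, N, D) local features of the other modality (here: image patches)
    Returns SimR of shape (B_m, B_q, D) and attention weights (B_m, B_q, N).
    """
    d = queries.shape[-1]
    # scores[m, q, n] = <queries[q], memory[m, n]> / sqrt(d)
    scores = np.einsum("qd,mnd->mqn", queries, memory) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # SimR is the attention-weighted sum of the memory features
    simr = np.einsum("mqn,mnd->mqd", attn, memory)
    return simr, attn

def similarity_matrix(simr, w, b):
    # Linear layer D -> 1 applied to every (image, report) pair
    return simr @ w + b

def info_nce(sim, temperature=0.07):
    """Symmetric InfoNCE: matched pairs sit on the diagonal of sim."""
    logits = sim / temperature
    idx = np.arange(sim.shape[0])
    loss_rows = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> report
    loss_cols = -log_softmax(logits, axis=0)[idx, idx].mean()  # report -> image
    return 0.5 * (loss_rows + loss_cols)

# Random stand-ins for encoder outputs (assumption: no real encoders here)
rng = np.random.default_rng(0)
B, P, D = 4, 9, 16                      # batch size, patches per image, feature dim
image_patches = rng.normal(size=(B, P, D))
text_global = rng.normal(size=(B, D))
w = rng.normal(size=D) / np.sqrt(D)     # weights of the hypothetical linear layer
b = 0.0

simr, attn = cross_attention_simr(text_global, image_patches)
sim = similarity_matrix(simr, w, b)     # sim[i, j]: image i vs report j
loss = info_nce(sim)
print(simr.shape, sim.shape, float(loss))
```

In this sketch `attn[i, j]` holds one weight per image patch for image `i` and report `j`, which is the kind of quantity an attention map for grounding would visualise once reshaped to the patch grid.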
The method also shows strong performance in zero-shot grounding tasks, with the attention map reflecting the association between images and texts. Ablation studies validate the prompt alignment and cross-attention alignment strategies and show that combining them improves zero-shot classification performance. The attention mechanism plays a key role in aligning images and texts and in capturing complex image-text relationships.

In conclusion, CARZero achieves state-of-the-art zero-shot classification performance in radiology, with particular strength in diagnosing rare diseases. Its combination of cross-attention alignment and prompt alignment makes it a promising approach for future research in medical image analysis.
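To make the zero-shot evaluation concrete, the sketch below shows one way standardized prompts and a per-class AUC could fit together. It rests on several assumptions that the summary does not confirm: the template string "There is {}." is a hypothetical example of a standardized diagnostic expression, the scores are made-up toy numbers (not results from the paper), and averaging AUC across classes is one common convention rather than the paper's stated protocol.

```python
def build_prompts(labels, template="There is {}."):
    """Turn each disease label into one standardized prompt (hypothetical template)."""
    return {label: template.format(label.lower()) for label in labels}

def auc(scores, targets):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(score_table, target_table):
    """Average the per-class AUC over all classes."""
    per_class = {c: auc(score_table[c], target_table[c]) for c in score_table}
    return sum(per_class.values()) / len(per_class), per_class

labels = ["Pneumonia", "Pleural effusion"]
prompts = build_prompts(labels)

# Toy image-prompt similarity scores for four images (illustrative only)
scores = {
    "Pneumonia": [0.9, 0.2, 0.6, 0.1],
    "Pleural effusion": [0.3, 0.8, 0.4, 0.7],
}
targets = {
    "Pneumonia": [1, 0, 1, 0],
    "Pleural effusion": [0, 1, 1, 0],
}

mean_auc, per_class = macro_auc(scores, targets)
print(prompts)
print(per_class, mean_auc)
```

In a real pipeline the toy scores would be replaced by the model's similarity between each test image and each class prompt; no fine-tuning on the target labels is involved, which is what makes the setting zero-shot.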