The BIMCV-R dataset is a comprehensive collection of 8,069 3D CT volumes, comprising over 2 million slices, each paired with its corresponding radiological report. The dataset was created to address the lack of robust evaluation benchmarks and curated datasets for 3D medical text-image retrieval. It is built upon the BIMCV dataset and includes radiological reports that have been anonymized to ensure data privacy and accurately translated into English using GPT-4. The dataset also includes a detailed keyword library derived from expert diagnoses covering 96 different diseases.
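As a rough illustration of how such a volume-report-keyword pairing could be represented in code, the following is a minimal sketch; the field names (volume_path, report_en, keywords) and the example values are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical representation of one BIMCV-R record; field names and values
# are illustrative assumptions, not the dataset's actual file layout.
from dataclasses import dataclass
from typing import List

@dataclass
class BIMCVRecord:
    volume_path: str       # path to the 3D CT volume (a stack of slices)
    report_en: str         # radiological report, anonymized and translated to English
    keywords: List[str]    # disease keywords drawn from the expert-derived library

example = BIMCVRecord(
    volume_path="ct/case_0001.nii.gz",
    report_en="No focal consolidation. Mild bilateral pleural effusion.",
    keywords=["pleural effusion"],
)
```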
The study introduces MedFinder, a dual-stream network architecture that leverages large language models to enhance 3D medical image retrieval. MedFinder employs a 3D image encoder and a pre-trained text encoder to extract features from medical images and text, respectively. The model uses view consistency and cross-attention mechanisms to improve feature representation and similarity matching between text and images.
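To make the dual-stream idea concrete, the sketch below shows two independent encoders projecting volumes and reports into a shared embedding space and scoring them by cosine similarity. It is a minimal illustration, not MedFinder's implementation: the toy 3D CNN, the layer sizes, and the use of plain cosine similarity in place of the view-consistency and cross-attention modules are all assumptions made for brevity.

```python
# Illustrative dual-stream retrieval sketch (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder3D(nn.Module):
    """Toy 3D CNN standing in for MedFinder's 3D image encoder."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, 1, depth, height, width)
        feats = self.conv(volume).flatten(1)
        return F.normalize(self.proj(feats), dim=-1)

class TextEncoder(nn.Module):
    """Placeholder for the pre-trained text encoder: a projection of
    already-extracted report features into the shared embedding space."""
    def __init__(self, in_dim: int = 768, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, text_feats: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(text_feats), dim=-1)

# Cosine-similarity matrix between a batch of volumes and a batch of reports.
image_enc, text_enc = ImageEncoder3D(), TextEncoder()
volumes = torch.randn(4, 1, 32, 64, 64)    # dummy CT volumes
report_feats = torch.randn(4, 768)         # dummy report embeddings
similarity = image_enc(volumes) @ text_enc(report_feats).T  # shape (4, 4)
```

Because both streams are L2-normalized into the same space, retrieval reduces to ranking candidates by this similarity matrix.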
The dataset and model were evaluated using metrics such as Recall@K, Median Rank, and Mean Rank for multimodal retrieval, and Precision@K for keyword-based retrieval. The results show that MedFinder outperforms existing methods on both tasks, retrieving relevant cases with high accuracy. The study highlights the potential of the BIMCV-R dataset in advancing 3D medical image analysis and retrieval technologies. The dataset and model are available for public use, providing a valuable resource for researchers in the field.
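The ranking-based metrics can be computed directly from a query-by-candidate similarity matrix, as in the short sketch below; the assumption that the ground-truth volume for query i sits at index i, and the variable names, are illustrative only.

```python
# Minimal sketch of Recall@K, Median Rank, and Mean Rank from a similarity matrix.
import numpy as np

def retrieval_metrics(similarity: np.ndarray, ks=(1, 5, 10)):
    # similarity[i, j] = score between query report i and candidate volume j;
    # the true match for query i is assumed to be volume i (illustrative assumption).
    order = np.argsort(-similarity, axis=1)                   # best match first
    ranks = np.array([np.where(order[i] == i)[0][0] + 1      # 1-based rank of the true match
                      for i in range(similarity.shape[0])])
    metrics = {f"Recall@{k}": float(np.mean(ranks <= k)) for k in ks}
    metrics["MedianRank"] = float(np.median(ranks))
    metrics["MeanRank"] = float(np.mean(ranks))
    return metrics

print(retrieval_metrics(np.random.rand(100, 100)))
```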