The BIMCV-R dataset is a comprehensive collection of 8,069 3D CT volumes, comprising over 2 million slices, each paired with its corresponding radiological report. The dataset was created to address the lack of robust evaluation benchmarks and curated datasets for 3D medical text-image retrieval. It is built upon the BIMCV dataset and includes radiological reports that have been anonymized to ensure data privacy and accurately translated into English using GPT-4. The dataset also includes a detailed keyword library derived from expert diagnoses covering 96 different diseases.
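As a rough illustration of how such a volume-report-keyword pairing could be represented in code, the following is a minimal sketch; the field names (volume_path, report_en, keywords) and the example values are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical representation of one BIMCV-R record; field names and values
# are illustrative assumptions, not the dataset's actual file layout.
from dataclasses import dataclass
from typing import List

@dataclass
class BIMCVRecord:
    volume_path: str       # path to the 3D CT volume (a stack of slices)
    report_en: str         # radiological report, anonymized and translated to English
    keywords: List[str]    # disease keywords drawn from the expert-derived library

example = BIMCVRecord(
    volume_path="ct/case_0001.nii.gz",
    report_en="No focal consolidation. Mild bilateral pleural effusion.",
    keywords=["pleural effusion"],
)
```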
The study introduces MedFinder, a dual-stream network architecture that leverages large language models to enhance 3D medical image retrieval. MedFinder employs a 3D image encoder and a pre-trained text encoder to extract features from medical images and text, respectively. The model uses view consistency and cross-attention mechanisms to improve feature representation and similarity matching between text and images.
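To make the dual-stream idea concrete, the sketch below shows two independent encoders projecting volumes and reports into a shared embedding space and scoring them by cosine similarity. It is a minimal illustration, not MedFinder's implementation: the toy 3D CNN, the layer sizes, and the use of plain cosine similarity in place of the view-consistency and cross-attention modules are all assumptions made for brevity.

```python
# Illustrative dual-stream retrieval sketch (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder3D(nn.Module):
    """Toy 3D CNN standing in for MedFinder's 3D image encoder."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, 1, depth, height, width)
        feats = self.conv(volume).flatten(1)
        return F.normalize(self.proj(feats), dim=-1)

class TextEncoder(nn.Module):
    """Placeholder for the pre-trained text encoder: a projection of
    already-extracted report features into the shared embedding space."""
    def __init__(self, in_dim: int = 768, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, text_feats: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(text_feats), dim=-1)

# Cosine-similarity matrix between a batch of volumes and a batch of reports.
image_enc, text_enc = ImageEncoder3D(), TextEncoder()
volumes = torch.randn(4, 1, 32, 64, 64)    # dummy CT volumes
report_feats = torch.randn(4, 768)         # dummy report embeddings
similarity = image_enc(volumes) @ text_enc(report_feats).T  # shape (4, 4)
```

Because both streams are L2-normalized into the same space, retrieval reduces to ranking candidates by this similarity matrix.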
The dataset and model were evaluated using metrics such as Recall@K, Median Rank, and Mean Rank for multimodal retrieval, and Precision@K for keyword-based retrieval. The results show that MedFinder outperforms existing methods on both tasks, retrieving relevant cases with high accuracy. The study highlights the potential of the BIMCV-R dataset in advancing 3D medical image analysis and retrieval technologies. The dataset and model are available for public use, providing a valuable resource for researchers in the field.
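The ranking-based metrics can be computed directly from a query-by-candidate similarity matrix, as in the short sketch below; the assumption that the ground-truth volume for query i sits at index i, and the variable names, are illustrative only.

```python
# Minimal sketch of Recall@K, Median Rank, and Mean Rank from a similarity matrix.
import numpy as np

def retrieval_metrics(similarity: np.ndarray, ks=(1, 5, 10)):
    # similarity[i, j] = score between query report i and candidate volume j;
    # the true match for query i is assumed to be volume i (illustrative assumption).
    order = np.argsort(-similarity, axis=1)                   # best match first
    ranks = np.array([np.where(order[i] == i)[0][0] + 1      # 1-based rank of the true match
                      for i in range(similarity.shape[0])])
    metrics = {f"Recall@{k}": float(np.mean(ranks <= k)) for k in ks}
    metrics["MedianRank"] = float(np.median(ranks))
    metrics["MeanRank"] = float(np.mean(ranks))
    return metrics

print(retrieval_metrics(np.random.rand(100, 100)))
```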