Understanding ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

ROCOv2 is a multimodal image dataset containing 79,789 radiological images with associated captions and medical concepts. It is an updated version of the ROCO dataset, which was introduced in 2018, and includes 35,705 new images added to the PMC Open Access Subset since then. The dataset provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. It has been used in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023.

The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using UMLS concepts. It can also be used for pre-training medical domain models and for evaluating deep learning models for multi-task learning. It has been used in several studies, including the development of models for medical visual question answering and the creation of specialized vision encoders.

ROCOv2 was created by downloading the PMC Open Access Subset via FTP, extracting the archives, and filtering the images through two binary classification models. The resulting images are split into 59,958 for training, 9,904 for validation, and 9,927 for testing. Captions and concepts were extracted from the PMC Open Access Subset, and the captions were cleaned by removing non-English captions, URLs, and LaTeX code. Concepts were extracted using the Medical Concept Annotation Toolkit (MedCAT), and manually curated concepts were added for modality, body region, and directionality. The images cover a variety of anatomical regions, medical concepts, and modalities.

The dataset is available on Zenodo and includes images, captions, and concepts for the training, validation, and test splits, as well as image license information.
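The caption cleaning mentioned above (removing URLs and LaTeX code) can be illustrated with a minimal sketch. The regular expressions and the function name `clean_caption` are assumptions made for illustration; this summary does not give the authors' actual rules, and the non-English filter is left out because it would need a separate language identification tool.

```python
import re

# Assumed patterns for illustration only; not the authors' actual cleaning rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")            # http(s) links and bare www. links
LATEX_INLINE_RE = re.compile(r"\$[^$]*\$")                # inline math such as $5 \times 3$
LATEX_CMD_RE = re.compile(r"\\[A-Za-z]+(?:\{[^{}]*\})*")  # commands such as \textbf{...}


def clean_caption(caption: str) -> str:
    """Strip URLs and simple LaTeX fragments, then collapse whitespace."""
    text = URL_RE.sub(" ", caption)
    text = LATEX_INLINE_RE.sub(" ", text)
    text = LATEX_CMD_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_caption("Chest X-ray, see https://example.org/fig1 for details"))
```

These patterns are deliberately coarse. The inline-math pattern, for example, would also remove ordinary text that happens to sit between two literal dollar signs, so a real pipeline would likely need more careful rules.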
The dataset has been validated for concept detection and caption prediction, with results showing that the baseline models achieve results on the ImageCLEF dataset similar to those of the challenge participants, while performing better on the ROCOv2 dataset. The dataset is suitable for training models for concept detection and caption prediction.
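For the multi-label classification use case, per-image concept lists are typically turned into a binary indicator matrix before training. The sketch below shows one way to do that in plain Python. The image IDs and concept identifiers are hypothetical placeholders, and the function name `build_label_matrix` is an assumption; the actual file layout of the Zenodo release is not described in this summary.

```python
from typing import Dict, List, Tuple


def build_label_matrix(
    concepts_per_image: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn per-image concept lists into a binary indicator matrix.

    Rows follow the sorted image IDs, columns follow the sorted concept vocabulary.
    """
    image_ids = sorted(concepts_per_image)
    vocabulary = sorted({cui for cuis in concepts_per_image.values() for cui in cuis})
    column = {cui: j for j, cui in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in image_ids]
    for i, image_id in enumerate(image_ids):
        for cui in concepts_per_image[image_id]:
            matrix[i][column[cui]] = 1
    return image_ids, vocabulary, matrix


# Hypothetical toy annotations: placeholder image IDs and concept identifiers.
toy_annotations = {
    "img_001": ["C0040405", "C0817096"],
    "img_002": ["C1306645"],
    "img_003": ["C0040405"],
}

image_ids, vocabulary, matrix = build_label_matrix(toy_annotations)
```

Each row of `matrix` can then serve as the target vector for one image in a multi-label classifier, with one output per concept in `vocabulary`.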

ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

18 Jun 2024 | Johannes Rücker, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Cynthia S. Schmidt, Sven Koitka, Obioma Pelka, Asma Ben Abacha, Alba G. Seco de Herrera, Henning Müller, Peter A. Horn, Felix Nensa, and Christoph M. Friedrich