4 Jul 2024 | Ibrahim Ethem Hamamci, Sezgin Er, and Bjoern Menze
CT2Rep is the first method for generating radiology reports for 3D medical imaging, specifically targeting chest CT volumes. It uses a novel 3D auto-regressive causal transformer and integrates relational memory to make use of information from previous reports. The model is trained on the CT-RATE dataset, which contains 25,692 non-contrast chest CT volumes paired with their reports. CT2Rep outperforms a baseline built on a state-of-the-art 3D vision encoder, which demonstrates the effectiveness of its approach. CT2RepLong extends the model to longitudinal data. It adds a cross-attention-based multimodal fusion module and hierarchical memory so that historical multimodal data from earlier examinations can inform the new report, improving its accuracy and context. Performance is evaluated with natural language generation and clinical efficacy metrics, which show significant improvements over the baseline. CT2Rep and CT2RepLong are publicly available for further research and application in 3D medical imaging.
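The core decoding idea can be sketched in a few lines. A transformer generates report tokens one at a time while attending to features of the whole CT volume. The example below is a minimal, generic illustration under stated assumptions and is not CT2Rep's implementation. It does three things:

* It splits a toy volume into non-overlapping 3D patches.
* It applies a causal mask so that each report token can see only earlier tokens.
* It lets report tokens cross-attend to the patch features.

All of the following are hypothetical: the helper names (`patchify_3d`, `causal_mask`, `attention`), the patch size, the embedding width, and the random projection. The real model's learned 3D encoder, relational memory, and training procedure are not shown.

```python
import numpy as np

def patchify_3d(volume: np.ndarray, patch: tuple[int, int, int]) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping 3D patches, one flattened token each.

    Returns shape (num_patches, pd * ph * pw). Assumes each axis divides evenly.
    """
    d, h, w = volume.shape
    pd, ph, pw = patch
    if d % pd or h % ph or w % pw:
        raise ValueError("volume shape must be divisible by the patch size")
    x = volume.reshape(d // pd, pd, h // ph, ph, w // pw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # patch-grid axes first, within-patch axes last
    return x.reshape(-1, pd * ph * pw)

def causal_mask(n: int) -> np.ndarray:
    """Boolean (n, n) mask: position i may attend only to positions j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; masked-out scores become -inf before the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 16, 16))       # toy stand-in for a CT volume (D, H, W)
visual_tokens = patchify_3d(volume, (4, 8, 8))  # 2 * 2 * 2 = 8 patches of 256 voxels

d_model = 32
proj = rng.standard_normal((visual_tokens.shape[1], d_model)) / 16
memory = visual_tokens @ proj                    # hypothetical linear patch embedding

report_len = 5
text = rng.standard_normal((report_len, d_model))  # embeddings of report tokens so far

# Masked self-attention over the report: token i sees only tokens 0..i.
self_out, self_w = attention(text, text, text, causal_mask(report_len))
# Cross-attention: every report token attends to every visual token of the volume.
cross_out, cross_w = attention(self_out, memory, memory)
```

The same `attention` function could also fuse current-volume features with features from a prior examination. One set would serve as queries and the other as keys and values, which is the general pattern behind cross-attention fusion. This sketch does not attempt to reproduce how CT2RepLong wires its fusion module and hierarchical memory together.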