The paper "Dia-LLaMA: Towards Large Language Model-driven CT Report Generation" addresses the challenges in generating medical reports, particularly for CT scans, which are more complex and less explored compared to chest X-rays. The authors propose a framework called Dia-LLaMA, which leverages large language models (LLMs) to generate CT reports with diagnostic guidance. The key contributions include:
1. **LLM Integration**: The framework uses LLaMA2-7B to generate reports by incorporating visual embeddings and diagnostic information.
2. **Visual Embedding Extraction**: A pre-trained ViT3D with a perceiver is used to extract and compress visual information from the 3D CT volumes.
3. **Diagnostic Information**: Diagnostic cues are obtained by comparing visual features against a disease prototype memory bank and injected into the LLM prompt as explicit guidance.
4. **Disease-Aware Attention**: This module tailors attention to each disease, sharpening the model's focus on disease-specific abnormalities.
5. **Disease Prototype Memory Bank**: The bank stores common representations of normal and abnormal findings for each disease, improving diagnostic accuracy, especially for rare abnormalities (a minimal sketch of these components follows this list).
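To make the interplay of these components concrete, the following is a minimal PyTorch sketch of the diagnostic pathway: disease-specific queries pool the perceiver's visual tokens, and the pooled features are compared against learned normal/abnormal prototypes. All module names, shapes, and the cosine-similarity classification are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DISEASES = 14  # assumed number of tracked abnormalities
DIM = 768          # assumed feature dimension from the ViT3D + perceiver

class DiseaseAwareAttention(nn.Module):
    """Pools visual tokens into one feature per disease via learned queries."""
    def __init__(self, num_diseases: int, dim: int):
        super().__init__()
        self.disease_queries = nn.Parameter(torch.randn(num_diseases, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, N, dim) from the ViT3D + perceiver
        q = self.disease_queries.unsqueeze(0).expand(visual_tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, visual_tokens, visual_tokens)
        return pooled  # (B, num_diseases, dim)

class DiseasePrototypeBank(nn.Module):
    """Holds a learned normal/abnormal prototype pair per disease."""
    def __init__(self, num_diseases: int, dim: int):
        super().__init__()
        # prototypes[d, 0] ~ "normal", prototypes[d, 1] ~ "abnormal"
        self.prototypes = nn.Parameter(torch.randn(num_diseases, 2, dim))

    def forward(self, disease_feats: torch.Tensor) -> torch.Tensor:
        # disease_feats: (B, num_diseases, dim); compare each disease feature
        # with its normal and abnormal prototype by cosine similarity.
        sim = F.cosine_similarity(
            disease_feats.unsqueeze(2), self.prototypes.unsqueeze(0), dim=-1
        )
        return sim  # (B, num_diseases, 2)

# Usage: the argmax over the prototype axis yields a per-disease decision
# that can be verbalized into the diagnostic text prompt for the LLM.
attn = DiseaseAwareAttention(NUM_DISEASES, DIM)
bank = DiseasePrototypeBank(NUM_DISEASES, DIM)
visual_tokens = torch.randn(2, 256, DIM)  # stand-in perceiver output
is_abnormal = bank(attn(visual_tokens)).argmax(dim=-1)  # (2, NUM_DISEASES)
```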
Experiments on the CTRG-Chest-548K dataset show that Dia-LLaMA outperforms previous methods on both clinical efficacy (CE) and natural language generation (NLG) metrics, achieving state-of-the-art F1, BLEU, METEOR, and ROUGE-L scores and demonstrating its effectiveness in generating accurate and coherent medical reports. The paper also includes an ablation study validating the contribution of each component and discusses future directions for extending the framework to other radiology modalities.
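For reference, the NLG metrics reported above can be computed with standard open-source tools; the snippet below is a generic sketch using nltk and rouge-score, not the paper's evaluation code.

```python
# A generic sketch of the NLG metrics named above, using nltk and
# rouge-score. METEOR requires nltk's wordnet data (nltk.download("wordnet")).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "no focal consolidation pleural effusion or pneumothorax".split()
candidate = "no focal consolidation or pleural effusion".split()

# BLEU on short sentences needs smoothing to avoid zero n-gram counts.
bleu4 = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
meteor = meteor_score([reference], candidate)  # pre-tokenized inputs
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    " ".join(reference), " ".join(candidate))["rougeL"].fmeasure

print(f"BLEU-4: {bleu4:.3f}  METEOR: {meteor:.3f}  ROUGE-L: {rouge_l:.3f}")
```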