Dia-LLaMA: Towards Large Language Model-driven CT Report Generation


25 Mar 2024 | Zhixuan Chen, Luyang Luo, Yequan Bie, Hao Chen
This paper proposes Dia-LLaMA, a framework for CT report generation driven by large language models (LLMs). CT report generation is challenging due to data imbalance, the limited number of paired CT volumes and reports, and the need to emphasize abnormal findings. Dia-LLaMA addresses these challenges by integrating diagnostic information into the prompt that guides the LLM.

Visual information is extracted from CT volumes by a pre-trained ViT3D with a perceiver module. A disease-aware attention mechanism then lets the model adjust its attention for different diseases and extract disease-level features from the CT volume. Diagnostic information is obtained by comparing these features against a disease prototype memory bank, which is updated during training to capture common representations of normal and abnormal samples; this prototype-based comparison also mitigates data imbalance. The resulting diagnostic results are converted into diagnostic text prompts, which are combined with the visual embeddings to guide the LLM in generating the report. The framework is trained with a combination of a disease-prototype loss and a language modeling loss, taking about 16 hours on two RTX 3090 GPUs.
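To make the method concrete, the following minimal PyTorch sketch shows how the disease-aware attention and the prototype comparison could fit together: one learnable query per disease cross-attends over the ViT3D/perceiver visual tokens, and the resulting disease-level features are compared against a memory bank holding one normal and one abnormal prototype per disease. The class names, dimensions, momentum update, and use of learnable queries are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DISEASES = 14  # assumed number of diseases tracked in the memory bank
DIM = 768          # assumed dimension of the ViT3D/perceiver visual tokens


class DiseaseAwareAttention(nn.Module):
    """One learnable query per disease cross-attends over the visual tokens,
    producing one disease-level feature per disease (assumed design)."""

    def __init__(self, num_diseases=NUM_DISEASES, dim=DIM):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_diseases, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, visual_tokens):  # visual_tokens: (B, N, DIM)
        q = self.queries.unsqueeze(0).expand(visual_tokens.size(0), -1, -1)
        feats, _ = self.attn(q, visual_tokens, visual_tokens)
        return feats  # (B, NUM_DISEASES, DIM)


class PrototypeBank(nn.Module):
    """Memory bank with a normal and an abnormal prototype per disease,
    refreshed during training; a momentum update is assumed here."""

    def __init__(self, num_diseases=NUM_DISEASES, dim=DIM, momentum=0.9):
        super().__init__()
        self.momentum = momentum
        # prototypes[d, 0] = normal prototype, prototypes[d, 1] = abnormal
        self.register_buffer("prototypes", torch.zeros(num_diseases, 2, dim))

    @torch.no_grad()
    def update(self, feats, labels):  # feats: (B, D, DIM), labels: (B, D)
        for d in range(feats.size(1)):
            for c in (0, 1):  # 0 = normal, 1 = abnormal
                mask = labels[:, d] == c
                if mask.any():
                    batch_mean = feats[mask, d].mean(dim=0)
                    self.prototypes[d, c].mul_(self.momentum).add_(
                        (1.0 - self.momentum) * batch_mean)

    def diagnose(self, feats):
        """Cosine similarity of each disease feature to its two prototypes;
        the disease is judged abnormal when the abnormal prototype is closer."""
        sims = F.cosine_similarity(
            feats.unsqueeze(2),                    # (B, D, 1, DIM)
            self.prototypes.unsqueeze(0), dim=-1)  # (1, D, 2, DIM)
        return sims  # (B, D, 2)
```

Because the prototypes are running averages accumulated over training, each disease is compared against a stable reference even when abnormal cases are rare, which is how the prototype bank counteracts data imbalance.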
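The similarity comparison can then be turned into the diagnostic text prompt that is given to the LLM alongside the visual embeddings. The template wording and disease list below are hypothetical placeholders; the paper's exact prompt phrasing may differ.

```python
DISEASES = ["pleural effusion", "emphysema", "pneumothorax"]  # illustrative subset

def build_diagnostic_prompt(sims, diseases=DISEASES):
    """sims: (num_diseases, 2) similarities to the [normal, abnormal] prototypes
    for one CT volume; returns the diagnostic text prompt for the LLM."""
    findings = []
    for name, (sim_normal, sim_abnormal) in zip(diseases, sims.tolist()):
        verdict = "present" if sim_abnormal > sim_normal else "absent"
        findings.append(f"{name}: {verdict}")
    return "Diagnostic findings: " + "; ".join(findings) + "."
```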
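Finally, training combines the disease-prototype loss with the language modeling loss. A plausible instantiation, assuming the prototype similarities are supervised with a two-way (normal vs. abnormal) cross-entropy and the two terms are equally weighted, is:

```python
import torch.nn.functional as F

def training_loss(sim_logits, disease_labels, lm_logits, report_tokens,
                  proto_weight=1.0):
    """sim_logits: (B, D, 2) prototype similarities; disease_labels: (B, D);
    lm_logits: (B, T, V) LLM outputs; report_tokens: (B, T) target token ids.
    The equal weighting of the two terms is an assumption."""
    # Disease-prototype loss: treat the two similarities as class logits.
    proto_loss = F.cross_entropy(
        sim_logits.reshape(-1, 2), disease_labels.reshape(-1).long())
    # Language modeling loss over the report tokens; -100 masks prompt tokens.
    lm_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        report_tokens.reshape(-1), ignore_index=-100)
    return lm_loss + proto_weight * proto_loss
```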
The method is evaluated on the CTRG-Chest-548K chest CT dataset, where Dia-LLaMA achieves state-of-the-art performance in both clinical efficacy and natural language generation metrics, outperforming previous approaches on most measures, including BLEU, METEOR, and ROUGE-L. An ablation study confirms that incorporating the diagnostic information and the disease-aware attention each yields significant gains. Overall, the results show that Dia-LLaMA effectively captures critical abnormal information and achieves high diagnostic accuracy. Future work will explore the potential of LLMs for generating reports across all radiology modalities.