FFA-GPT is an automated pipeline that interprets fundus fluorescein angiography (FFA) images, generates medical reports, and supports interactive question answering (QA). The system combines a multimodal transformer with a large language model (LLM): an image-text alignment module (BLIP, Bootstrapping Language-Image Pre-training) generates the reports, and an LLM (Llama 2) handles the interactive QA. The model was trained on a dataset of 654,343 FFA images and 9,392 reports, and evaluated both automatically, using language-based and classification-based metrics, and manually, by three experienced ophthalmologists.
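As a rough illustration of how the two stages connect, the sketch below chains the public Hugging Face implementations of BLIP and Llama 2. It is a minimal stand-in, not the paper's method: the checkpoint names, prompt format, input path, and example question are all assumptions, since the fine-tuned FFA-GPT weights and preprocessing are not described here.

```python
# Hypothetical two-stage pipeline: BLIP drafts a report from an FFA image,
# then Llama 2 answers a follow-up question grounded in that draft.
# Checkpoints below are public stand-ins, NOT the paper's fine-tuned models.
import torch
from PIL import Image
from transformers import (
    BlipProcessor, BlipForConditionalGeneration,
    AutoTokenizer, AutoModelForCausalLM,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: image-text alignment module (BLIP) generates the draft report.
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

image = Image.open("ffa_example.png").convert("RGB")  # hypothetical input file
inputs = blip_proc(images=image, return_tensors="pt").to(device)
report_ids = blip.generate(**inputs, max_new_tokens=128)
report = blip_proc.decode(report_ids[0], skip_special_tokens=True)

# Stage 2: the LLM (Llama 2) answers questions conditioned on the report.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16).to(device)

question = "Is there evidence of macular leakage?"  # example user query
prompt = f"[INST] FFA report: {report}\n\nQuestion: {question} [/INST]"
enc = tok(prompt, return_tensors="pt").to(device)
out = llm.generate(**enc, max_new_tokens=256)
answer = tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True)
print(report, answer, sep="\n---\n")
```

Decoupling the two stages in this way means the vision module only has to produce a faithful report, while all conversational behavior is delegated to the LLM conditioned on that report.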
The automatic evaluation demonstrated that the system can generate coherent and comprehensible reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting the top-5 retinal conditions. Manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739) of the generated reports. The generated answers were also evaluated manually, and the majority met the ophthalmologists' criteria (error-free: 70.7%, complete: 84.0%, harmless: 93.7%, satisfied: 65.3%; Kappa: 0.762–0.834).
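These metrics are all standard and reproducible on any paired set of generated and reference reports. A minimal sketch using the bert-score and scikit-learn packages follows; every text and label in it is an illustrative placeholder, and the kappa example is pairwise (Cohen's) agreement between two raters rather than the study's exact three-rater protocol.

```python
# Illustrative computation of the metric families named above: BERTScore for
# report similarity, F1 per condition label, and Cohen's kappa for agreement.
# All inputs are placeholders, not data from the study.
from bert_score import score as bert_score
from sklearn.metrics import f1_score, cohen_kappa_score

generated = ["fluorescein leakage in the macula in late phase"]   # model report
reference = ["late-phase frames show macular leakage"]            # ground truth

# Language-based metric: token-level similarity under a BERT encoder.
precision, recall, f1 = bert_score(generated, reference, lang="en")
print(f"BERTScore F1: {f1.mean().item():.2f}")

# Classification-based metric: binary F1 for one retinal condition, assuming
# condition labels have already been extracted from the reports upstream.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(f"F1: {f1_score(y_true, y_pred):.2f}")

# Inter-rater agreement between two ophthalmologists' manual ratings.
rater_a = [1, 1, 0, 1, 0]
rater_b = [1, 0, 0, 1, 0]
print(f"Kappa: {cohen_kappa_score(rater_a, rater_b):.3f}")
```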
This study introduces an innovative framework that enhances ophthalmic image interpretation and facilitates interactive communication during medical consultations. The FFA-GPT system shows promising potential to reduce the reliance on retinal specialists and improve the efficiency and accuracy of FFA image interpretation. However, future research should focus on enhancing the quality and accuracy of the generated reports and addressing ethical considerations in clinical applications.