[slides] MAIRA-2: Grounded Radiology Report Generation

The paper introduces MAIRA-2, a large multimodal model that generates chest X-ray (CXR) reports with and without grounding. Radiology reporting requires both detailed medical image understanding and precise language generation. To make automated reports more useful, MAIRA-2 also localizes individual findings on the image, a task the authors call grounded report generation. The model is trained on a comprehensive set of inputs, including frontal and lateral images, prior reports, and relevant sections of the CXR report. The authors also propose RadFact, an evaluation framework that uses the logical inference capabilities of large language models (LLMs) to assess the correctness and completeness of generated reports at the sentence level. RadFact reports logical precision and recall, grounding precision and recall, and spatial precision and recall, which allows a more nuanced assessment of automated reporting. MAIRA-2 achieves state-of-the-art performance on existing report generation benchmarks and establishes grounded report generation as a new task. In extensive qualitative reviews of the grounded reports, a thoracic radiologist found that the majority of generated reports required minimal corrections. The reviews also identified areas where the model needs improvement, such as missed findings and internal consistency. The paper discusses the limitations of current evaluation metrics and suggests future directions for evaluating radiology report generation, including the integration of additional imaging information and the development of more flexible error metrics.
Overall, MAIRA-2 represents a significant step towards realizing the potential of automated radiology report generation in clinical practice.
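
To make the sentence-level idea behind RadFact's logical precision and recall concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: precision is read as the fraction of generated sentences supported by the reference report, and recall as the fraction of reference sentences supported by the generated report. The function names are hypothetical, and the LLM entailment judge is replaced by a toy exact-match function so the example runs on its own.

```python
from typing import Callable, List

# Hypothetical judge signature: (hypothesis sentence, premise sentences) -> entailed?
# In RadFact this decision is made by an LLM; here it is a pluggable function.
EntailmentJudge = Callable[[str, List[str]], bool]


def entailed_fraction(hypotheses: List[str], premises: List[str],
                      judge: EntailmentJudge) -> float:
    """Fraction of hypothesis sentences the judge considers supported by the premises."""
    if not hypotheses:
        return 0.0  # assumed convention for empty input, not taken from the paper
    supported = sum(1 for sentence in hypotheses if judge(sentence, premises))
    return supported / len(hypotheses)


def logical_precision_recall(generated: List[str], reference: List[str],
                             judge: EntailmentJudge) -> tuple:
    """Precision checks generated sentences against the reference; recall does the reverse."""
    precision = entailed_fraction(generated, reference, judge)
    recall = entailed_fraction(reference, generated, judge)
    return precision, recall


def exact_match_judge(hypothesis: str, premises: List[str]) -> bool:
    """Toy stand-in for an LLM judge: exact match after light normalization."""
    def normalize(sentence: str) -> str:
        return sentence.strip().lower().rstrip(".")
    return normalize(hypothesis) in {normalize(p) for p in premises}


generated = ["No pleural effusion.", "Right lower lobe consolidation.", "Cardiomegaly."]
reference = ["Right lower lobe consolidation.", "No pleural effusion."]
precision, recall = logical_precision_recall(generated, reference, exact_match_judge)
```

In this toy case the unsupported generated sentence lowers precision, while recall stays perfect because every reference sentence appears in the generated report. The grounding and spatial metrics additionally involve the image regions attached to each sentence and are not covered by this sketch.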
