2024 | Fan Bai, Yuxin Du, Tiejun Huang, Max Q.-H. Meng, Bo Zhao
This paper introduces M3D, a framework for advancing 3D medical image analysis using multi-modal large language models (MLLMs). The authors present M3D-Data, a large-scale 3D medical dataset containing 120,000 image-text pairs and 662,000 instruction-response pairs, designed for various 3D medical tasks. They also propose M3D-LaMed, a versatile MLLM for 3D medical image analysis, capable of performing tasks such as image-text retrieval, report generation, visual question answering, positioning, and segmentation. Additionally, they introduce M3D-Bench, a comprehensive 3D multi-modal benchmark covering eight tasks, which enables automatic evaluation of 3D medical image analysis models.
The authors address the challenge of analyzing 3D medical images, which are harder to model than 2D images because of their richer spatial information. Existing MLLMs have mainly been applied to 2D medical images; the authors extend them to 3D. To do so, they develop a 3D vision encoder and a 3D spatial pooling perceiver, which reduces the number of visual tokens passed to the language model so that 3D images can be processed efficiently. They also introduce a promptable segmentation module that enables referring expression segmentation of 3D medical images.
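To illustrate how a 3D vision front end can turn a volume into a manageable number of tokens, here is a minimal NumPy sketch of 3D patch tokenization followed by average pooling over the patch grid. It is not the authors' encoder or perceiver: the volume size (32×256×256), patch size (4×16×16), pooling window (2×2×2), the random linear projection, and the helper names `patchify_3d` and `spatial_pool_tokens` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
PATCH = (4, 16, 16)   # depth, height, width of one 3D patch
POOL = (2, 2, 2)      # pooling window over the patch grid


def patchify_3d(volume, patch=PATCH):
    """Split a (D, H, W) volume into flattened, non-overlapping 3D patches.

    Returns (patches, grid): patches has shape (num_patches, pd*ph*pw) in
    row-major order over the patch grid, and grid is (gd, gh, gw).
    """
    D, H, W = volume.shape
    pd, ph, pw = patch
    assert D % pd == 0 and H % ph == 0 and W % pw == 0, "volume must tile evenly"
    gd, gh, gw = D // pd, H // ph, W // pw
    x = volume.reshape(gd, pd, gh, ph, gw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # bring the grid axes to the front
    return x.reshape(gd * gh * gw, pd * ph * pw), (gd, gh, gw)


def spatial_pool_tokens(tokens, grid, pool=POOL):
    """Average token embeddings over non-overlapping 3D windows of the patch grid."""
    gd, gh, gw = grid
    kd, kh, kw = pool
    assert gd % kd == 0 and gh % kh == 0 and gw % kw == 0, "grid must tile evenly"
    C = tokens.shape[-1]
    x = tokens.reshape(gd // kd, kd, gh // kh, kh, gw // kw, kw, C)
    x = x.mean(axis=(1, 3, 5))  # average within each pooling window
    return x.reshape(-1, C)


rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 256, 256)).astype(np.float32)  # random stand-in volume
patches, grid = patchify_3d(vol)                                # (2048, 1024), grid (8, 16, 16)
W_embed = rng.standard_normal((patches.shape[1], 64)).astype(np.float32) * 0.02
tokens = patches @ W_embed                                      # linear patch embedding
pooled = spatial_pool_tokens(tokens, grid)
print(tokens.shape, pooled.shape)  # (2048, 64) (256, 64)
```

With these assumed sizes, the 8×16×16 patch grid yields 2048 tokens, and pooling each 2×2×2 neighborhood leaves 256, an 8× reduction in the sequence the language model has to attend over.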
The authors evaluate their approach on image-text retrieval, report generation, VQA, positioning, and segmentation, and report that M3D-LaMed outperforms the existing methods it is compared with on these tasks. The model is evaluated with both traditional and LLM-based metrics. The authors also conduct ablation studies and case studies to validate the model's performance and generalization capabilities.
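The summary does not name the traditional metrics, so the sketch below uses two standard ones as assumptions: Recall@K for image-text retrieval and the Dice coefficient for 3D segmentation. The helpers `recall_at_k` and `dice_3d` are hypothetical, and this is not the M3D-Bench evaluation code.

```python
import numpy as np


def recall_at_k(sim, k):
    """Recall@K where sim[i, j] scores query i against candidate j and (i, i) is the match.

    Ties with the true match are counted in its favor.
    """
    true_scores = np.diag(sim)
    # rank of the true match = number of candidates scoring strictly higher
    ranks = (sim > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())


def dice_3d(pred, target, eps=1e-8):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary 3D masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8],
                [0.1, 0.7, 0.6]])
print(recall_at_k(sim, 1), recall_at_k(sim, 2))  # 1/3 and 1.0

pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[:2] = True     # 32 voxels
target[1:3] = True  # 32 voxels, 16 shared with pred
print(round(dice_3d(pred, target), 4))  # 0.5
```

The small `eps` term is one common way to keep Dice defined when both masks are empty; in that case this sketch returns 1.0.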
The study contributes three resources to medical image analysis: a large-scale 3D medical dataset, a versatile MLLM for 3D medical images, and a comprehensive benchmark for evaluating 3D medical image analysis models. The dataset, code, and models are available at https://github.com/BAAI-DCAI/M3D to support further research and applications.