MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

7 Mar 2024 | Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu
MedM2G is a unified medical multi-modal generative model that aligns, extracts, and generates multiple medical modalities (CT, MRI, X-ray) within a single framework. It addresses the challenges of limited paired medical data, diverse modalities, and the need for efficient cross-modal generation. The model introduces a central alignment strategy that efficiently aligns multiple modalities in a unified space while preserving medical visual invariants to retain clinical knowledge, and it employs a latent cross-guided diffusion process with adaptive parameters to enhance cross-modal interactions.

MedM2G achieves state-of-the-art results across five medical generation tasks on ten datasets, outperforming existing methods in text-to-image, image-to-text, and multi-modal generation. Its key innovations are the central alignment strategy, medical visual invariant preservation, and latent cross-guided diffusion. It is the first medical generative model that unifies medical generation tasks and generates multiple modalities without requiring paired data. The model is trained with a multi-flow strategy, allowing it to handle multiple medical generation tasks using only three paired datasets. MedM2G demonstrates superior performance in medical image generation, text-to-image generation, and MRI synthesis, producing high-quality results at minimal computational cost; its effectiveness is validated through extensive experiments and comparisons with state-of-the-art methods.
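The central alignment idea can be illustrated with a short sketch: instead of aligning every pair of modalities, each modality's features are pulled toward one shared anchor space, so N modalities need only N alignment terms rather than N*(N-1)/2 pairwise ones. The code below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation; the encoder dimensions, projection heads, InfoNCE-style loss, and the choice of anchor embedding are all assumptions made for the example.

```python
# Minimal sketch of a central-alignment objective (illustrative only).
# Each modality is projected into a shared space and aligned with a single
# central anchor embedding via a symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityProjector(nn.Module):
    """Projects modality-specific features into the shared central space."""

    def __init__(self, in_dim: int, shared_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, shared_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(feats), dim=-1)


def central_alignment_loss(central: torch.Tensor,
                           modality: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between one modality and the central anchor batch."""
    logits = central @ modality.t() / temperature          # (B, B) similarities
    targets = torch.arange(central.size(0), device=central.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    torch.manual_seed(0)
    batch = 8
    # Hypothetical pre-extracted features for three imaging modalities.
    feats = {"ct": torch.randn(batch, 512),
             "mri": torch.randn(batch, 512),
             "xray": torch.randn(batch, 512)}
    # Hypothetical central anchor (e.g., a text/report embedding batch).
    central_anchor = F.normalize(torch.randn(batch, 256), dim=-1)
    projectors = {m: ModalityProjector(512) for m in feats}

    # Each modality only needs to be aligned against the central anchor,
    # avoiding exhaustive pairwise alignment across all modalities.
    loss = sum(central_alignment_loss(central_anchor, projectors[m](f))
               for m, f in feats.items())
    print(f"total central-alignment loss: {float(loss):.4f}")
```

In this sketch, adding a new modality only requires a new projector and one extra alignment term against the anchor, which is the efficiency argument behind aligning everything through a single unified space.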