19 Jul 2024 | Paul Friedrich, Julia Wolleb, Florentin Bieder, Alicia Durrer, and Philippe C. Cattin
This paper introduces WDM (Wavelet-based Diffusion Model), a framework for generating high-resolution 3D medical images by combining wavelet decomposition with diffusion models. The approach addresses the central challenges of 3D medical image synthesis: the high GPU memory demands and the artifacts produced by existing methods. WDM trains a diffusion model directly on the wavelet coefficients of the input images, which reduces the spatial dimensions and the memory footprint. The model trains on a single 40 GB GPU and achieves state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) at resolutions up to 128×128×128, and it can generate images at 256×256×256, outperforming competing methods.
The paper evaluates WDM on the BraTS and LIDC-IDRI datasets, where it shows superior results compared to GANs, diffusion models, and latent diffusion models. The method's effectiveness is attributed to its memory efficiency and the reduced spatial dimensions, which also shorten inference time. Future work includes extending the framework to conditional image generation, image inpainting, and image-to-image translation.
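To make the memory argument concrete, the sketch below shows a single-level 3D Haar wavelet transform: it splits a volume into 8 sub-bands, each with half the resolution along every axis, so a diffusion model can operate on 8-channel inputs at one-eighth the voxel count per channel. This is an illustrative reimplementation, not the authors' code; the filter choice (Haar), normalization, and sub-band ordering are assumptions.

```python
import numpy as np

def haar_dwt_3d(x):
    """Single-level 3D Haar wavelet transform of a (D, H, W) volume.

    Returns 8 sub-bands of shape (D/2, H/2, W/2). The orthonormal
    1/sqrt(2) scaling preserves total energy, so no information is lost;
    only the spatial layout changes.
    """
    def haar_1d(a, axis):
        a = np.moveaxis(a, axis, 0)
        lo = (a[0::2] + a[1::2]) / np.sqrt(2)  # low-pass (local average)
        hi = (a[0::2] - a[1::2]) / np.sqrt(2)  # high-pass (local detail)
        return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

    bands = [x]
    for axis in range(3):  # split along depth, then height, then width
        bands = [b for band in bands for b in haar_1d(band, axis)]
    return bands  # 8 sub-bands; bands[0] is the low-low-low approximation

# A 128^3 volume becomes 8 channels of 64^3 coefficients.
volume = np.random.rand(128, 128, 128).astype(np.float32)
subbands = haar_dwt_3d(volume)
stacked = np.stack(subbands)  # shape (8, 64, 64, 64) -- the model's input
```

Stacking the sub-bands as channels is what lets the denoising network run at half resolution per axis, which is where the reported memory and inference-time savings come from.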