CatVTON is a lightweight and efficient virtual try-on diffusion model that achieves high-quality results by simply concatenating the garment and person images along a spatial dimension. This design eliminates the need for a ReferenceNet or additional image encoders, reducing model parameters and memory usage by over 40% compared to other diffusion-based methods. The model uses a single UNet backbone and removes the text encoder and cross-attention layers, which are unnecessary for this task, resulting in a simpler and more efficient architecture. CatVTON achieves high-quality try-on results while training only 49.57M parameters. It also simplifies inference by eliminating pre-processing steps and text conditions, requiring only a garment reference image, a target person image, and a mask.
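The spatial-concatenation idea can be sketched as follows. This is a minimal NumPy illustration, not CatVTON's released code. It assumes a Stable Diffusion inpainting-style UNet whose input stacks the noisy latent, the mask, and the masked-image latent along the channel axis (4 + 1 + 4 = 9 channels). The function names `build_unet_input` and `person_half`, the choice of concatenating along width, and the latent shapes are all hypothetical.

```python
import numpy as np


def build_unet_input(noisy_latent, masked_person, garment, mask):
    """Assemble an inpainting-style UNet input from spatially concatenated
    person and garment latents (hypothetical sketch, not the official code).

    Assumed shapes (latent space):
      masked_person, garment: (C, H, W)
      mask:                   (1, H, W), 1 = region to repaint
      noisy_latent:           (C, H, 2 * W), noise over the whole canvas
    """
    if masked_person.shape != garment.shape:
        raise ValueError("person and garment latents must share a shape")
    # Place the garment next to the person along the width axis (spatial concat).
    condition = np.concatenate([masked_person, garment], axis=-1)
    # Only the person half is repainted; the garment half gets an all-zero mask.
    full_mask = np.concatenate([mask, np.zeros_like(mask)], axis=-1)
    if noisy_latent.shape != condition.shape:
        raise ValueError("noisy latent must cover the concatenated canvas")
    # Channel order (noise, mask, condition) follows the common Stable Diffusion
    # inpainting convention; treating it as CatVTON's order is an assumption.
    return np.concatenate([noisy_latent, full_mask, condition], axis=0)


def person_half(canvas, width):
    """Crop the person (left) half, i.e. the try-on result, from a canvas."""
    return canvas[..., :width]


if __name__ == "__main__":
    C, H, W = 4, 64, 48  # hypothetical latent size
    rng = np.random.default_rng(0)
    person = rng.standard_normal((C, H, W))
    garment = rng.standard_normal((C, H, W))
    mask = (rng.random((1, H, W)) > 0.5).astype(person.dtype)
    noise = rng.standard_normal((C, H, 2 * W))
    # Masking is applied directly in latent space here only for brevity;
    # real pipelines typically encode the masked image with the VAE instead.
    x = build_unet_input(noise, person * (1 - mask), garment, mask)
    print(x.shape)  # (9, 64, 96)
    print(person_half(noise, W).shape)  # (4, 64, 48)
```

The appeal of this layout is that a single UNet pass suffices: the self-attention layers see person and garment tokens on the same canvas, so no separate ReferenceNet or image encoder is needed to inject garment features.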
Extensive experiments on the VITON-HD and DressCode datasets show that CatVTON outperforms state-of-the-art methods in both qualitative and quantitative evaluations, with superior performance in complex scenarios and in-the-wild settings. Its lightweight design and efficient training strategy make it a promising solution for practical virtual try-on applications.