Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
28 Mar 2024 | Zhicai Wang¹*, Longhui Wei²†, Tan Wang³, Heyu Chen¹, Yanbin Hao¹, Xiang Wang¹†, Xiangnan He¹, Qi Tian²
This article introduces Diff-Mix, a novel inter-class data augmentation method that leverages diffusion models to enhance domain-specific image classification. The authors address the limitations of existing generative and conventional data augmentation techniques, which often fail to produce images that are both faithful in terms of foreground objects and diverse in terms of background contexts. Diff-Mix improves upon these methods by performing inter-class image translation, which enriches the dataset with a greater diversity of samples.

The method involves two key operations: personalized fine-tuning and inter-class image translation. Personalized fine-tuning tailors the diffusion model to generate images with faithful foreground concepts, while inter-class image translation transforms a reference image into an edited version that incorporates prompts from different classes. This translation strategy retains the original background context while editing the foreground to align with the target concept, so Diff-Mix achieves a better balance between faithfulness and diversity.

The experiments show that Diff-Mix consistently outperforms distillation-based, intra-class, and non-generative augmentation methods across diverse image classification settings, including conventional, few-shot, and long-tail classification on domain-specific datasets. The authors also analyze the impact of synthetic data size and diversity on classification performance, finding that background diversity is crucial for improving model generalization. Overall, Diff-Mix provides a promising approach for enhancing image classification by leveraging diffusion models for inter-class image interpolation.
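The inter-class translation described above produces images that lie between two classes, so the synthetic sample needs a correspondingly mixed training label. The sketch below is a minimal illustration, not the authors' implementation: it assumes the soft label is a linear interpolation between the source and target one-hot labels, weighted by the edit strength `s` of the diffusion translation, and the helper name `interpolated_label` is hypothetical.

```python
import numpy as np

def interpolated_label(src_class: int, tgt_class: int,
                       num_classes: int, strength: float) -> np.ndarray:
    """Soft label for an image translated from src_class toward tgt_class.

    strength in [0, 1] is the translation strength: 0 keeps the source
    image unchanged, 1 fully replaces the foreground with the target
    concept. The linear mixing rule here is an illustrative assumption.
    """
    y = np.zeros(num_classes)
    y[src_class] += 1.0 - strength   # residual weight on the source class
    y[tgt_class] += strength         # weight moved to the target class
    return y

# Example: translating a class-0 image toward class 3 with strength 0.25
# yields a label that still leans toward the source class.
label = interpolated_label(0, 3, num_classes=5, strength=0.25)
print(label)  # [0.75 0.   0.   0.25 0.  ]
```

A usage note: during training, each synthetic image would be paired with such a soft label and scored with a soft cross-entropy loss, so that strongly edited images contribute mostly to the target class while lightly edited ones stay close to the source.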