28 Mar 2024 | Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
This paper proposes Diff-Mix, a novel inter-class image mixup method that leverages diffusion models to enhance image classification performance. The method addresses the limitations of existing data augmentation techniques by generating images that are both faithful to foreground objects and diverse in background contexts. Diff-Mix performs inter-class image translation, producing diverse samples that improve classification accuracy across various scenarios, including few-shot, conventional, and long-tail classification.
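To make the augmentation idea concrete, here is a minimal sketch of how an inter-class expansion pass over a dataset might be organized. The `translate` helper, the random target-class sampling policy, and the default strength are illustrative assumptions, not the paper's exact procedure; a concrete translation sketch follows the method description below.

```python
import random

def build_diffmix_set(dataset, classes, translate, samples_per_image=1, strength=0.7):
    """Expand a labeled dataset with inter-class translations.

    For each (image, source_class) pair, pick a random *different* target
    class and translate the image toward it. `translate` is assumed to be
    a diffusion-based image editor (see the img2img sketch below).
    """
    synthetic = []
    for image, y_src in dataset:
        for _ in range(samples_per_image):
            y_tgt = random.choice([c for c in classes if c != y_src])
            edited = translate(image, y_tgt, strength)
            synthetic.append((edited, y_src, y_tgt, strength))
    return synthetic
```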
The paper analyzes the shortcomings of current generative and conventional data augmentation techniques, which often fail to produce images that are both faithful and diverse for domain-specific concepts. Diff-Mix instead expands the dataset by translating images between classes, striking a better balance between faithfulness and diversity. The method consists of two key operations: personalized fine-tuning and inter-class image translation. Personalized fine-tuning adapts the diffusion model so that it renders each class's foreground concept faithfully, while inter-class image translation edits a reference image toward a prompt from a different class, changing the foreground while retaining the original background context.
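Concretely, the translation step can be approximated with an off-the-shelf img2img pipeline: noise the reference image partway through the diffusion process, then denoise it conditioned on the target class's prompt. The sketch below uses Hugging Face diffusers; the checkpoint, the file paths, the `[V_j]` placeholder token (which in Diff-Mix would come from the personalized fine-tuning step), and the strength value are all illustrative assumptions.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Assumed base checkpoint; in practice Diff-Mix would load the
# personalized (fine-tuned) Stable Diffusion weights here.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

ref = Image.open("cub/class_i/0001.jpg").convert("RGB").resize((512, 512))

edited = pipe(
    prompt="a photo of a [V_j] bird",  # [V_j]: hypothetical learned token for target class j
    image=ref,
    strength=0.7,        # translation strength s: fraction of diffusion steps re-run
    guidance_scale=7.5,
).images[0]
edited.save("diffmix/class_i_to_j.png")
```

Lower strength keeps the output close to the reference (faithful but less diverse); higher strength hands more of the generation over to the target prompt (more diverse, but riskier for faithfulness).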
The paper compares Diff-Mix with other data augmentation methods, including intra-class augmentation and non-generative approaches, and demonstrates its effectiveness in improving classification performance. Experiments on Caltech-UCSD Birds (CUB), Stanford Cars, Oxford Flowers, Stanford Dogs, and FGVC Aircraft show that Diff-Mix consistently outperforms existing methods, particularly when training data is limited. The gains carry over to long-tail classification, where the greater diversity of the synthetic data proves especially valuable.
The paper also discusses the impact of synthetic data size and diversity on classification performance, showing that increasing the diversity of the synthetic data leads to better results. Additionally, the study highlights the importance of fine-tuning strategies and the non-linear nature of diffusion translation in achieving effective inter-class augmentation. Overall, Diff-Mix provides a promising approach for enhancing image classification by leveraging diffusion models to generate diverse and faithful synthetic images.
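Because a translated image sits between its source and target classes, its training label should too. This summary does not spell out Diff-Mix's exact labeling rule, so the snippet below shows one plausible mixup-style convention, weighting the two classes by the translation strength.

```python
import numpy as np

def diffmix_label(y_src: int, y_tgt: int, strength: float, num_classes: int) -> np.ndarray:
    """Soft label for an image translated from class y_src toward y_tgt.

    Assumption (mixup-style, not confirmed by the summary): the target
    class receives weight proportional to the translation strength s,
    and the source class keeps the remainder.
    """
    label = np.zeros(num_classes, dtype=np.float32)
    label[y_src] = 1.0 - strength
    label[y_tgt] = strength
    return label

# e.g. translating class 3 toward class 7 at strength 0.7
# -> 0.3 weight on class 3, 0.7 on class 7
print(diffmix_label(3, 7, 0.7, num_classes=10))
```

Note that the non-linear nature of diffusion translation highlighted above suggests the effective class mixture may not be linear in s, so a calibrated mapping from strength to label weight could be preferable to this straight interpolation.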