The paper "Do Generated Data Always Help Contrastive Learning?" explores the impact of generated data on contrastive learning, a popular unsupervised visual representation learning paradigm. The authors investigate whether the use of high-quality generated images, particularly from diffusion models, can enhance contrastive learning. They find that while better generative models can improve performance, simply adding more generated data without proper adjustments can sometimes degrade performance. The study identifies two main sources of this failure: data inflation and data augmentation. For data inflation, they discover that stronger data inflation should be accompanied by weaker augmentations, and vice versa. They provide theoretical explanations for these phenomena and propose Adaptive Inflation (AdaInf), a strategy that adaptively adjusts the strength of data augmentation and the mixing ratio of real and generated data. AdaInf is shown to significantly improve performance on various benchmark datasets, especially in data-scarce scenarios, without introducing additional computational overhead. The paper also includes extensive experiments and theoretical analyses to support its findings.The paper "Do Generated Data Always Help Contrastive Learning?" explores the impact of generated data on contrastive learning, a popular unsupervised visual representation learning paradigm. The authors investigate whether the use of high-quality generated images, particularly from diffusion models, can enhance contrastive learning. They find that while better generative models can improve performance, simply adding more generated data without proper adjustments can sometimes degrade performance. The study identifies two main sources of this failure: data inflation and data augmentation. For data inflation, they discover that stronger data inflation should be accompanied by weaker augmentations, and vice versa. They provide theoretical explanations for these phenomena and propose Adaptive Inflation (AdaInf), a strategy that adaptively adjusts the strength of data augmentation and the mixing ratio of real and generated data. AdaInf is shown to significantly improve performance on various benchmark datasets, especially in data-scarce scenarios, without introducing additional computational overhead. The paper also includes extensive experiments and theoretical analyses to support its findings.