SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

16 Jan 2024 | Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh
The paper "SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation" addresses the harmful stereotypes and cultural misrepresentations found in images produced by generative models trained on large web-crawled datasets such as LAION. The authors propose a novel method, Self-Contrastive Fine-Tuning (SCoFT), to improve the cultural relevance of generated images and reduce stereotypes in them. They collect a culturally representative dataset, the Cross-Cultural Understanding Benchmark (CCUB), and use it to fine-tune the Stable Diffusion model. SCoFT leverages the model's known biases to self-improve, which prevents overfitting on small datasets and encodes high-level cultural information. User studies with 51 participants from 5 different countries show that SCoFT consistently generates images with higher cultural relevance and fewer stereotypes than the baseline Stable Diffusion model. The paper also discusses related work on cultural datasets, fine-tuning techniques, and perceptual losses.