Beyond Aesthetics: Cultural Competence in Text-to-Image Models

20 Jan 2025 | Nithish Kannen, Arif Ahmad, Marco Andreetto, Vinodkumar Prabhakaran, Utsav Prabhu, Adji Bousso Dieng, Pushpak Bhattacharyya, Shachi Dave
The paper "Beyond Aesthetics: Cultural Competence in Text-to-Image Models" addresses cultural competence in Text-to-Image (T2I) models, which are increasingly used to create visual representations of diverse global cultures. The authors introduce CUBE (CUltural BEnchmark for Text-to-Image models), a benchmark that evaluates T2I models along two dimensions: cultural awareness and cultural diversity. CUBE consists of two main components:

1. **CUBE-1K**: A curated subset of 1,000 high-quality prompts designed to evaluate cultural awareness.
2. **CUBE-CSpace**: A large dataset of approximately 300,000 cultural artifacts covering eight countries and three concepts (cuisine, landmarks, and art).

The authors also introduce a new evaluation component, Cultural Diversity (CD), which uses the quality-weighted Vendi score to assess the diversity of generated cultural artifacts. The evaluation combines human annotation, which measures faithfulness and realism, with automated metrics for diversity. The results reveal significant gaps in cultural awareness and diversity across countries and concepts, and the study highlights the need for more inclusive and culturally competent T2I models, particularly in representing the Global South. The authors also discuss the limitations of their approach, such as the inherent biases in existing knowledge bases and the challenges of human annotation, and emphasize the importance of community-based and participatory approaches to enrich cultural representation.
Overall, the paper contributes to the ongoing dialogue on developing truly inclusive generative AI systems by providing a comprehensive framework and metrics for evaluating cultural competence in T2I models.
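
The summary names the quality-weighted Vendi score but does not give its formula. As a rough illustration only, the sketch below computes the Vendi score as the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix, and then scales it by the mean per-item quality. That scaling is an assumption about how the quality weighting works, and the function names, the cosine kernel, and the toy inputs are all hypothetical; the embeddings, kernels, and quality scores actually used for CUBE are not described here.

```python
import numpy as np


def cosine_kernel(embeddings):
    """Pairwise cosine similarity; the diagonal is 1 by construction."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T


def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    K is an n x n positive semi-definite similarity matrix with ones on
    the diagonal. The result ranges from 1 (all items identical) to n
    (all items mutually dissimilar).
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # drop zeros so log() is defined
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def quality_weighted_vendi_score(K, quality):
    """Assumed form: mean per-item quality times the Vendi score."""
    return float(np.mean(quality)) * vendi_score(K)


if __name__ == "__main__":
    # Toy example: four hypothetical image embeddings and quality scores.
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(4, 8))
    quality = [0.9, 0.8, 0.7, 0.6]
    K = cosine_kernel(embeddings)
    print("Vendi score:", vendi_score(K))
    print("Quality-weighted:", quality_weighted_vendi_score(K, quality))
```

Under this form, a set of generated images scores highly only when the images are both dissimilar from one another and individually rated as high quality, which matches the summary's description of CD as a diversity measure that accounts for quality.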