[slides and audio] Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

This paper addresses two shortcomings of large language models (LLMs): cultural bias and a lack of cultural commonsense knowledge, both of which matter for cross-cultural communication and interaction. To tackle these issues, the authors introduce a novel approach for acquiring massively multicultural knowledge. They use Wikipedia documents on cultural topics to construct the CultureAtlas dataset, which covers a wide range of sub-country geographical regions and ethnolinguistic groups. The dataset includes high-quality positive and negative samples, making it a robust basis for evaluating LLMs' cultural reasoning capabilities. The authors also propose methods for constructing negative samples and extracting fine-grained cultural profile information. Evaluating state-of-the-art LLMs on CultureAtlas demonstrates the importance of cultural awareness and debiasing in LLMs. The work contributes to a more inclusive and balanced representation of global cultures in digital domains, fostering deeper understanding and bridging cultural disparities in AI.

No Culture Left Behind: Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking on 1000+ Sub-Country Regions and 2000+ Ethnolinguistic Groups

14 Feb 2024 | Yi R. Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji