CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

23 Apr 2024 | Weiyang Shi, Ryan Li, Yutong Zhang, Caleb Ziem, Chunhua Yu, Raya Horesh, Rogério Abreu de Paula, Diyi Yang
CultureBank is an online community-driven knowledge base designed to enhance the cultural awareness of large language models (LLMs). The authors propose a generalizable pipeline for constructing cultural knowledge bases from diverse online communities. They apply it to users' self-narratives, which yields 12,000 cultural descriptors sourced from TikTok and 11,000 from Reddit. Unlike previous cultural knowledge resources, CultureBank includes diverse views on cultural descriptors and contextualized scenarios that enable grounded evaluation.

Using CultureBank, the authors evaluate the cultural awareness of different LLMs and identify areas for improvement. They also fine-tune a language model on CultureBank and show improved performance on two downstream cultural tasks in a zero-shot setting. The paper closes with recommendations for future culturally aware language technologies.

The authors acknowledge several limitations: their reliance on open-source LLMs, which may not fully capture cultural nuances; potential sample bias in the dataset; and the presence of generic cultural statements. For future research, they stress the importance of diverse data sources, multiple cultural dimensions, and temporal analysis. Emphasizing grounded evaluation and the need for more inclusive language technologies, the authors conclude that CultureBank can help improve the cultural awareness of LLMs and support the development of culturally aware language technologies.