19 Jun 2024 | Muhammad Farid Adilazuarda1*, Sagnik Mukherjee1*, Pradhyumna Lavanaia2, Siddhant Singh2, Ashutosh Dwivedi2, Alham Fikri Aji1, Jacki O'Neill3, Ashutosh Modi2, Monojit Choudhury1
This survey paper examines the state of research on cultural representation and inclusion in large language models (LLMs). The authors analyze over 90 recent papers and find that none explicitly define "culture," a complex and multifaceted concept. Instead, these studies probe models using datasets that represent various aspects of culture, which the authors call *proxies of culture*. These proxies are categorized along demographic and semantic dimensions. The paper also categorizes the probing methods used and highlights gaps in the research, such as the lack of robustness in probing techniques and the absence of situated studies on the impact of cultural mis- and underrepresentation in LLM-based applications. The authors recommend explicit definitions of culture, expanded exploration of cultural proxies, more robust and interpretable methods, multilingual datasets, situated studies, and interdisciplinary approaches to fully understand the relationship between culture and technology.