6 Jul 2024 | Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab
This study investigates the cultural alignment of Large Language Models (LLMs) by simulating sociological surveys conducted in Egypt and the United States and comparing model responses with those of actual survey participants. The results show that LLMs align more closely with a culture when prompted in that culture's dominant language or when pretrained on a language mixture refined to reflect the languages used by that culture, whereas a more balanced pretraining mixture reduces the models' anglocentric bias, i.e., their stronger alignment with Western cultures. Misalignment increases for underrepresented personas and for culturally sensitive topics such as social values.

The study introduces Anthropological Prompting, a novel method that leverages anthropological reasoning to improve cultural alignment, and shows that it is particularly effective for underrepresented groups. The authors conclude that cultural alignment is shaped by pretraining language composition, prompting language, and demographic factors; they call for more balanced multilingual pretraining data to better represent cultural diversity, note implications for cross-lingual transfer, and argue that further research is needed to address these biases.
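To make the survey-simulation setup concrete, the sketch below shows one way such an evaluation could be wired up: a persona-conditioned survey prompt, an "anthropological" variant that asks the model to reason about the persona's cultural context before answering, and a simple distribution-overlap score between model answers and real respondents' answers. This is a minimal illustration under assumptions, not the authors' exact protocol; the question, persona fields, prompt wording, and scoring metric are all hypothetical.

```python
from collections import Counter

# Hypothetical survey item in the style of a values survey (illustrative only).
QUESTION = "How important is family in your life?"
OPTIONS = ["Very important", "Rather important",
           "Not very important", "Not at all important"]


def persona_prompt(persona: dict, question: str, options: list[str]) -> str:
    """Frame the survey item from the viewpoint of one demographic persona."""
    opts = "\n".join(f"{i + 1}. {o}" for i, o in enumerate(options))
    return (
        f"You are a {persona['age']}-year-old {persona['sex']} from {persona['country']}, "
        f"with {persona['education']} education, of {persona['social_class']} social class.\n"
        f"Answer the survey question by choosing exactly one option number.\n\n"
        f"{question}\n{opts}\nAnswer:"
    )


def anthropological_prompt(persona: dict, question: str, options: list[str]) -> str:
    """Illustrative 'anthropological' framing: ask the model to reason about the
    persona's cultural context (norms, values, lived experience) before answering."""
    return (
        persona_prompt(persona, question, options)
        + "\nBefore answering, reason step by step like an anthropologist about how "
          "this person's cultural background and community norms would shape their "
          "view, then give the option number."
    )


def alignment_score(model_answers: list[str], survey_answers: list[str]) -> float:
    """Toy alignment metric: overlap of the model's and respondents' answer
    distributions (1.0 = identical, 0.0 = disjoint)."""
    m, s = Counter(model_answers), Counter(survey_answers)
    total_m, total_s = sum(m.values()), sum(s.values())
    return sum(min(m[o] / total_m, s[o] / total_s) for o in set(m) | set(s))


# Example persona (hypothetical): prompt an LLM with this text, collect its
# chosen options across many personas, then compare against the real survey.
persona = {"age": 30, "sex": "woman", "country": "Egypt",
           "education": "university", "social_class": "middle"}
print(persona_prompt(persona, QUESTION, OPTIONS))
```

In this framing, prompting language would be varied by translating the prompt into the country's dominant language, and the overlap score would be compared across personas and topics to surface the misalignment patterns the paper reports.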