Investigating Cultural Alignment of Large Language Models


6 Jul 2024 | Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab
This paper investigates the cultural alignment of Large Language Models (LLMs) by simulating sociological surveys and comparing model responses to those of actual survey participants. The study reveals that LLMs demonstrate greater cultural alignment when prompted in the dominant language of a specific culture and when pre-trained with a refined mixture of the languages used by that culture. The authors quantify cultural alignment by simulating surveys conducted in Egypt and the United States, using different pre-training data mixtures in Arabic and English. They find that misalignment is more pronounced for underrepresented personas and for culturally sensitive topics. To enhance cultural alignment, the authors introduce Anthropological Prompting, a method that leverages anthropological reasoning to guide the model's responses. The study emphasizes the need for a more balanced multilingual pre-training dataset to better represent the diversity of human experiences and cultures, with implications for cross-lingual transfer. The research highlights the significant role of language in cultural alignment and the importance of addressing biases in LLMs.
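To make the survey-simulation idea concrete, the following is a minimal sketch of how an alignment score could be computed once model answers and human survey responses are collected. The function name, the toy data, and the hard majority-match scoring are illustrative assumptions, not the paper's exact metric, which may weight answer options differently.

```python
from collections import Counter

def alignment_score(model_answers: dict[str, str],
                    human_answers: dict[str, list[str]]) -> float:
    """Fraction of survey questions where the model's answer matches the
    majority answer of the human participants (a simple hard-match proxy;
    the paper's actual metric may be defined differently)."""
    matches = 0
    for qid, model_ans in model_answers.items():
        # Majority vote among the human responses for this question.
        majority_ans, _ = Counter(human_answers[qid]).most_common(1)[0]
        matches += int(model_ans == majority_ans)
    return matches / len(model_answers)

# Hypothetical toy data: two survey questions, three human respondents each.
human = {"q1": ["agree", "agree", "disagree"],
         "q2": ["disagree", "disagree", "agree"]}
model = {"q1": "agree", "q2": "agree"}
print(alignment_score(model, human))  # -> 0.5
```

In practice the model answers would come from prompting the LLM with a persona (e.g., nationality, demographics) and the survey question in a given language, then repeating the comparison across prompting languages and pre-training mixtures to study how each affects alignment.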