6 Jun 2024 | Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi
KorNAT is a benchmark for evaluating how well large language models (LLMs) align with South Korean social values and common knowledge. It contains 4,000 multiple-choice questions on social values and 6,000 on common knowledge. The social value dataset was built from survey responses by 6,174 Korean participants and is designed to reflect the general opinions of the Korean population, including its diversity across genders and ages. The common knowledge dataset is based on Korean textbooks and GED reference materials and integrates nation-specific common knowledge. The dataset was curated using survey theory and underwent multiple rounds of human review. KorNAT has been approved by the government after passing an assessment by the Telecommunications Technology Association of Korea (TTA).

The benchmark includes metrics for measuring national alignment, with three variations of social value alignment. When seven LLMs were tested, only a few met the reference score, which indicates room for further improvement. The dataset is available on Hugging Face, and a public leaderboard is planned for June 2024. The benchmark aims to improve the alignment of LLMs with the targeted country from both the social value and common knowledge perspectives, and to foster a more inclusive understanding and appreciation of diverse national characteristics. The dataset takes into account the unique and regional characteristics specific to Korea, and researchers are expected to use it as a basis for creating their own national alignment datasets. The benchmark is designed to be generalizable and can be adapted to other nations and time periods.
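
To make the two evaluation perspectives concrete, the following is a minimal sketch of how multiple-choice answers could be scored against an answer key and against a survey response distribution. This summary does not define KorNAT's actual metrics or its three social value alignment variations, so the function names, the data layout, and the agreement formula below are all illustrative assumptions, not the benchmark's own definitions.

```python
from typing import Sequence


def common_knowledge_accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Fraction of questions where the model's option index matches the answer key.

    Hypothetical helper: a plain accuracy score, assumed here for illustration.
    """
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def social_value_agreement(
    predictions: Sequence[int], response_counts: Sequence[Sequence[int]]
) -> float:
    """Mean share of survey respondents who chose the same option as the model.

    Hypothetical helper: response_counts[i][k] is the number of respondents who
    picked option k on question i. This is an assumed scoring rule, not one of
    the three variations mentioned in the summary.
    """
    if not response_counts or len(predictions) != len(response_counts):
        raise ValueError("predictions and response_counts must be non-empty and equal in length")
    total = 0.0
    for choice, counts in zip(predictions, response_counts):
        respondents = sum(counts)
        if respondents == 0:
            raise ValueError("each question needs at least one survey response")
        total += counts[choice] / respondents
    return total / len(predictions)
```

Under this assumed rule, a model that always picks the most popular option scores highest on agreement, which is one reason a benchmark might report several variations of social value alignment rather than a single number.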