LLMs Are Few-Shot In-Context Low-Resource Language Learners

25 Jun 2024 | Samuel Cahyawijaya, Holy Lovenia, Pascale Fung
This paper studies the effectiveness of in-context learning (ICL) and cross-lingual in-context learning (X-ICL) across 25 low-resource and 7 relatively higher-resource languages, addressing the difficulty large language models (LLMs) have in generalizing to low-resource languages. The authors find that in-context label alignment, a common approach in X-ICL, is ineffective for most languages, whereas in-context query alignment, which aligns input distributions by providing semantically similar sentences from a high-resource language, significantly improves performance. They further show that few-shot in-context information improves LLM understanding by supplying semantically relevant exemplars, closing the language gap and aligning semantics between the target and high-resource languages. The paper concludes with recommendations for improving LLM performance on low-resource languages, emphasizing better alignment methods and cross-lingual retrieval techniques. The code for the experiments is publicly available at <https://github.com/SamuelCahyawijaya/in-context-alignment>.
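To make the query-alignment idea concrete, below is a minimal sketch of how cross-lingual retrieval for X-ICL can work: a multilingual sentence encoder embeds the low-resource query and a pool of labeled high-resource exemplars into a shared space, the most similar exemplars are retrieved, and a prompt is assembled from them. This is an illustration under stated assumptions, not the authors' exact pipeline; the encoder choice (LaBSE), the prompt template, and the exemplar pool are stand-ins.

```python
# Hedged sketch of in-context query alignment for X-ICL.
# The encoder, prompt template, and retrieval pool below are illustrative
# assumptions, not the paper's released implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

# A multilingual sentence encoder maps queries and candidate exemplars
# into a shared embedding space, enabling cross-lingual retrieval.
encoder = SentenceTransformer("sentence-transformers/LaBSE")

def retrieve_exemplars(query, pool, k=3):
    """Return the k (sentence, label) pairs from `pool` most similar to `query`.

    `pool` holds labeled sentences from a high-resource language
    (a hypothetical retrieval corpus for this sketch).
    """
    sentences = [s for s, _ in pool]
    # normalize_embeddings=True makes the dot product equal cosine similarity.
    emb = encoder.encode(sentences + [query], normalize_embeddings=True)
    sims = emb[:-1] @ emb[-1]
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]

def build_prompt(query, exemplars):
    """Assemble an X-ICL prompt: retrieved exemplars, then the target query."""
    blocks = [f"Sentence: {s}\nLabel: {l}" for s, l in exemplars]
    blocks.append(f"Sentence: {query}\nLabel:")
    return "\n\n".join(blocks)

# Usage: a low-resource-language query paired with high-resource exemplars.
pool = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("The plot was dull and predictable.", "negative"),
    ("A delightful, heartwarming story.", "positive"),
]
query = "Filim-ka aad buu u fiicnaa."  # Somali: "The movie was very good."
print(build_prompt(query, retrieve_exemplars(query, pool, k=2)))
```

The resulting prompt is then passed to an LLM for completion; the intuition, per the paper's findings, is that semantically aligned exemplars give the model relevant signal even when the query language itself is underrepresented in pretraining.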