This paper investigates retrieval-augmented generation (RAG) in multilingual settings (mRAG), focusing on building a strong baseline for future research. The study considers user queries and datastores in 13 languages and explores which components and adjustments are needed to build a well-performing mRAG pipeline. The findings highlight that, despite the availability of high-quality multilingual retrievers and generators, task-specific prompt engineering is needed to enable generation in the user's language. Additionally, current evaluation metrics need adjustments for multilingual settings to account for variation in the spelling of named entities. The main limitations are frequent code-switching in languages with non-Latin scripts, occasional fluency errors, misreading of the provided documents, and irrelevant retrieval. The authors release the code for the resulting mRAG baseline pipeline at https://github.com/naver/bergen.
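To make the prompt-engineering point concrete, below is a minimal sketch of a prompt that pins the response language explicitly. The wording, the `build_prompt` helper, and the example query are illustrative assumptions, not the exact template shipped with BERGEN.

```python
# Sketch of a task-specific RAG prompt that keeps generation in the user's
# language even when the retrieved documents are in another language.
# The instruction wording below is hypothetical, for illustration only.

SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer the question using the provided "
    "documents. Respond in {user_language}, even if the documents are "
    "written in another language."
)

def build_prompt(question: str, passages: list[str], user_language: str) -> str:
    """Assemble a RAG prompt that states the target response language explicitly."""
    context = "\n\n".join(f"Document {i + 1}: {p}" for i, p in enumerate(passages))
    instruction = SYSTEM_PROMPT.format(user_language=user_language)
    return f"{instruction}\n\n{context}\n\nQuestion: {question}\nAnswer:"

# A French query over an English passage: the prompt asks for a French answer.
print(build_prompt("Qui a écrit 'Les Misérables' ?",
                   ["Les Misérables is a novel by Victor Hugo."],
                   user_language="French"))
```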
The paper describes the two-stage RAG pipeline: retrieval finds relevant context on the Internet or in a document collection, and generation uses that retrieved context to produce a response. The study evaluates mRAG across languages and finds that RAG brings substantial performance improvements in every language, with retrieval from multilingual Wikipedia being beneficial in most cases. Evaluation metrics need adjustment to handle zero-shot, cross-lingual scenarios in which named entities may be spelled differently from the reference. The main limitations to be addressed in future work are frequent code-switching in languages with non-Latin scripts, occasional fluency errors, misreading of the provided documents, and irrelevant retrieval.
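As a rough illustration of this retrieve-then-generate loop, the sketch below wires a toy token-overlap retriever to a placeholder generator. Both the `retrieve` scorer and the `generate` stub are stand-ins of my own: in the paper's setting these would be a multilingual dense retriever over the Wikipedia datastore and an instruction-tuned multilingual LLM.

```python
# Minimal retrieve-then-generate sketch of the pipeline described above.
# The overlap scorer and the generate() stub are placeholders, not the
# components used in the paper.

from collections import Counter

def retrieve(query: str, collection: list[str], k: int = 3) -> list[str]:
    """Rank passages by token overlap with the query (stand-in for a real retriever)."""
    q_tokens = Counter(query.lower().split())
    def score(passage: str) -> int:
        return sum((Counter(passage.lower().split()) & q_tokens).values())
    return sorted(collection, key=score, reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; any chat-completion API would slot in here."""
    return f"<model output for prompt of {len(prompt)} chars>"

def rag_answer(question: str, collection: list[str], user_language: str) -> str:
    passages = retrieve(question, collection)
    prompt = (f"Answer in {user_language} using the documents below.\n\n"
              + "\n\n".join(passages)
              + f"\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)

docs = ["Les Misérables est un roman de Victor Hugo publié en 1862.",
        "Notre-Dame de Paris est un roman de Victor Hugo."]
print(rag_answer("Qui a écrit Les Misérables ?", docs, user_language="French"))
```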
The paper also reviews related work, including multilingual open question answering, and highlights the importance of considering multilingual settings in RAG experiments. The study builds on the BERGEN benchmarking library for RAG and addresses three research questions: how well RAG performs in non-English languages, which components are needed for effective mRAG, and where existing components fall short. The key findings are the importance of strong multilingual retrievers and generators, the value of advanced prompting strategies, and the necessity of adjusting evaluation metrics. The paper concludes that mRAG has clear advantages for both English and non-English speakers, and that future research should focus on improving multilingual LLMs and decoding strategies as well as developing multi-domain multilingual retrieval systems.
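One concrete shape such a metric adjustment could take is a character n-gram recall, which gives partial credit when a named entity is transliterated differently across languages. The set-based formulation, the normalization, and the choice of n=3 below are illustrative assumptions, not necessarily the paper's exact metric.

```python
# Character n-gram recall: more forgiving than exact match when a named
# entity is spelled differently, e.g. "Chaikovsky" vs. "Tchaikovsky".
# Set-based variant with n=3; both are assumptions for illustration.

def char_ngrams(text: str, n: int = 3) -> set[str]:
    text = "".join(text.lower().split())  # normalize case, drop whitespace
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def char_ngram_recall(prediction: str, reference: str, n: int = 3) -> float:
    """Fraction of the reference's character n-grams found in the prediction."""
    ref = char_ngrams(reference, n)
    if not ref:
        return 0.0
    return len(ref & char_ngrams(prediction, n)) / len(ref)

# Exact match would score 0 here; the character-level metric gives ~0.89.
print(char_ngram_recall("Pyotr Chaikovsky", "Tchaikovsky"))
```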