2024-07-02 | Manuel Faysse*¹,³, Hugues Sibille*¹,⁴, Tony Wu*¹, Bilel Omrani¹, Gautier Viaud¹, Céline Hudelot³, Pierre Colombo²,³
The paper introduces ColPali, a novel retrieval model that leverages Vision Language Models (VLMs) to efficiently index and retrieve visually rich documents. The authors address the limitations of current document retrieval systems, which struggle to utilize visual cues effectively, leading to suboptimal performance in practical applications such as Retrieval Augmented Generation (RAG). To benchmark these systems, they create the Visual Document Retrieval Benchmark (ViDoRe), which includes various page-level tasks across multiple domains, languages, and settings.
ColPali is designed to produce high-quality contextualized embeddings from document page images, combining the strengths of VLMs and late interaction mechanisms. This approach significantly outperforms existing document retrieval pipelines in retrieval performance while also being faster and end-to-end trainable. The authors release all project artifacts, including the benchmark and models, to encourage further research and development in document retrieval. The paper also discusses the design and evaluation of ViDoRe, the challenges faced by current systems, and the contributions of ColPali in addressing these issues.
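The late interaction mechanism mentioned above can be illustrated with a minimal sketch. It assumes a ColBERT-style sum-of-max scoring rule: each query token embedding is matched against its most similar page patch embedding, and the per-token maxima are summed. The function names (`late_interaction_score`, `rank_pages`) and the toy two-dimensional vectors are hypothetical and chosen only for illustration; a real system would obtain the embeddings from the VLM rather than hard-code them.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs: Sequence[Vector],
                           page_vecs: Sequence[Vector]) -> float:
    """Sum over query tokens of the best similarity to any page patch."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector],
               pages: Sequence[Sequence[Vector]]) -> List[Tuple[int, float]]:
    """Return (page_index, score) pairs sorted from best to worst match."""
    scored = [(i, late_interaction_score(query_vecs, page))
              for i, page in enumerate(pages)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed, not from the paper): two query token embeddings
# and two pages with a different number of patch embeddings each.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query tokens
page_b = [[1.0, 0.0], [1.0, 0.0]]              # covers only the first token

ranking = rank_pages(query, [page_a, page_b])
print(ranking)  # [(0, 2.0), (1, 1.0)]
```

Because page embeddings do not depend on the query, they can be computed once at indexing time; only the query embeddings and the inexpensive max-and-sum step are needed when a query arrives.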