ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

July 25–30, 2020 | Omar Khattab, Matei Zaharia
ColBERT is a novel ranking model that leverages contextualized late interaction over deep language models (specifically BERT) for efficient and effective passage search. The model introduces a *late interaction* architecture that independently encodes queries and documents using BERT, followed by a cheap yet powerful interaction step that models their fine-grained similarity. This approach allows ColBERT to exploit the expressiveness of deep LMs while significantly reducing computational costs. By delaying the fine-grained interaction, ColBERT can pre-compute document representations offline, speeding up query processing. Additionally, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval from large document collections. Extensive evaluations on two recent passage search datasets, MS MARCO and TREC CAR, demonstrate that ColBERT is highly effective, outperforming existing BERT-based models and non-BERT baselines, while being significantly faster and requiring fewer FLOPs per query.
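The late interaction step described above scores a query–document pair by matching each query token embedding against all document token embeddings and summing the per-token maxima (the paper's MaxSim operator). A minimal NumPy sketch of that scoring step, with illustrative function and variable names (the real model produces the embeddings with BERT and runs this over pre-computed document representations):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum cosine similarity over all document token
    embeddings, then sum the maxima over the query tokens.

    query_embs: (num_query_tokens, dim), doc_embs: (num_doc_tokens, dim).
    """
    # L2-normalize rows so a dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens) similarity matrix
    return float(sim.max(axis=1).sum())

# Toy example with hand-made 2-D "embeddings" (purely illustrative):
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
score = maxsim_score(query, doc)  # each query token finds a perfect match
```

Because documents are encoded independently of any query, `doc_embs` can be computed once offline; at query time only the cheap matrix product and max/sum reduction run, which is what makes the interaction both fast and pruning-friendly.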