20 Oct 2020 | Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk
This paper presents Approximate Nearest Neighbor Negative Contrastive Learning (ANCE), a method for dense text retrieval that addresses the limitations of traditional sparse retrieval. Dense retrieval (DR) aims to improve retrieval by learning dense representations of text, yet it often underperforms sparse methods such as BM25. A key challenge in DR is selecting effective negative samples during training: local negatives drawn from within a batch are uninformative and lead to slow convergence. ANCE instead selects global negatives from the entire corpus using an asynchronously updated approximate nearest neighbor (ANN) index, which yields more effective learning.
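To make the sampling step concrete, here is a minimal sketch of ANCE-style global negative mining, assuming a dual-encoder that produces fixed-size vectors and a FAISS inner-product index over the full corpus. The random embeddings and toy relevance labels are stand-ins, not the paper's actual pipeline:

```python
# Sketch of global negative mining: retrieve top candidates from the
# whole corpus with the current model, then treat the non-relevant
# ones as hard negatives. Embeddings and labels below are toy data.
import numpy as np
import faiss

def build_ann_index(doc_embeddings: np.ndarray) -> faiss.IndexFlatIP:
    """Index all corpus embeddings for inner-product search."""
    index = faiss.IndexFlatIP(doc_embeddings.shape[1])
    index.add(doc_embeddings)
    return index

def mine_global_negatives(index, query_embeddings, positives, k=200, n_neg=5):
    """For each query, retrieve top-k corpus docs with the current model
    and keep the highest-ranked ones that are not labeled relevant."""
    _, topk_ids = index.search(query_embeddings, k)
    negatives = []
    for qid, doc_ids in enumerate(topk_ids):
        candidates = [d for d in doc_ids if d not in positives[qid]]
        negatives.append(candidates[:n_neg])
    return negatives

# Usage with random stand-ins for real encoder outputs:
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 128)).astype("float32")
queries = rng.standard_normal((32, 128)).astype("float32")
positives = {i: {i} for i in range(32)}   # toy relevance labels
index = build_ann_index(docs)
neg_ids = mine_global_negatives(index, queries, positives)
```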
The paper theoretically analyzes the convergence of dense retrieval training, showing that local negatives lead to diminishing gradient norms and high variance in stochastic gradients, both of which hinder learning. ANCE constructs global negatives by using the current DR model itself to retrieve hard candidates from the entire corpus, aligning the training-time negative distribution with the irrelevant documents the model must actually separate at inference time. This improves training convergence and yields better performance across retrieval tasks, including web search, question answering, and a commercial search environment.
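For reference, one way to write the resulting training objective is sketched below; the symbols f (the learned scoring function), D⁺ (relevant documents), and ANN_f(q) (top ANN candidates under the current model) follow common dense-retrieval shorthand and may not match the paper's exact notation:

```latex
% Sketch of a contrastive objective with ANCE global negatives
% (our shorthand, not necessarily the paper's exact formulation).
\theta^{*} = \arg\min_{\theta}
  \sum_{q}\sum_{d^{+}\in D^{+}}\sum_{d^{-}\in D^{-}_{\mathrm{ANCE}}}
  l\!\left(f(q, d^{+}; \theta),\, f(q, d^{-}; \theta)\right),
\qquad
D^{-}_{\mathrm{ANCE}} = \mathrm{ANN}_{f}(q) \setminus D^{+}.
```

The key difference from standard in-batch training is the construction of the negative set: it is retrieved from the whole corpus by the model being trained, rather than sampled from whatever happens to share the batch.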
Experiments demonstrate that ANCE achieves accuracy comparable to BERT-based cascade IR pipelines while being roughly 100x more efficient. The method is implemented with an asynchronously updated ANN index: an Inferencer computes document encodings in parallel using a recent checkpoint of the DR model, while the Trainer continues optimizing against the current index. This setup keeps both training and retrieval efficient, and ANCE negatives show significantly larger gradient norms than local negatives, leading to faster convergence.
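A simplified view of that Trainer–Inferencer interplay is sketched below. The toy encoder and the perturbation standing in for a training step are our own placeholders; in the real system the two components run concurrently on separate GPUs rather than in this sequential loop:

```python
# Sketch of the asynchronous index-refresh loop. The "encoder" and the
# "training step" are toy stand-ins for the real DR model and optimizer.
import numpy as np
import faiss

def encode_corpus(checkpoint: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Toy Inferencer encoder: project the corpus with the checkpoint."""
    return (corpus @ checkpoint).astype("float32")

def refresh_index(checkpoint, corpus):
    """Re-encode the corpus and rebuild the ANN index for negative mining."""
    embeddings = encode_corpus(checkpoint, corpus)
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64)).astype("float32")
checkpoint = rng.standard_normal((64, 32)).astype("float32")

index = refresh_index(checkpoint, corpus)
for _ in range(3):
    # Trainer: mine negatives from the (slightly stale) index and take
    # gradient steps; here a random perturbation stands in for training.
    checkpoint += 0.01 * rng.standard_normal(checkpoint.shape).astype("float32")
    # Inferencer: refresh the index with the updated checkpoint. In the
    # real system this runs in parallel, so training never blocks on
    # re-encoding the corpus.
    index = refresh_index(checkpoint, corpus)
```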
The paper also evaluates ANCE in different retrieval scenarios, including the first-stage retrieval of a commercial search engine. Results show that ANCE outperforms traditional methods in both retrieval accuracy and efficiency, with consistent gains across corpus sizes and search settings. The study underscores the importance of global negatives in training dense retrieval models and shows that ANCE can effectively capture the nuances of search relevance, improving performance in real-world applications.