20 Oct 2020 | Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk
This paper addresses the challenges of dense text retrieval, where end-to-end learned dense retrieval (DR) often underperforms word-based sparse retrieval. The authors theoretically demonstrate that the learning bottleneck in dense retrieval is due to the dominance of uninformative negatives sampled locally in batches, leading to diminishing gradient norms, large stochastic gradient variances, and slow learning convergence. To overcome this, they propose Approximate Nearest Neighbor Negative Contrastive Learning (ANCE), a mechanism that selects hard training negatives globally from the entire corpus using an asynchronously updated ANN index. Experiments on web search, question answering, and a commercial search engine show that ANCE significantly improves retrieval accuracy, nearly matching the performance of BERT-based cascade IR pipelines while being 100 times more efficient. The paper also provides a detailed analysis of the convergence of dense retrieval training and empirical validation of the effectiveness of ANCE.
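
The idea of drawing hard negatives globally from the corpus can be illustrated with a small sketch. This is not the authors' implementation: it assumes dot-product similarity, replaces the asynchronously updated ANN index with an exact brute-force search in NumPy, and uses a softmax negative log-likelihood as a stand-in contrastive loss. The function names `sample_hard_negatives` and `nll_loss`, and the toy vectors, are hypothetical.

```python
import numpy as np

def sample_hard_negatives(query_vecs, doc_vecs, positives, k=2):
    """For each query, return the k highest-scoring documents that are not positives.

    Assumption: an exact search over the whole corpus stands in for the ANN index.
    """
    scores = query_vecs @ doc_vecs.T       # (num_queries, num_docs) similarities
    ranked = np.argsort(-scores, axis=1)   # best-scoring documents first
    negatives = []
    for qi, order in enumerate(ranked):
        negs = [int(d) for d in order if int(d) not in positives[qi]][:k]
        negatives.append(negs)
    return negatives

def nll_loss(query_vec, pos_vec, neg_vecs):
    """Softmax negative log-likelihood of the positive against the given negatives.

    Assumption: this loss is chosen for illustration only.
    """
    logits = np.concatenate([[query_vec @ pos_vec], neg_vecs @ query_vec])
    logits = logits - logits.max()         # stabilise the exponentials
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

# Toy corpus: document 0 is the relevant one, document 1 is a near miss.
doc_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
query_vecs = np.array([[1.0, 0.0]])
positives = [{0}]

hard = sample_hard_negatives(query_vecs, doc_vecs, positives, k=2)
hard_loss = nll_loss(query_vecs[0], doc_vecs[0], doc_vecs[hard[0]])

# An "easy" negative, such as a random in-batch sample might yield.
easy_loss = nll_loss(query_vecs[0], doc_vecs[0], doc_vecs[[3]])
```

On this toy data the globally selected negatives produce a larger loss than the easy negative, which is the kind of training signal the abstract describes as missing when negatives are sampled locally. In a real setting the document encodings change as the encoder trains, which is why the paper refreshes its index asynchronously rather than keeping it fixed.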