NV-Retriever: Improving text embedding models with effective hard-negative mining

This paper introduces NV-Retriever-v1, a state-of-the-art text embedding model designed for information retrieval applications such as semantic search and question-answering systems based on Retrieval-Augmented Generation (RAG). The authors highlight the importance of effective hard-negative mining when fine-tuning text embedding models, a topic that is underexplored in the literature. They propose a family of positive-aware mining methods that use the relevance score of the positive passage to remove false negatives more reliably, which improves the accuracy of the fine-tuned models.

The paper includes a comprehensive ablation study comparing different hard-negative mining methods, teacher models, and ensemble techniques. The results show that the proposed positive-aware mining methods significantly improve the performance of text embedding models. NV-Retriever-v1, trained using these methods, achieved an average NDCG@10 score of 60.9 on the MTEB Retrieval benchmark, placing first on the leaderboard at the time of its publication.

The key contributions of the paper are:
1. Positive-aware hard-negative mining methods that leverage the positive relevance score to remove false negatives.
2. A detailed ablation study on different hard-negative mining methods and their configurations.
3. The release of NV-Retriever-v1, a state-of-the-art text retrieval model.

The authors recommend that practitioners experiment with different configurations of the proposed methods to find the best setup for their fine-tuning and evaluation tasks. They also encourage future research to disclose mining methodologies for reproducibility and replicability.
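
To make the idea of positive-aware mining more concrete, here is a minimal sketch of one possible variant: candidates that score too close to (or above) the labeled positive are treated as likely false negatives and dropped, and the top-k of what remains are kept as hard negatives. The function name `mine_hard_negatives`, the parameter `max_frac_of_pos`, the value 0.95, and the toy scores are all illustrative assumptions, not details taken from the summary above, and the sketch assumes positive similarity scores from some teacher model.

```python
from typing import List, Sequence, Tuple


def mine_hard_negatives(
    pos_score: float,
    candidates: Sequence[Tuple[str, float]],
    k: int = 4,
    max_frac_of_pos: float = 0.95,  # assumed threshold, not from the source
) -> List[str]:
    """Return up to k hard negatives for a single query.

    candidates holds (passage_id, teacher_score) pairs and must not
    include the labeled positive. A candidate is kept only if its score
    is at most max_frac_of_pos * pos_score; anything scoring higher is
    treated as a likely false negative and discarded.
    """
    threshold = max_frac_of_pos * pos_score
    kept = [(pid, score) for pid, score in candidates if score <= threshold]
    # Hardest negatives first: highest remaining score at the front.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in kept[:k]]


# Toy example: the positive scores 0.80, so the cutoff is 0.76.
candidates = [("p1", 0.83), ("p2", 0.79), ("p3", 0.74), ("p4", 0.52), ("p5", 0.31)]
negatives = mine_hard_negatives(pos_score=0.80, candidates=candidates, k=3)
print(negatives)  # ['p3', 'p4', 'p5']
```

In this toy run, `p1` and `p2` are excluded because they score at or near the positive, which is the situation where an unlabeled passage is most likely to be relevant. Plain top-k mining without the threshold would have selected exactly those two passages as negatives.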

22 Jul 2024 | Gabriel de Souza P. Moreira, Radek Osmulski, Mengyao Xu, Ronay Ak, Benedikt Schifferer, Even Oldridge