NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

27 May 2024 | Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
NV-Embed is a generalist embedding model that significantly improves the performance of decoder-only large language models (LLMs) on embedding and retrieval tasks.

On the architecture side, NV-Embed introduces a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared with mean pooling or using the last <EOS> token embedding. It also removes the causal attention mask during contrastive training, which strengthens representation learning since every token can attend to the full input sequence.

For training, the paper proposes a two-stage contrastive instruction-tuning method. The first stage applies contrastive training with instructions on retrieval datasets, using in-batch negatives and curated hard negatives. The second stage blends non-retrieval datasets (classification, clustering, and semantic textual similarity) into the instruction tuning, which improves non-retrieval task accuracy without hurting, and in fact further improving, retrieval performance.

NV-Embed achieves a record-high score of 69.32 on the Massive Text Embedding Benchmark (MTEB), ranking first across its 56 tasks, and scores 59.36 on the benchmark's 15 retrieval tasks. The model is trained only on publicly available data, without any synthetic data from proprietary LLMs, and outperforms previous leading models such as E5-mistral-7b-instruct, SFR-Embedding, and Voyage-large-2-instruct.
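To make the pooling idea concrete, below is a minimal PyTorch sketch of latent-attention pooling as described in the summary: the LLM's last-layer hidden states (with bidirectional attention) attend over a small trainable latent array, pass through an MLP, and are mean-pooled into a single embedding. The class name, hyperparameters (e.g., num_latents=512, the MLP ratio), and the final L2 normalization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttentionPooling(nn.Module):
    """Sketch of latent-attention pooling (hypothetical, not official NV-Embed code).

    Sequence hidden states act as queries against a trainable latent array
    that serves as both keys and values; the attended output goes through an
    MLP and is mean-pooled over non-padding tokens into one embedding.
    """

    def __init__(self, d_model: int, num_latents: int = 512, mlp_ratio: int = 4):
        super().__init__()
        # Trainable latent array shared across all inputs (keys == values here).
        self.latents = nn.Parameter(torch.randn(num_latents, d_model) * 0.02)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model) from the LLM's last layer
        # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
        scale = hidden_states.size(-1) ** -0.5
        # Cross-attention: each token queries the latent array.
        scores = torch.einsum("bld,rd->blr", hidden_states, self.latents) * scale
        attended = torch.softmax(scores, dim=-1) @ self.latents   # (batch, seq_len, d_model)
        attended = self.mlp(attended)
        # Mean-pool over non-padding positions to produce one embedding per input.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (attended * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return F.normalize(pooled, dim=-1)
```

In use, the pooled embeddings of instruction-prefixed queries and passages would feed a standard contrastive loss with in-batch and hard negatives during both training stages; the snippet above only covers the pooling step.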