LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

21 Aug 2024 | Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy
Large decoder-only language models (LLMs) are state-of-the-art on most NLP tasks, yet their adoption for text embedding tasks has been slow. This work introduces LLM2Vec, a simple unsupervised approach that transforms any decoder-only LLM into a strong text encoder. LLM2Vec consists of three steps: (1) enabling bidirectional attention, (2) masked next token prediction (MNTP), and (3) unsupervised contrastive learning (SimCSE). Its effectiveness is demonstrated through experiments on four popular LLMs (1.3B to 8B parameters), evaluated on word- and sequence-level tasks.

LLM2Vec outperforms encoder-only models on word-level tasks and achieves new state-of-the-art performance among unsupervised models on the Massive Text Embedding Benchmark (MTEB). Combining LLM2Vec with supervised contrastive learning further improves performance, achieving state-of-the-art results among models trained only on publicly available data. The analysis reveals that LLM2Vec helps models capture information from future tokens and explains the strong performance of Mistral-7B with bidirectional attention even without training. LLM2Vec is a promising solution for low-resource and compute-constrained scenarios, offering a simple and efficient way to turn decoder-only LLMs into universal text encoders.
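The third step, unsupervised SimCSE, trains on pairs of embeddings produced by passing the same sentence through the model twice with different dropout masks; matching pairs are pulled together and other sentences in the batch are pushed apart. A minimal NumPy sketch of that InfoNCE-style objective is below (the function name and the temperature default of 0.05 are illustrative assumptions, not taken from the paper's code):

```python
import numpy as np

def simcse_loss(z1, z2, temperature=0.05):
    """Unsupervised SimCSE (InfoNCE) loss sketch.

    z1, z2: (batch, dim) embeddings of the *same* sentences from two
    dropout-perturbed forward passes; row i of z1 and row i of z2 are
    positives, all other rows in the batch serve as in-batch negatives.
    """
    # Cosine similarity: normalize rows, then take scaled dot products.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # (batch, batch) similarity matrix

    # Cross-entropy where the correct "class" for row i is column i.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))
```

When the two passes produce near-identical embeddings for matching sentences (and dissimilar ones for non-matching sentences), the loss approaches zero; mismatched pairings yield a large loss, which is what drives the encoder toward consistent sentence representations.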