LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024 | Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy
LLM2Vec is a simple unsupervised method that transforms decoder-only large language models (LLMs) into strong text encoders. The approach consists of three steps: enabling bidirectional attention, masked next token prediction (MNTP), and unsupervised contrastive learning (via SimCSE). Applied together, these steps enrich the contextualized representations of decoder-only LLMs and make them suitable for text embedding tasks. The method is data- and parameter-efficient: it requires no labeled data and reaches state-of-the-art performance among unsupervised models on the Massive Text Embeddings Benchmark (MTEB). When combined with supervised contrastive learning, LLM2Vec sets a new state of the art among models trained only on publicly available data. These results demonstrate that decoder-only LLMs can be turned into universal text encoders without extensive adaptation or synthetic data. LLM2Vec-transformed models outperform encoder-only models on word-level tasks and achieve strong performance on sequence-level tasks. Training is also efficient in time and compute, making the method suitable for low-resource and compute-constrained scenarios. Overall, the work provides a simple and effective way to leverage large language models for text embedding tasks.
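
To make the pipeline concrete, below is a minimal, illustrative Python sketch of the basic idea of using a decoder-only LM as a text encoder by mean-pooling its hidden states. This is not the authors' implementation: the model name (gpt2) and the embed helper are stand-ins chosen for illustration, and the two LLM2Vec training stages (MNTP and SimCSE-style unsupervised contrastive learning), as well as disabling the causal attention mask, are omitted here.

```python
# Illustrative sketch: embed text with a decoder-only LM via mean pooling.
# In LLM2Vec, the causal mask would be disabled (bidirectional attention) and the
# model further trained with masked next token prediction and SimCSE-style
# contrastive learning; those steps are not shown in this minimal example.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper uses LLMs such as Mistral-7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(texts):
    """Mean-pool the final hidden states over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq, 1)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts                                  # (batch, dim)

# Usage: compare two sentences by cosine similarity of their embeddings.
emb = embed(["LLM2Vec turns decoder-only LLMs into text encoders.",
             "Contrastive learning improves sequence representations."])
cos = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(emb.shape, cos.item())
```
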