17 Apr 2024 | Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
Generative Representational Instruction Tuning (GRIT) is a method that unifies text embedding and generation tasks into a single large language model (LLM). GRIT trains a large language model to handle both tasks by distinguishing them through instructions. The resulting GRITLM 7B model sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on generative tasks. GRITLM 8x7B further outperforms all open generative language models tried while still being among the best embedding models. Training with GRIT matches training on only embedding or only generative data, so both tasks can be unified without performance loss. This unification speeds up Retrieval-Augmented Generation (RAG) by over 60% for long documents by eliminating the need for separate retrieval and generation models. GRITLM is available at https://github.com/ContextualAI/gritlm.
GRIT unifies representational instruction tuning and generative instruction tuning into a single model. It finetunes a pretrained large language model with embedding and generative instruction data in a consistent format. For embedding data, it uses a contrastive objective with in-batch negatives. For generative data, it uses a language modeling objective. The model learns to differentiate between the two streams via instructions and separate loss functions. GRITLM 7B sets a new state of the art on MTEB among open models and outperforms much larger models on generative tasks. GRITLM 8x7B is the best open generative language model on the evaluated task average while using only about 13B active parameters at inference.
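A minimal sketch of how the two objectives could be combined in one training step, written in PyTorch. The temperature, the loss weights, and the assumption that embeddings arrive already pooled (e.g. mean-pooled final hidden states from the bidirectional pass) are illustrative choices, not the exact recipe from the paper.

```python
import torch
import torch.nn.functional as F

def grit_step(q_emb, d_emb, gen_logits, gen_labels, tau=0.05,
              lambda_rep=1.0, lambda_gen=1.0):
    """One illustrative training step combining both GRIT objectives.

    q_emb:      (B, H) pooled query embeddings from the bidirectional pass
    d_emb:      (B, H) pooled embeddings of each query's positive document
    gen_logits: (B, T, V) logits from the causal pass over a generative batch
    gen_labels: (B, T) target token ids, with -100 marking prompt/padding tokens
    """
    # Representation loss: InfoNCE with in-batch negatives over cosine similarities.
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    sim = q @ d.T / tau                        # (B, B) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    rep_loss = F.cross_entropy(sim, targets)   # query i should match document i

    # Generative loss: standard next-token prediction (shift logits vs. labels).
    gen_loss = F.cross_entropy(
        gen_logits[:, :-1].reshape(-1, gen_logits.size(-1)),
        gen_labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )
    return lambda_rep * rep_loss + lambda_gen * gen_loss
```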
GRIT leads to three advantages: performance, efficiency, and simplicity. Performance-wise, the unified model matches the performance of embedding-only and generative-only variants, even outperforming them on some tasks. Efficiency-wise, GRITLM allows for faster RAG by halving the number of forward passes. Simplicity-wise, a single model handles both use cases, significantly simplifying infrastructure needs.
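The efficiency gain comes from not re-processing long documents at generation time. The sketch below illustrates the general idea with Hugging Face transformers: forward the retrieved document once, keep its key-value cache, and decode the answer on top of it. This is a generic prefix-caching illustration, assuming a recent transformers version and that the released checkpoint loads as a standard causal LM; GRIT's query/doc caching variants reuse states from the embedding pass and differ in detail.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("GritLM/GritLM-7B")
model = AutoModelForCausalLM.from_pretrained("GritLM/GritLM-7B", torch_dtype="auto")
model.eval()

document = "<long retrieved document goes here>"
question = "\n\nQuestion: What is GRIT?\nAnswer:"

with torch.no_grad():
    # 1) Forward the long document once and keep its key-value cache.
    doc_ids = tok(document, return_tensors="pt").input_ids
    out = model(doc_ids, use_cache=True)
    past = out.past_key_values

    # 2) Decode the answer on top of the cached document states (greedy, 64 tokens).
    ids = tok(question, return_tensors="pt", add_special_tokens=False).input_ids
    new_tokens = []
    for _ in range(64):
        out = model(ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1:].argmax(dim=-1)
        new_tokens.append(next_id)
        ids = next_id                      # only the newly generated token is fed next

print(tok.decode(torch.cat(new_tokens, dim=-1)[0]))
```

With a separate retriever and generator, the document would be forwarded through two different models; here it passes through one model once, which is where the reported speedup for long documents comes from.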
GRIT is trained with two objective functions, which requires more compute. However, finetuning is cheap compared to pretraining, so the benefits outweigh the added cost. GRITLM is available at https://github.com/ContextualAI/gritlm.
GRITLM is the first model to perform best-in-class at both text representation and generation tasks simultaneously. For embedding tasks it uses bidirectional attention over the input and pools the final hidden states into a single vector; for generative tasks it uses causal attention and a language modeling head to predict the next tokens. The input format supports conversations with multiple turns.
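How a single model separates the two modes is visible in the input format alone. The helpers below are a hypothetical sketch based on the format described in the paper: embedding inputs end with an <|embed|> marker, after which the model applies bidirectional attention and pools the final hidden states, while generative inputs use the <|user|>/<|assistant|> turn structure and are continued with the language modeling head. The exact whitespace and token placement are approximations; in practice the wrapper in the linked repository handles formatting.

```python
def format_for_embedding(text: str, instruction: str = "") -> str:
    # Tokens after <|embed|> are encoded with bidirectional attention and pooled into one vector.
    prefix = f"<|user|>\n{instruction}\n<|embed|>\n" if instruction else "<|embed|>\n"
    return prefix + text

def format_for_generation(turns: list[str]) -> str:
    # Alternating user/assistant turns; the model continues after the final <|assistant|>
    # marker using causal attention and the language modeling head.
    out = []
    for i, turn in enumerate(turns):
        role = "<|user|>" if i % 2 == 0 else "<|assistant|>"
        out.append(f"{role}\n{turn}\n")
    out.append("<|assistant|>\n")
    return "".join(out)

# Usage: the same model, switched between modes only by the input format.
emb_input = format_for_embedding("Generative Representational Instruction Tuning",
                                 instruction="Retrieve relevant machine learning papers")
gen_input = format_for_generation(["Summarize GRIT in one sentence."])
```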
GRITLM is the only model that can handle both embedding and generation at best-in-class performance. It outperforms other models on both tasks. GRITLM 7B is significantly more costly to run than many other embedding models, and its representations have 4096 dimensions, requiring 4× more storage than the 1024-dimensional embeddings common among smaller embedding models.
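The storage figure follows directly from the embedding width; assuming 4-byte float32 values for illustration:

```python
bytes_per_value = 4                    # float32; float16 halves both numbers
print(4096 * bytes_per_value)          # 16384 bytes (16 KiB) per GRITLM 7B embedding
print(1024 * bytes_per_value)          # 4096 bytes (4 KiB) per 1024-dimensional embedding
```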