23 Feb 2024 | Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan
Echo embeddings improve autoregressive language model embeddings by letting early tokens capture information from later ones through repetition: the input is passed to the model twice, and embeddings are extracted only from the second occurrence. Because the second occurrence can attend to the first, its contextualized token embeddings encode information from the entire input. This addresses a key limitation of causal attention, under which a token's embedding cannot depend on tokens that appear later in the sequence.
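The extraction step can be sketched as follows. This is a minimal illustration of the pooling logic only, assuming you already have last-layer hidden states from a causal LM and know which token positions hold the second occurrence of the input; `echo_pool` and the toy dimensions are illustrative, not the authors' code.

```python
import numpy as np

def echo_pool(hidden_states: np.ndarray, second_start: int) -> np.ndarray:
    """Mean-pool contextualized token embeddings over the second
    occurrence of the input.

    hidden_states: (seq_len, dim) last-layer states for a prompt in
    which the input text appears twice; positions second_start onward
    correspond to the repeated (second) occurrence, whose tokens can
    attend to the full first occurrence.
    """
    return hidden_states[second_start:].mean(axis=0)

# Toy example: 8 tokens with 4-dim states; suppose the input occupies
# positions 1-3 (first occurrence) and 5-7 (second occurrence).
rng = np.random.default_rng(0)
states = rng.normal(size=(8, 4))
embedding = echo_pool(states, second_start=5)
assert embedding.shape == (4,)
```

In practice the hidden states would come from a model such as Mistral-7B, and the second-occurrence span would be located by tokenizing the prompt; only that span is pooled, discarding the first occurrence's states.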
Echo embeddings outperform classical embeddings on the MTEB leaderboard, achieving over 9% improvement in zero-shot settings and around 0.7% improvement when fine-tuned. They also achieve state-of-the-art results when using the Mistral-7B model, surpassing prior open-source models that do not leverage synthetic fine-tuning data. Echo embeddings are simple to implement and compatible with various embedding extraction techniques from autoregressive language models.
The method is conceptually well-motivated and effective in capturing bidirectional information, which is crucial for accurate semantic similarity and retrieval tasks. Echo embeddings are evaluated on a variety of tasks and demonstrate consistent improvements over classical embeddings, particularly in scenarios where the distinguishing information is in the latter part of the input. They are also more robust to noise compared to last-token pooling strategies.
In experiments, echo embeddings outperform classical embeddings in both zero-shot and fine-tuned settings, showing that they can significantly improve the performance of embeddings on real data. They are also more robust to variations in prompts and are less sensitive to the exact wording of the input. Echo embeddings provide a simple yet effective solution to the limitation of autoregressive models, allowing for better performance in tasks that require capturing information from later tokens.