30 May 2024 | Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar
Contextual Position Encoding (CoPE) is a novel position encoding method that conditions positions on context, enabling more general position addressing such as attending to the i-th word, noun, or sentence. Unlike traditional position encodings, which simply count tokens, CoPE decides which tokens to count based on their context vectors, so a position can measure tokens, words, sentences, or other semantically meaningful units.

CoPE works by computing a gate value for each query-key pair from the context vectors, then assigning a position to each token through a cumulative sum of these gates. The resulting positions are thus contextualized: depending on what the gates learn to detect, they can represent counts of words, verbs, sentences, and so on. Each attention head computes its own gates, so different heads can independently measure different kinds of position, and the method is efficient in both computation and memory. By integrating context and position addressing in this way, CoPE handles abstract elements like sentences more effectively than token-count-based schemes.

Empirically, CoPE succeeds on tasks where standard position embeddings fail, such as selective copy, counting, and Flip-Flop, with especially strong out-of-domain generalization. It also improves perplexity on language modeling and code modeling, and generalizes to longer contexts and to abstract units like paragraphs and sections.
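The gate-and-cumulative-sum mechanism above can be sketched in a few lines of numpy. This is a minimal single-head illustration, not the authors' implementation: it assumes sigmoid gates computed from query-key dot products, a right-to-left cumulative sum to get each token's contextual position, and linear interpolation between learned integer position embeddings to handle the fractional positions the gates produce. All function and variable names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cope_attention(q, k, v, pos_emb):
    """Single-head causal attention with a CoPE-style position sketch.

    q, k, v: arrays of shape (seq_len, dim).
    pos_emb: (max_pos, dim) learned integer position embeddings.
    """
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)            # content logits
    gates = 1.0 / (1.0 + np.exp(-(q @ k.T)))   # g_ij = sigmoid(q_i . k_j)
    # Causal mask: token i only attends to positions j <= i.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    gates = np.where(mask, gates, 0.0)
    # Contextual position p_ij = sum of gates g_ik for k from j to i,
    # i.e. a cumulative sum taken from the right along each row.
    pos = np.cumsum(gates[:, ::-1], axis=1)[:, ::-1]
    pos = np.clip(pos, 0, pos_emb.shape[0] - 1)
    # Positions are fractional: interpolate between floor/ceil embeddings.
    lo = np.floor(pos).astype(int)
    hi = np.ceil(pos).astype(int)
    w = pos - lo
    logit_lo = np.einsum('id,ijd->ij', q, pos_emb[lo])
    logit_hi = np.einsum('id,ijd->ij', q, pos_emb[hi])
    pos_logits = (1 - w) * logit_lo + w * logit_hi
    # Position logits are added to content logits before the softmax.
    attn = softmax(np.where(mask, scores + pos_logits, -1e9), axis=1)
    return attn @ v
```

Because the gates depend on the query and key vectors, "position" here is whatever the model learns to count; a head whose gates fire only on sentence boundaries, for example, effectively measures sentence positions.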
The results demonstrate that CoPE provides significant improvements in performance on various tasks, making it a promising approach for position encoding in large language models.