30 May 2024 | Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar
Contextual Position Encoding (CoPE) is a novel position encoding method that conditions positions on context, enabling more general position addressing such as attending to the i-th word, noun, or sentence. Unlike traditional position encodings, which simply count tokens, CoPE decides which tokens to count based on their context vectors, so a position can measure tokens, words, sentences, or other semantically meaningful units.

CoPE works by computing a gate value for each query-key pair from the context vectors, then assigning a position to each token through a cumulative sum of these gates. The resulting positions are thus contextualized: depending on what the gates learn to detect, they can represent counts of words, verbs, sentences, and so on. Each attention head computes its own gates, so different heads can independently measure different kinds of position, and the method is efficient in both computation and memory. By integrating context and position addressing in this way, CoPE handles abstract elements like sentences more effectively than token-count-based schemes.

Empirically, CoPE succeeds on tasks where standard position embeddings fail, such as selective copy, counting, and Flip-Flop, with especially strong out-of-domain generalization. It also improves perplexity on language modeling and code modeling, and generalizes to longer contexts and to abstract units like paragraphs and sections.
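The gate-and-cumulative-sum mechanism above can be sketched in a few lines of numpy. This is a minimal single-head illustration, not the authors' implementation: it assumes sigmoid gates computed from query-key dot products, a right-to-left cumulative sum to get each token's contextual position, and linear interpolation between learned integer position embeddings to handle the fractional positions the gates produce. All function and variable names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cope_attention(q, k, v, pos_emb):
    """Single-head causal attention with a CoPE-style position sketch.

    q, k, v: arrays of shape (seq_len, dim).
    pos_emb: (max_pos, dim) learned integer position embeddings.
    """
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)            # content logits
    gates = 1.0 / (1.0 + np.exp(-(q @ k.T)))   # g_ij = sigmoid(q_i . k_j)
    # Causal mask: token i only attends to positions j <= i.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    gates = np.where(mask, gates, 0.0)
    # Contextual position p_ij = sum of gates g_ik for k from j to i,
    # i.e. a cumulative sum taken from the right along each row.
    pos = np.cumsum(gates[:, ::-1], axis=1)[:, ::-1]
    pos = np.clip(pos, 0, pos_emb.shape[0] - 1)
    # Positions are fractional: interpolate between floor/ceil embeddings.
    lo = np.floor(pos).astype(int)
    hi = np.ceil(pos).astype(int)
    w = pos - lo
    logit_lo = np.einsum('id,ijd->ij', q, pos_emb[lo])
    logit_hi = np.einsum('id,ijd->ij', q, pos_emb[hi])
    pos_logits = (1 - w) * logit_lo + w * logit_hi
    # Position logits are added to content logits before the softmax.
    attn = softmax(np.where(mask, scores + pos_logits, -1e9), axis=1)
    return attn @ v
```

Because the gates depend on the query and key vectors, "position" here is whatever the model learns to count; a head whose gates fire only on sentence boundaries, for example, effectively measures sentence positions.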
The results demonstrate that CoPE provides significant improvements in performance on various tasks, making it a promising approach for position encoding in large language models.