Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

2024 | Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He
This paper introduces Bilevel Positional Encoding (BiPE), a novel positional encoding method designed to improve length extrapolation in language models. BiPE combines two distinct encodings for each position: an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the location within a segment, helping the model capture semantic information, while the inter-segment encoding specifies the segment index and models relationships between segments, enhancing extrapolation capabilities. Theoretical analysis shows that this disentanglement of positional information makes learning more effective. Empirical results demonstrate that BiPE outperforms existing methods in various tasks across different text modalities, showing superior length extrapolation capabilities. The paper also includes a theoretical justification of BiPE's parameter efficiency and extensive experiments to validate its effectiveness.
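To make the two levels concrete, here is a minimal Python sketch of how each token could be assigned an intra-segment position and a segment index. It is an illustration under stated assumptions, not the authors' implementation: the function name `bilevel_positions`, the choice of separator tokens, and the convention that a separator closes its own segment are all hypothetical. How the two indices are then turned into encodings inside the model is not shown.

```python
from typing import FrozenSet, List, Sequence, Tuple


def bilevel_positions(
    tokens: Sequence[str],
    separators: FrozenSet[str] = frozenset({".", "\n"}),  # assumed segment delimiters
) -> Tuple[List[int], List[int]]:
    """Return (intra_segment_positions, segment_indices) for a token sequence.

    Hypothetical helper: a separator token is counted as the last token
    of the segment it ends, and the next token starts a new segment.
    """
    intra: List[int] = []
    inter: List[int] = []
    pos_in_segment = 0  # location within the current segment
    segment_idx = 0     # index of the current segment
    for tok in tokens:
        intra.append(pos_in_segment)
        inter.append(segment_idx)
        if tok in separators:
            # Segment ends here: advance the segment index, reset the local position.
            segment_idx += 1
            pos_in_segment = 0
        else:
            pos_in_segment += 1
    return intra, inter


tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
intra, inter = bilevel_positions(tokens)
print(intra)  # [0, 1, 2, 3, 0, 1, 2]
print(inter)  # [0, 0, 0, 0, 1, 1, 1]
```

In this sketch the intra-segment positions stay bounded by the longest segment no matter how long the full sequence grows, while only the segment index increases with sequence length. That separation is the disentanglement the summary describes.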