HGRN2: Gated Linear RNNs with State Expansion

2024 | Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
HGRN2 is an enhanced version of the Hierarchically Gated Linear RNN (HGRN) that introduces a state expansion mechanism based on outer products to increase the recurrent state size without adding extra parameters. It retains the parameter and training efficiency of HGRN while significantly expanding the state size, which enhances memory capacity and in-context recall ability. The expansion is realized through structured matrices, enabling parameter-efficient scaling of the state, and HGRN2 leverages linear attention techniques for hardware-efficient training, making it suitable for large-scale experiments.

Experiments show that HGRN2 consistently outperforms HGRN across tasks and is competitive with other recurrent models. It is evaluated on multiple datasets, including WikiText-103, SlimPajama, and the Pile, demonstrating its effectiveness in language modeling and long-context tasks, and it also performs well in image classification, showing advantages over previous models. The paper compares HGRN2 with models such as GLA and Mamba, highlighting its distinct design and performance. Overall, HGRN2 offers a more efficient and effective linear recurrent model, particularly for language modeling and long-context tasks.
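To make the outer-product state expansion concrete, below is a minimal sketch of an HGRN2-style gated linear recurrence: the element-wise HGRN update h = f*h + (1-f)*i becomes a rank-1 update of a matrix-valued state, with (1-f) playing the role of a key, the input playing the role of a value, and the output gate read out like a query in linear attention. The function name, shapes, and single-head setup are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def hgrn2_recurrence(f, i, q):
    """Sketch of an HGRN2-style matrix recurrence (assumed shapes/names).

    f : (T, d) forget gates in (0, 1)
    i : (T, d) input vectors
    q : (T, d) output gates, read out like queries in linear attention
    Returns outputs of shape (T, d).
    """
    T, d = f.shape
    S = np.zeros((d, d))           # expanded matrix-valued state; no extra parameters
    outputs = np.zeros((T, d))
    for t in range(T):
        # Rank-1 outer-product update: Diag(f_t) S_{t-1} + (1 - f_t) i_t^T
        S = f[t][:, None] * S + np.outer(1.0 - f[t], i[t])
        # The output gate acts as a query over the expanded state.
        outputs[t] = q[t] @ S
    return outputs

# Toy usage with random gates and inputs.
rng = np.random.default_rng(0)
T, d = 8, 4
f = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d))))   # sigmoid forget gates
i = rng.standard_normal((T, d))
q = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d))))   # sigmoid output gates
print(hgrn2_recurrence(f, i, q).shape)                    # (8, 4)
```

In practice the same recurrence can be computed in a chunk-parallel, hardware-efficient way using linear attention kernels, which is what makes the expanded state affordable for large-scale training; the sequential loop above is only for clarity.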