HGRN2: Gated Linear RNNs with State Expansion

19 Aug 2024 | Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
The paper introduces HGRN2, an enhanced version of the Hierarchically Gated Linear RNN (HGRN) that addresses the limitation of HGRN's small recurrent state by introducing a state expansion mechanism inspired by linear attention. This mechanism significantly enlarges the model's memory capacity and improves language modeling performance while allowing efficient training and adding no extra parameters. The authors explore several state expansion methods, including structured matrices and outer-product-based approaches, and show that the outer-product-based method adopted in HGRN2 is particularly effective. Extensive experiments across multiple tasks, including language modeling and image classification, show that HGRN2 consistently outperforms HGRN1 and achieves competitive results against other subquadratic efficient models and state-of-the-art linear recurrent models. The paper also discusses hardware-efficient training of HGRN2, which leverages algorithms similar to those used for linear attention. Overall, HGRN2 offers a distinctive perspective on gated linear RNNs, providing a more efficient and expressive parameterization for sequence modeling tasks.
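To make the outer-product state expansion concrete, the sketch below shows a linear-attention-style recurrence in which a per-dimension forget gate decays a matrix-valued state and each new token writes a key/value outer product into it. This is a minimal NumPy illustration under stated assumptions: the function name, the use of 1 - f as the input gate, and the exact placement of the gates are choices made here for exposition, not the authors' reference implementation.

```python
import numpy as np

def gated_outer_product_rnn(q, k, v, f):
    """Minimal sketch of a gated linear RNN with outer-product state expansion.

    q, k, v : (T, d) query/key/value-like activation sequences
    f       : (T, d) per-dimension forget gates in (0, 1)
    Returns : (T, d) outputs

    The recurrent state S is a d x d matrix rather than a d-dimensional vector:
    this is the "state expansion" idea, where memory capacity grows without
    adding parameters because S is built from outer products of existing
    activations.
    """
    T, d = q.shape
    S = np.zeros((d, d))              # expanded, matrix-valued recurrent state
    out = np.zeros((T, d))
    for t in range(T):
        # Decay the state along the key dimension with the forget gate, then
        # write the new key/value outer product scaled by the input gate (1 - f).
        # (Tying the input gate to 1 - f is an illustrative assumption here.)
        S = f[t][:, None] * S + np.outer((1.0 - f[t]) * k[t], v[t])
        # Read out by querying the expanded state, as in linear attention.
        out[t] = q[t] @ S
    return out

# Example usage with random activations (illustrative shapes only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 16, 8
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    f = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d))))  # sigmoid gates
    print(gated_outer_product_rnn(q, k, v, f).shape)         # (16, 8)
```

The sequential loop above is only for clarity; as the summary notes, training can instead use chunkwise, hardware-efficient algorithms of the kind developed for linear attention.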