Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

10 Apr 2024 | Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
The paper introduces Eagle (RWKV-5) and Finch (RWKV-6), two new sequence models that improve upon the RWKV-4 architecture. The key advancements are multi-headed matrix-valued states and a dynamic recurrence mechanism, which increase expressivity while retaining the inference efficiency of RNNs. The authors also introduce a new 1.12-trillion-token multilingual corpus and a fast tokenizer based on greedy matching. Four Eagle models (0.46 to 7.5 billion parameters) and two Finch models (1.6 and 3.1 billion parameters) are trained on this corpus and perform competitively across a wide range of benchmarks. The models are released under the Apache 2.0 license, along with training and inference code. The paper covers the architectural improvements, the new tokenizer, the dataset, and extensive experimental results on language modeling, associative recall, long-context, and multimodal tasks. Eagle and Finch outperform existing models on many benchmarks, challenging the dominance of Transformer architectures while retaining RNN advantages.
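To make the two architectural changes concrete, here is a minimal NumPy sketch of the per-head recurrence, written from the paper's description. Both models keep a matrix-valued state per head, updated with rank-one outer products; Eagle uses a fixed per-channel decay, while Finch computes a fresh decay vector from each input token (the dynamic recurrence). The function name is ours, and token shift, LayerNorm, and output gating from the full architecture are omitted.

```python
import numpy as np

def rwkv_head(r, k, v, w, u):
    """One matrix-valued-state head (simplified sketch).

    r, k, v : (T, d) receptance/key/value rows for one head.
    w       : (T, d) per-channel decays in (0, 1). Eagle (RWKV-5) uses
              the same trained row at every step; Finch (RWKV-6) derives
              each row from the input, making the recurrence dynamic.
    u       : (d,) per-channel bonus applied only to the current token.
    """
    T, d = k.shape
    S = np.zeros((d, d))                      # matrix-valued state
    out = np.empty((T, d))
    for t in range(T):
        kv = np.outer(k[t], v[t])             # rank-one update k_t^T v_t
        out[t] = r[t] @ (S + u[:, None] * kv) # current token gets bonus u
        S = w[t][:, None] * S + kv            # decay old state row-wise
    return out

# Eagle: one static decay vector repeated across time.
T, d = 8, 4
rng = np.random.default_rng(0)
r, k, v = (rng.standard_normal((T, d)) for _ in range(3))
w_eagle = np.tile(np.full(d, 0.9), (T, 1))
# Finch: a data-dependent stand-in, one decay vector per token.
w_finch = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d))))
y = rwkv_head(r, k, v, w_finch, u=np.full(d, 0.5))
```

Because the state has a fixed size of d x d per head, both models process each new token in constant time and memory, which is the RNN advantage the abstract refers to.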
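The fast greedy-matching tokenizer can also be sketched in a few lines: at each position it emits the longest vocabulary entry that prefixes the remaining input. The snippet below is an illustrative sketch, not the released implementation (which matches over bytes with an efficient lookup structure); `vocab` is a hypothetical dict from token string to id, assumed to contain every single character as a fallback.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-prefix-match tokenization (illustrative sketch)."""
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(vocab[piece])
                i += length
                break
        else:  # not even a single character matched
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return tokens
```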