10 Apr 2024 | Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu
The paper introduces Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV-4 architecture. These models incorporate multi-headed matrix-valued states and a dynamic recurrence mechanism, enhancing expressivity while preserving the inference efficiency of RNNs. The paper also presents a new 1.12-trillion-token multilingual corpus, RWKV World v2, together with a fast tokenizer, designed to improve performance on multilingual and code data. Four Eagle models (0.46B to 7.5B parameters) and two Finch models (1.6B and 3.1B parameters) are trained on this corpus and released under the Apache 2.0 license. The models achieve competitive performance across a wide range of benchmarks, including language modeling, associative recall, long-context tasks, and multimodal applications, while the architectures remain efficient and scalable compared to traditional Transformer models. The paper also highlights the importance of open-source training pipelines and the potential of these models for future language modeling innovations.
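To make the core architectural idea concrete, the sketch below shows a single-head, recurrent-form toy version of a matrix-valued state with per-token decay, in the spirit of the Eagle/Finch recurrence. It is a minimal illustration, not the paper's exact formulation: the function and variable names (`finch_style_recurrence`, `r`, `k`, `v`, `w`, `u`) and the shapes are assumptions chosen for readability, and the full models add multiple heads, token shift, gating, and a parallel training kernel.

```python
import numpy as np

def finch_style_recurrence(r, k, v, w, u):
    """Toy single-head recurrence with a matrix-valued state.

    r, k, v, w: arrays of shape (T, D); w[t] in (0, 1) acts as a per-token,
    per-channel decay (Finch-style; Eagle would use a fixed, data-independent w).
    u: array of shape (D,), a per-channel bonus applied to the current token.
    Returns per-step outputs of shape (T, D).
    """
    T, D = r.shape
    S = np.zeros((D, D))            # matrix-valued state carried across tokens
    outputs = np.zeros((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])                    # rank-1 update from the current token
        outputs[t] = r[t] @ (np.diag(u) @ kv + S)    # read out with receptance r_t
        S = np.diag(w[t]) @ S + kv                   # decay the old state, add the new update
    return outputs

# Illustrative usage with random inputs.
T, D = 8, 4
rng = np.random.default_rng(0)
r, k, v = (rng.standard_normal((T, D)) for _ in range(3))
w = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, D))))  # keep decays in (0, 1)
u = rng.standard_normal(D)
print(finch_style_recurrence(r, k, v, w, u).shape)  # (8, 4)
```

The state `S` is a D×D matrix rather than a vector, which is what "matrix-valued states" refers to; making the decay `w[t]` depend on the input token is the "dynamic recurrence" that distinguishes Finch from Eagle.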