Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

10 Apr 2024 | Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
The paper introduces Eagle (RWKV-5) and Finch (RWKV-6), two new sequence models that improve upon the RWKV-4 architecture. The key advancements are multi-headed matrix-valued states and a dynamic recurrence mechanism, which increase expressivity while retaining the inference efficiency of RNNs. The authors also introduce a new 1.12-trillion-token multilingual corpus and a fast tokenizer based on greedy matching. Four Eagle models (0.46 to 7.5 billion parameters) and two Finch models (1.6 and 3.1 billion parameters) are trained on this corpus and perform competitively across a wide range of benchmarks. The models are released under the Apache 2.0 license, along with training and inference code. The paper covers the architectural improvements, the new tokenizer, the dataset, and extensive experimental results on language modeling, associative recall, long-context, and multimodal tasks. Eagle and Finch outperform existing models on many benchmarks, challenging the dominance of Transformer architectures while retaining RNN advantages.
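To make the two architectural changes concrete, here is a minimal NumPy sketch of the per-head recurrence, written from the paper's description. Both models keep a matrix-valued state per head, updated with rank-one outer products; Eagle uses a fixed per-channel decay, while Finch computes a fresh decay vector from each input token (the dynamic recurrence). The function name is ours, and token shift, LayerNorm, and output gating from the full architecture are omitted.

```python
import numpy as np

def rwkv_head(r, k, v, w, u):
    """One matrix-valued-state head (simplified sketch).

    r, k, v : (T, d) receptance/key/value rows for one head.
    w       : (T, d) per-channel decays in (0, 1). Eagle (RWKV-5) uses
              the same trained row at every step; Finch (RWKV-6) derives
              each row from the input, making the recurrence dynamic.
    u       : (d,) per-channel bonus applied only to the current token.
    """
    T, d = k.shape
    S = np.zeros((d, d))                      # matrix-valued state
    out = np.empty((T, d))
    for t in range(T):
        kv = np.outer(k[t], v[t])             # rank-one update k_t^T v_t
        out[t] = r[t] @ (S + u[:, None] * kv) # current token gets bonus u
        S = w[t][:, None] * S + kv            # decay old state row-wise
    return out

# Eagle: one static decay vector repeated across time.
T, d = 8, 4
rng = np.random.default_rng(0)
r, k, v = (rng.standard_normal((T, d)) for _ in range(3))
w_eagle = np.tile(np.full(d, 0.9), (T, 1))
# Finch: a data-dependent stand-in, one decay vector per token.
w_finch = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d))))
y = rwkv_head(r, k, v, w_finch, u=np.full(d, 0.5))
```

Because the state has a fixed size of d x d per head, both models process each new token in constant time and memory, which is the RNN advantage the abstract refers to.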
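The fast greedy-matching tokenizer can also be sketched in a few lines: at each position it emits the longest vocabulary entry that prefixes the remaining input. The snippet below is an illustrative sketch, not the released implementation (which matches over bytes with an efficient lookup structure); `vocab` is a hypothetical dict from token string to id, assumed to contain every single character as a fallback.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-prefix-match tokenization (illustrative sketch)."""
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(vocab[piece])
                i += length
                break
        else:  # not even a single character matched
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return tokens
```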