RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

2024-04-12 | Griffin, RLHF and Gemma Teams
RecurrentGemma is an open language model developed by Google DeepMind and built on the Griffin architecture. Griffin avoids global attention, instead combining linear recurrences with local attention, and achieves strong performance on downstream language tasks, competitive with Gemma-2B, an open transformer model. Unlike a transformer, whose key-value (KV) cache grows with sequence length during inference and can become memory-intensive, RecurrentGemma compresses the input sequence into a fixed-size state, which reduces memory use and enables efficient inference on long sequences. Two checkpoints are released: a pre-trained model with 2B non-embedding parameters and an instruction-tuned variant, both achieving performance comparable to Gemma-2B despite being trained on fewer tokens.
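The fixed-size state is the central architectural point here. As a minimal, illustrative sketch (with assumed shapes and a simplified gate, not the actual Griffin/RG-LRU layer), the JAX snippet below runs a gated linear recurrence with jax.lax.scan: the carried state keeps the same width no matter how long the sequence is, whereas a transformer KV cache stores keys and values for every token processed.

```python
# Minimal sketch of a gated linear recurrence, assuming a simplified update
# rule h_t = a_t * h_{t-1} + (1 - a_t) * x_t. This is NOT the Griffin/RG-LRU
# implementation; it only illustrates that the carried state has a fixed size,
# independent of sequence length.
import jax
import jax.numpy as jnp


def linear_recurrence(x, a):
    """x, a: arrays of shape (seq_len, width); a holds gates in (0, 1)."""

    def step(h_prev, inputs):
        x_t, a_t = inputs
        h_t = a_t * h_prev + (1.0 - a_t) * x_t  # state width never grows
        return h_t, h_t

    h0 = jnp.zeros(x.shape[-1])
    final_state, all_states = jax.lax.scan(step, h0, (x, a))
    return final_state, all_states


key_x, key_a = jax.random.split(jax.random.PRNGKey(0))
seq_len, width = 8192, 2560  # illustrative sizes, not the model configuration
x = jax.random.normal(key_x, (seq_len, width))
a = jax.nn.sigmoid(jax.random.normal(key_a, (seq_len, width)))

final_state, _ = linear_recurrence(x, a)
print(final_state.shape)  # (2560,): same size whether seq_len is 1k or 1M
```

During decoding, each new token only updates this state in place, which is why memory stays flat as generation proceeds instead of growing with every generated token.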
RecurrentGemma achieves significantly faster inference than Gemma-2B, particularly on long sequences. The release includes a pre-trained checkpoint and an instruction-tuned checkpoint fine-tuned for instruction following and dialogue, efficient JAX code for evaluation and fine-tuning (including a specialized Pallas kernel for TPUs), and a PyTorch implementation.

RecurrentGemma-2B is trained on 2T tokens using the same pre-training data as Gemma-2B, consisting primarily of English text from web documents, mathematics, and code. Training uses a large general data mixture followed by a smaller, higher-quality dataset, and the model uses the same SentencePiece tokenizer as Gemma, with a 256k-token vocabulary. Instruction tuning and RLHF are then applied to adapt the model for instruction following and dialogue.

The model is evaluated across a broad range of domains with automated benchmarks and human evaluation. RecurrentGemma-2B performs comparably to Gemma-2B on automated benchmarks and competitively against the much larger Mistral 7B model in human evaluation. Because its state is significantly smaller than a transformer's KV cache on long sequences, it sustains higher throughput at every sequence length considered, and unlike Gemma its throughput does not drop as sequences grow. The release is accompanied by safety evaluations and recommendations for responsible deployment. In short, RecurrentGemma-2B matches Gemma's performance while delivering higher inference throughput, especially on long sequences.
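To make the throughput claim above concrete, here is a back-of-envelope memory comparison. The layer count, head dimension, local-attention window, and state width below are illustrative assumptions, not the published Gemma-2B or RecurrentGemma-2B configurations; the structural point is that KV-cache memory grows linearly with sequence length while a fixed state (recurrence state plus a bounded local-attention cache) does not.

```python
# Back-of-envelope inference-memory comparison. All configuration numbers are
# illustrative assumptions for the sake of the calculation, not the published
# Gemma-2B / RecurrentGemma-2B configurations.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Global-attention KV cache: K and V stored for every past token."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem


def fixed_state_bytes(n_layers, state_width, window, n_kv_heads, head_dim,
                      bytes_per_elem=2):
    """Recurrent state plus a local-attention cache bounded by the window size."""
    recurrent = n_layers * state_width
    local_kv = 2 * n_layers * window * n_kv_heads * head_dim
    return (recurrent + local_kv) * bytes_per_elem


for seq_len in (2_048, 8_192, 32_768, 131_072):
    kv = kv_cache_bytes(seq_len, n_layers=24, n_kv_heads=1, head_dim=256)
    fixed = fixed_state_bytes(n_layers=24, state_width=2_560, window=2_048,
                              n_kv_heads=1, head_dim=256)
    print(f"{seq_len:>7} tokens: KV cache ~{kv / 2**20:7.1f} MiB, "
          f"fixed state ~{fixed / 2**20:7.1f} MiB")
```

With these assumed numbers the KV cache grows from tens of MiB at 2k tokens to several GiB at 128k tokens, while the fixed state stays constant, which is why throughput can remain flat as sequence length increases.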