Aya 23: Open Weight Releases to Further Multilingual Progress

31 May 2024 | Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker
This technical report introduces Aya 23, a family of multilingual language models that builds on the recent release of the Aya model. Aya 23 pairs a highly performant pre-trained model with the recently released Aya collection, resulting in a powerful multilingual large language model serving 23 languages. These languages collectively cover approximately half of the world's population. Aya 23 outperforms both previous massively multilingual models such as Aya 101 and widely used models such as Gemma, Mistral, and Mixtral on a range of discriminative and generative tasks. The report details the architecture, training process, and evaluation framework used for Aya 23, highlighting its superior performance on multilingual benchmarks and in human evaluation. The open weights for both the 8B and 35B models are released to further the progress of multilingual language models.