Aya 23: Open Weight Releases to Further Multilingual Progress


June 3, 2024 | Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model, pairing a highly performant pre-trained model with the recently released Aya collection. The result is a powerful multilingual large language model serving 23 languages, extending state-of-the-art language modeling capabilities to approximately half of the world's population. Aya 23 outperforms the earlier Aya 101 model as well as widely used models such as Gemma, Mistral, and Mixtral on a broad range of discriminative and generative tasks, and the models are released with open weights in both 8B and 35B versions. Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. The models are evaluated on discriminative tasks, multilingual MMLU, and MGSM, achieving significant performance improvements over previous models. They are also evaluated for safety, toxicity, and bias, showing fewer harmful responses and lower toxicity. These results demonstrate the strong performance of Aya 23 across a broad range of multilingual benchmarks and in human evaluation. The model weights are released to support further research in multilingual language models.
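Since the report highlights the open-weight release of the 8B and 35B checkpoints, the sketch below shows one way to try the smaller model with the Hugging Face transformers library. The model identifier "CohereForAI/aya-23-8B" and the example prompt are assumptions for illustration; consult the official release page for the exact model IDs, license terms, and recommended generation settings.

```python
# Minimal sketch: loading an Aya 23 open-weight checkpoint and generating text.
# The model ID below is assumed for illustration; verify it against the release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place weights on available GPU(s)/CPU
)

# Aya 23 is instruction-tuned, so the prompt is wrapped in the tokenizer's
# chat template before generation.
messages = [{"role": "user", "content": "Translate to Turkish: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```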