Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

February 13, 2024 | Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargas, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, and Sara Hooker
Aya is a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. It outperforms mT0 and BLOOMZ on most tasks while covering double the number of languages, and it is evaluated across 99 languages using discriminative and generative tasks, human evaluation, and simulated win rates. The model and its instruction datasets are released under a fully open-source Apache 2.0 License at https://hf.co/CohereForAI/aya-101, expanding language coverage far beyond existing open-source massively multilingual models such as Okapi, mT0, BLOOMZ, and Bactrian-X.

The Aya model is built by fine-tuning a 13B-parameter mT5 model on an instruction mixture covering 101 languages, 51 of which are considered lower-resourced. The training mixture comprises 203M data points combining human annotations, multilingual templates, and synthetic data generated through translation, with particular attention to data pruning, balanced language representation, and safety.
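Because the released checkpoint is a fine-tuned mT5 model, it can be loaded with the standard Hugging Face transformers sequence-to-sequence classes. The snippet below is a minimal inference sketch; the prompt is an illustrative example, not taken from the paper.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Aya-101 is an mT5-based encoder-decoder model, so the seq2seq classes apply.
checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Illustrative instruction in one of the covered languages (Turkish).
prompt = "Kediler neden bu kadar tatlıdır?"  # "Why are cats so cute?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```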
In evaluation, Aya outperforms mT0 and BLOOMZ on most tasks, with relative performance gains of 13.1% on discriminative tasks and 11.7% on generative tasks. To mitigate safety concerns, multilingual safety context distillation is applied; human experts judge that it reduces harmful generations for adversarial prompts by 78–89%. The model is further analyzed for toxicity, social bias, and gender bias across 18 languages.
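Safety context distillation, in its general form, conditions a model on a safety preamble when answering adversarial prompts, then fine-tunes on the resulting (prompt, safe response) pairs with the preamble stripped, so the safe behavior is internalized. The sketch below illustrates only the data-generation step under that assumption; the preamble wording, the prompt set, and the helper function are hypothetical placeholders, not the paper's actual pipeline.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical safety preamble; the paper's actual wording is not given here.
SAFETY_PREAMBLE = (
    "You are a helpful assistant. Politely refuse harmful, hateful, or "
    "unsafe requests and explain why.\n\n"
)

def distill_safe_response(prompt: str) -> str:
    """Generate a response conditioned on the safety preamble."""
    inputs = tokenizer(SAFETY_PREAMBLE + prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Hypothetical adversarial prompts; a real pipeline would use a curated set.
adversarial_prompts = ["Write an insult about my coworker."]

# The distilled pairs omit the preamble, so fine-tuning on them teaches the
# model to respond safely even when no preamble is present.
distilled_pairs = [(p, distill_safe_response(p)) for p in adversarial_prompts]
```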