Yi: Open Foundation Models by 01.AI

7 Mar 2024 | 01.AI
The Yi model family, introduced by 01.AI, is a series of language and multimodal models with strong multi-dimensional capabilities. The family is built on 6B and 34B pretrained language models, which are extended to chat models, 200K long-context models, depth-upscaled models, and vision-language models. The base models achieve strong performance on benchmarks such as MMLU, and the finetuned chat models deliver high human preference rates on AlpacaEval and Chatbot Arena.

This performance is attributed largely to data quality. The pretraining corpus of 3.1 trillion English and Chinese tokens is constructed with a cascaded data deduplication and quality filtering pipeline, while the finetuning dataset is a small, carefully curated set of multi-turn instruction-response pairs, each verified directly by machine learning engineers. Throughout, the design balances model scale, data scale, and data quality, favoring quality over quantity.

The supporting infrastructure covers full-stack development, from pretraining to finetuning to serving, and includes automated resource management, efficient training, and cost-effective inference techniques. On standard evaluations the Yi models match GPT-3.5 in both performance and efficiency, showing strong results in commonsense reasoning, reading comprehension, and human preference. They also exhibit strong in-context learning, and extensions to long contexts, vision-language tasks, and depth upscaling further broaden their performance and versatility.
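The summary credits much of this performance to the cascaded data deduplication and quality filtering pipeline, but the production pipeline itself is not included here. The sketch below is a minimal, illustrative version of such a cascade in standard-library Python: exact deduplication by content hash, near-deduplication by word-shingle Jaccard overlap (a simple stand-in for MinHash-style methods), and a rule-based quality filter. All function names, thresholds, and heuristics are assumptions made for illustration, not 01.AI's actual implementation.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_dedup(docs):
    """Stage 1: drop documents that are identical after normalization."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept


def shingles(text, n=5):
    """Word n-gram shingles used to estimate document overlap."""
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def near_dedup(docs, threshold=0.8):
    """Stage 2: drop documents whose shingle Jaccard similarity to an
    already-kept document exceeds the threshold (toy stand-in for MinHash LSH)."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(len(s & t) / max(1, len(s | t)) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


def quality_filter(docs, min_words=20, max_symbol_ratio=0.3):
    """Stage 3: simple rule-based quality heuristics (length, symbol density)."""
    kept = []
    for doc in docs:
        words = doc.split()
        symbols = sum(not c.isalnum() and not c.isspace() for c in doc)
        if len(words) >= min_words and symbols / max(1, len(doc)) <= max_symbol_ratio:
            kept.append(doc)
    return kept


def cascaded_pipeline(docs):
    """Run the stages in order; each stage only sees survivors of the previous one."""
    return quality_filter(near_dedup(exact_dedup(docs)))


if __name__ == "__main__":
    docs = ["Hello world " * 30, "hello  WORLD " * 30, "@@@!!"]
    # The near-identical copy and the low-quality document are both dropped.
    print(len(cascaded_pipeline(docs)))  # -> 1
```

A production pipeline would replace the quadratic Jaccard loop with locality-sensitive hashing and add learned quality and safety scorers, but the cascaded structure, where each stage only sees the survivors of the previous one, is the idea the summary describes.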