Yi: Open Foundation Models by 01.AI

7 Mar 2024 | 01.AI
The Yi model family, developed by 01.AI, consists of language and multimodal models with strong multi-dimensional capabilities. Built on 6B and 34B pretrained language models, the series extends to chat models, 200K-context models, depth-upscaled models, and vision-language models. The base models perform well on benchmarks such as MMLU, and the finetuned chat models achieve strong human-preference win rates on AlpacaEval and Chatbot Arena. This performance is attributed primarily to data engineering: 3.1 trillion tokens of English and Chinese corpora for pretraining, and a small, meticulously polished instruction dataset for finetuning.

For the vision-language models, a vision transformer encoder is combined with the chat language model to align visual representations with the language model's semantic space. Context length is extended to 200K tokens through lightweight continual pretraining, and depth upscaling further improves model performance. All of this rests on scalable infrastructure, including cross-cloud task scheduling, automatic failure recovery, and efficient inference techniques. In evaluations, Yi-34B matches GPT-3.5 in both performance and efficiency, with strong results on benchmarks such as MMLU and in the LMSYS Chatbot Arena Elo ratings. The series thus provides cost-effective models, enables AI-native applications, and supports locally runnable chatbots for data privacy, while also underscoring the importance of continued data and model scaling for stronger frontier models.

Architecturally, the Yi models use a standard decoder-only Transformer with a small set of proven modifications: grouped-query attention (GQA), SwiGLU activations in the feed-forward layers, and RoPE with an adjusted base frequency (RoPE ABF) to support long contexts.
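To make these three modifications concrete, here is a minimal PyTorch sketch. It is an illustration, not 01.AI's implementation: the head counts, dimensions, and the enlarged RoPE base (5e6 below) are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Rotary position embedding. RoPE ABF simply raises `base` so the
    per-position rotation angles grow more slowly, keeping very long
    contexts distinguishable. x: (batch, seq, heads, head_dim)."""
    b, s, h, d = x.shape
    inv_freq = 1.0 / base ** (torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * inv_freq  # (s, d/2)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def grouped_query_attention(q, k, v):
    """GQA: n_q query heads share n_kv (< n_q) key/value heads, shrinking
    the KV cache by roughly n_q / n_kv.
    q: (b, s, n_q, d); k, v: (b, s, n_kv, d)."""
    b, s, n_q, d = q.shape
    n_kv = k.shape[2]
    k = k.repeat_interleave(n_q // n_kv, dim=2)  # expand KV heads to match q
    v = v.repeat_interleave(n_q // n_kv, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (b, heads, s, d)
    scores = (q @ k.transpose(-2, -1)) / d**0.5
    causal = torch.triu(torch.ones(s, s, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)  # (b, s, n_q, d)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W_gate) * (x W_up), projected back."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

# Toy shapes only; the real Yi configurations differ.
q = torch.randn(1, 16, 8, 64)       # 8 query heads
k = v = torch.randn(1, 16, 2, 64)   # 2 shared KV heads
out = grouped_query_attention(apply_rope(q, base=5e6), apply_rope(k, base=5e6), v)
print(out.shape, SwiGLU(512, 1408)(torch.randn(1, 16, 512)).shape)
```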
The pretraining data is cleaned through a sophisticated multi-stage pipeline that combines heuristic filters, learned filters, and cluster-based filters. The finetuning dataset, in contrast, is manually curated and repeatedly polished, with a deliberate focus on quality over quantity.
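The summary does not spell out the individual filters, but a cascade of this shape might look as follows; the thresholds, the quality scorer, and the cluster assignments are hypothetical stand-ins:

```python
import re
from typing import Callable, Iterable

def heuristic_filter(doc: str) -> bool:
    """Cheap rule-based checks run first, over the whole corpus."""
    if len(doc.split()) < 50:                    # too short to be useful
        return False
    symbols = sum(1 for c in doc if not c.isalnum() and not c.isspace())
    if symbols / max(len(doc), 1) > 0.3:         # likely markup or encoding debris
        return False
    if len(re.findall(r"https?://", doc)) > 10:  # link farms
        return False
    return True

def clean(corpus: Iterable[str],
          quality_score: Callable[[str], float],
          cluster_of: Callable[[str], int],
          bad_clusters: set,
          threshold: float = 0.5) -> list:
    """Cascade: heuristic -> learned -> cluster-based filtering.
    Later stages are more expensive, so they see fewer documents."""
    survivors = (d for d in corpus if heuristic_filter(d))
    survivors = (d for d in survivors if quality_score(d) >= threshold)
    return [d for d in survivors if cluster_of(d) not in bad_clusters]

# Toy usage with stand-in scorer and clusterer.
docs = ["word " * 100, "!!! ??? $$$", "word " * 10]
kept = clean(docs,
             quality_score=lambda d: 1.0,  # stand-in for a learned quality model
             cluster_of=lambda d: 0,       # stand-in for embedding-based clustering
             bad_clusters={7})
print(len(kept))  # 1: only the long, clean document survives
```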
Across evaluations, the models show strong results in commonsense reasoning, reading comprehension, math, coding, and human preference, and they exhibit strong in-context learning, including the ability to infer fairly complex functions from examples alone. The base models are further extended along three axes: long context, vision-language understanding, and depth upscaling. The 200K-context model performs strongly on the Needle-in-a-Haystack retrieval test. The Yi-VL models add vision understanding by pairing a Vision Transformer with a projection module that aligns image features with the text embedding space; they are trained in three stages with progressively higher image resolution and increasingly diverse data. Depth upscaling, duplicating layers and then continually pretraining the deeper model, likewise improves performance. Taken together, the Yi series represents a significant step for open large language models, combining strong performance with practical efficiency.
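As a closing illustration, the layer-duplication step behind depth upscaling is mechanically simple; the sketch below uses made-up layer indices rather than Yi's actual configuration:

```python
from copy import deepcopy

def depth_upscale(layers: list, start: int, end: int) -> list:
    """Duplicate a contiguous span of transformer blocks to deepen the model.
    The copies start as exact clones; continual pretraining then lets them
    specialize."""
    return layers[:end] + [deepcopy(l) for l in layers[start:end]] + layers[end:]

# Toy usage: a 32-layer stack grown to 48 layers by duplicating layers 8..24.
blocks = [f"block_{i}" for i in range(32)]  # stand-ins for real modules
deeper = depth_upscale(blocks, start=8, end=24)
print(len(deeper))  # 48
```

As the summary notes, the gains come from the continual pretraining that follows duplication, not from the duplication alone.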