MOSS: An Open Conversational Large Language Model

2024 | Tianxiang Sun, Xiaotian Zhang, Zhengfu He, Peng Li, Qinyuan Cheng, Xiangyang Liu, Hang Yan, Yunfan Shao, Qiong Tang, Shiduo Zhang, Xingjian Zhao, Ke Chen, Yining Zheng, Zhejian Zhou, Ruixiao Li, Jun Zhan, Yunhua Zhou, Linyang Li, Xiaogui Yang, Lingling Wu, Zhangyue Yin, Xuanjing Huang, Yu-Gang Jiang, Xipeng Qiu
The paper introduces MOSS, an open-sourced conversational large language model (LLM) with 16 billion parameters. MOSS is designed for multi-turn interaction with humans and is pre-trained on large-scale unlabeled English, Chinese, and code data. Its development involves three stages: cross-lingual pre-training, supervised fine-tuning, and preference-aware training. Key features of MOSS include:

1. **Cross-lingual pre-training**: The base model is pre-trained on a diverse dataset of 360B English tokens, 100B Chinese tokens, and 220B code tokens, validating the feasibility of knowledge transfer between Chinese and English.
2. **Helpful, Honest, and Harmless (HHH)**: MOSS is designed to be helpful, honest, and harmless, with additional conversational data collected for supervised fine-tuning and preference-aware training.
3. **Alignment with real-world user intents**: An early version of MOSS was deployed to collect 100K user prompts, ensuring that the training data aligns with real-world user intents.
4. **Preference-aware training**: A preference model tags responses with their quality, enabling MOSS to distinguish high-quality responses and to generate desired responses conditioned on a given preference tag (a minimal sketch of this tagging scheme follows this list).
5. **Augmentation with tools**: MOSS is augmented with external tools such as search engines, calculators, equation solvers, and text-to-image generators to improve its accuracy and reliability (see the tool-call sketch after the list).

The paper evaluates MOSS on real-world use cases and academic benchmarks, demonstrating its effectiveness and providing a technical roadmap for large language models. The model weights and code are publicly available on GitHub.
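To make the preference-aware training idea more concrete, below is a minimal sketch of how responses scored by a preference model could be turned into tag-conditioned fine-tuning examples. The role markers, tag names (`<|good|>`, `<|neutral|>`, `<|bad|>`), and score thresholds are illustrative assumptions, not the exact format used by MOSS.

```python
# Minimal sketch of preference-aware data preparation (assumed format, not MOSS's exact one).
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    prompt: str
    response: str
    preference_score: float  # quality score assigned by a preference model


def score_to_tag(score: float) -> str:
    """Map a scalar preference score to a discrete quality tag (thresholds are illustrative)."""
    if score >= 0.75:
        return "<|good|>"
    if score >= 0.4:
        return "<|neutral|>"
    return "<|bad|>"


def build_training_example(turn: Turn) -> str:
    """Prepend the quality tag to the response so the model learns to condition on it.

    At inference time, prompting generation after a "<|good|>" tag would steer the model
    toward responses it has associated with high preference scores.
    """
    tag = score_to_tag(turn.preference_score)
    return f"<|Human|>: {turn.prompt}\n<|MOSS|>: {tag} {turn.response}"


if __name__ == "__main__":
    data: List[Turn] = [
        Turn("How do I reverse a list in Python?", "Use slicing: my_list[::-1].", 0.9),
        Turn("How do I reverse a list in Python?", "You can't.", 0.1),
    ]
    for t in data:
        print(build_training_example(t))
```

Because both high- and low-quality responses are kept and only distinguished by the tag, the model can learn what separates desirable from undesirable answers rather than only seeing filtered data.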
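The tool augmentation can be pictured as a simple loop: the model emits a textual command, the surrounding system executes the matching tool, and the result is fed back into the context before generation resumes. The sketch below, including the tool registry and the command syntax it parses, is an assumption for illustration only and not MOSS's actual interface.

```python
# Minimal sketch of a tool-augmented generation step (assumed command syntax).
import re
from typing import Callable, Dict

# Hypothetical tool registry: tool name -> callable taking a query string.
TOOLS: Dict[str, Callable[[str], str]] = {
    # Demo-only calculator; eval is unsafe for untrusted input.
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}


def run_tool_call(model_output: str) -> str:
    """Parse a command like Calculator("1+2*3") emitted by the model and run the matching tool."""
    match = re.search(r'(\w+)\("([^"]*)"\)', model_output)
    if match is None:
        return ""
    name, arg = match.group(1), match.group(2)
    tool = TOOLS.get(name)
    return tool(arg) if tool else ""


if __name__ == "__main__":
    # Pretend the model emitted this command span:
    emitted = 'Calculator("23 * 19 + 4")'
    result = run_tool_call(emitted)
    # The result would be appended to the dialogue context and generation resumed.
    print(result)  # -> 441
```

Delegating arithmetic, search, or equation solving to such tools is what lets the model return verifiable results instead of relying on its parametric knowledge alone.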