The paper introduces MOSS, an open-source conversational large language model (LLM) with 16 billion parameters. MOSS is designed for multi-turn interaction with humans and is pre-trained on large-scale unlabeled English, Chinese, and code data. Its development proceeds in three stages: cross-lingual pre-training, supervised fine-tuning, and preference-aware training. Key features of MOSS include:
1. **Cross-lingual Pre-training**: The base model is pre-trained on a diverse dataset of 360B English tokens, 100B Chinese tokens, and 220B code tokens, validating the feasibility of knowledge transfer between Chinese and English.
2. **Helpful, Honest, and Harmless (HHH)**: MOSS is designed to be helpful, honest, and harmless, with additional conversational data collected for supervised fine-tuning and preference-aware training.
3. **Alignment with Real-World User Intents**: An early version of MOSS was deployed to collect 100K user prompts, ensuring that the training data aligns with real-world user intents.
4. **Preference-Aware Training**: A preference model tags responses with a quality label, enabling MOSS to distinguish high-quality responses and to generate desired responses conditioned on a specified preference tag (see the sketch after this list).
5. **Augmentation with Tools**: MOSS is augmented with external tools such as search engines, calculators, equation solvers, and text-to-image generators to improve its accuracy and reliability (a tool-dispatch sketch also follows this list).
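To make the preference-tag idea in item 4 concrete, here is a minimal sketch of tag-conditioned training and inference. The tag names, score buckets, and turn markers below are illustrative assumptions, not the actual format used by MOSS; the point is only that a preference model's score is discretized into a tag that the LLM sees during training and that is fixed to the highest-quality value at inference time.

```python
# Hypothetical sketch of preference-tag conditioning (names and formats are
# assumptions for illustration, not taken from the MOSS codebase).
from typing import List, Tuple

# Discrete quality buckets; the real tag set used by MOSS may differ.
TAGS = ["<quality:low>", "<quality:medium>", "<quality:high>"]

def bucket(score: float) -> str:
    """Map a scalar preference score in [0, 1] to a discrete quality tag."""
    if score < 0.33:
        return TAGS[0]
    if score < 0.66:
        return TAGS[1]
    return TAGS[2]

def build_training_example(prompt: str, response: str, pref_score: float) -> str:
    """Prepend the response's quality tag so the model learns the association."""
    return f"{bucket(pref_score)}\n<|user|> {prompt}\n<|assistant|> {response}"

def build_inference_prompt(prompt: str) -> str:
    """At inference time, condition generation on the high-quality tag."""
    return f"{TAGS[-1]}\n<|user|> {prompt}\n<|assistant|>"

if __name__ == "__main__":
    # Toy data: (prompt, response, score assigned by a preference model)
    data: List[Tuple[str, str, float]] = [
        ("What is 2 + 2?", "4.", 0.9),
        ("What is 2 + 2?", "I don't know.", 0.1),
    ]
    for p, r, s in data:
        print(build_training_example(p, r, s))
    print(build_inference_prompt("What is 2 + 2?"))
```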
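Item 5 can likewise be illustrated with a minimal tool-dispatch loop. The command syntax, tool names, and parsing below are assumptions for illustration; MOSS's actual plugin format may differ. The sketch shows the general pattern: the model emits a structured tool request, a controller executes it, and the result is fed back into the context before the final answer is produced.

```python
# Minimal sketch of a tool-dispatch loop (illustrative only).
import ast
import operator
import re
from typing import Callable, Dict

def safe_calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without calling eval()."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")

    return str(walk(ast.parse(expression, mode="eval")))

# Registry of external tools; search, equation solving, and text-to-image
# generation would plug in behind the same string-in/string-out interface.
TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": safe_calculator}

def dispatch(model_output: str) -> str:
    """Parse a tool request like Calculator("12*34") and return its result."""
    match = re.match(r'(\w+)\("(.+)"\)', model_output.strip())
    if not match or match.group(1) not in TOOLS:
        return model_output  # no tool call: pass the text through unchanged
    return TOOLS[match.group(1)](match.group(2))

if __name__ == "__main__":
    # Pretend the model requested a calculation; the result would be appended
    # to the conversation so the model can cite it in its final answer.
    print(dispatch('Calculator("12*34")'))  # -> 408
```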
The paper evaluates MOSS on real-world use cases and academic benchmarks, demonstrating its effectiveness and providing a technical roadmap for large language models. The model weights and code are publicly available on GitHub.