The Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward, is released under the NVIDIA Open Model License Agreement, allowing for distribution, modification, and use. These models perform competitively on various evaluation benchmarks and are designed to fit on a single DGX H100 with 8 GPUs in FP8 precision. The models are particularly useful for generating synthetic data to train smaller language models, with over 98% of the data used in the alignment process being synthetically generated.
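As a rough sanity check on the single-node deployment claim, a back-of-envelope calculation (assumptions not from the report: 1 byte per parameter in FP8 and 80 GB of HBM per H100) shows that the weights alone fit comfortably within the node's aggregate memory:

```python
# Back-of-envelope check that 340B parameters in FP8 fit on one 8-GPU DGX H100.
# Assumptions (illustrative, not from the report): 1 byte/param in FP8,
# 80 GB HBM per H100, remaining memory left for KV cache and activations.
params = 340e9
bytes_per_param_fp8 = 1
weight_bytes = params * bytes_per_param_fp8   # ~340 GB of weights
node_hbm_bytes = 8 * 80e9                     # 640 GB aggregate HBM on the node
print(f"weights: {weight_bytes / 1e9:.0f} GB, node HBM: {node_hbm_bytes / 1e9:.0f} GB")
print(f"headroom for KV cache / activations: {(node_hbm_bytes - weight_bytes) / 1e9:.0f} GB")
```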
The Nemotron-4-340B-Base model was trained on 9 trillion tokens of high-quality data, and the alignment process involves Supervised Fine-Tuning (SFT) and Preference Fine-Tuning using methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). The reward model, which is crucial for RLHF and quality filtering, is also released to support ongoing LLM development.
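For readers unfamiliar with DPO, the sketch below shows the canonical objective from Rafailov et al.; it is illustrative only and not necessarily the exact variant or hyperparameters used for Nemotron-4-340B-Instruct. The `policy_logps_*` and `ref_logps_*` tensors are assumed to hold summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps_chosen, policy_logps_rejected,
             ref_logps_chosen, ref_logps_rejected, beta=0.1):
    """Canonical DPO objective: push the policy's implicit reward margin
    (log-prob ratio vs. a frozen reference) to favor the chosen response."""
    # Implicit rewards are beta-scaled log-ratios against the reference model.
    chosen_rewards = beta * (policy_logps_chosen - ref_logps_chosen)
    rejected_rewards = beta * (policy_logps_rejected - ref_logps_rejected)
    # Negative log-sigmoid of the reward margin, averaged over the batch.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random "summed log-probabilities" for a batch of 4 pairs.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```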
The synthetic data generation pipeline, which includes prompt generation, response and dialogue generation, quality filtering, and preference ranking, is shared to facilitate the creation of high-quality data for various domains. This pipeline is designed to support both supervised and preference fine-tuning.
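The report describes the pipeline at a high level; the sketch below shows one possible shape of such a loop, with hypothetical helpers (`generate_prompts`, `generate_responses`, `score_response`) standing in for calls to the instruct model and the reward model rather than any API from the release.

```python
# Hypothetical sketch of a synthetic-data loop: prompt generation, response
# generation, reward-model scoring for quality filtering, and preference
# ranking. All helper functions are placeholders, not APIs from the report.
def build_synthetic_dataset(generate_prompts, generate_responses, score_response,
                            n_prompts=1000, n_samples=4, min_score=0.5):
    sft_data, preference_pairs = [], []
    for prompt in generate_prompts(n_prompts):
        # Sample several candidate responses from the instruct model.
        candidates = generate_responses(prompt, n=n_samples)
        # Score each candidate with the reward model and sort best-to-worst.
        scored = sorted(((score_response(prompt, c), c) for c in candidates),
                        reverse=True)
        best_score, best = scored[0]
        worst_score, worst = scored[-1]
        # Quality filtering: keep only high-scoring responses for SFT.
        if best_score >= min_score:
            sft_data.append({"prompt": prompt, "response": best})
        # Preference ranking: best vs. worst candidate forms a preference pair.
        if best_score > worst_score:
            preference_pairs.append(
                {"prompt": prompt, "chosen": best, "rejected": worst})
    return sft_data, preference_pairs
```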
The Nemotron-4-340B-Base model outperforms or competes with other open-access base models on benchmarks such as MMLU, BBH, ARC-Challenge, and HellaSwag. The Nemotron-4-340B-Instruct model surpasses comparable instruct models in instruction following and chat capabilities, while the Nemotron-4-340B-Reward model achieves top accuracy on RewardBench.
The alignment process proceeds in multiple stages, including Code SFT, General SFT, DPO, and Reward-aware Preference Optimization (RPO), each contributing to the model's performance. Human evaluations and safety evaluations, including checks for content safety and security weaknesses, are conducted to ensure the models' reliability and safety.
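Read as a pipeline, these stages run sequentially, with each stage initializing from the previous checkpoint. The skeleton below is a hypothetical orchestration sketch: only the stage order comes from the report, while `train_stage` and the dataset names are invented for illustration.

```python
# Hypothetical orchestration of the staged alignment recipe. `train_stage` and
# the dataset names are placeholders; only the stage order reflects the report.
def align(base_checkpoint, train_stage):
    ckpt = base_checkpoint
    stages = [
        ("code_sft",    "synthetic_coding_data"),  # Code SFT first
        ("general_sft", "mixed_task_sft_data"),    # then General SFT
        ("dpo",         "preference_pairs_v1"),    # Direct Preference Optimization
        ("rpo",         "preference_pairs_v2"),    # Reward-aware Preference Optimization
    ]
    for name, dataset in stages:
        # Each stage starts from the checkpoint produced by the previous one.
        ckpt = train_stage(name, ckpt, dataset)
    return ckpt
```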
Overall, the release of these models and the sharing of the synthetic data generation pipeline aim to accelerate research progress and promote responsible use of large language models.