Nemotron-4 340B Technical Report


6 Aug 2024 | NVIDIA
NVIDIA has released the Nemotron-4 340B model family (Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward) under the NVIDIA Open Model License Agreement, which allows open access and commercial use. The models are trained on 9 trillion tokens of high-quality data and are sized to fit on a single DGX H100 node with 8 GPUs when deployed in FP8 precision. They perform competitively on a wide range of benchmarks and are particularly useful for generating synthetic data to train smaller language models: over 98% of the data used in the model alignment process was synthetically generated. The synthetic data generation pipeline is also open-sourced to support open research and model development.

Nemotron-4-340B-Base is trained on a blend of English natural-language data, multilingual data, and source code. It uses a standard decoder-only Transformer architecture with causal attention masks, rotary position embeddings (RoPE), and squared ReLU activations. Training ran on 768 DGX H100 nodes with 8 H100 GPUs each, combining tensor parallelism, pipeline parallelism, and data parallelism. The base model achieves strong accuracy on benchmarks including MMLU, BBH, and ARC-Challenge.
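For concreteness, here is a minimal sketch of an MLP block using the squared ReLU activation mentioned above. It is written in PyTorch, which the report does not prescribe, and the layer dimensions are illustrative placeholders, not the model's actual sizes.

```python
import torch
import torch.nn as nn

class SquaredReLUMLP(nn.Module):
    """Transformer MLP block with squared ReLU: act(x) = relu(x)**2.

    A minimal sketch; d_model and d_ff are illustrative, not
    Nemotron-4's actual dimensions.
    """

    def __init__(self, d_model: int = 1024, d_ff: int = 4096):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.up(x)) ** 2  # squared ReLU activation
        return self.down(h)
```

Squared ReLU is a simple drop-in replacement for GELU or SwiGLU in the feed-forward path; the rest of the decoder block is unchanged.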
Nemotron-4-340B-Reward is built on top of the base model and trained on roughly 10K human preference examples. It is used for preference ranking and quality filtering when building the instruction-following model, and it achieved the highest accuracy on RewardBench at the time of publication. A structural sketch of this kind of reward model appears below.

Alignment of Nemotron-4-340B-Instruct consists of supervised fine-tuning followed by preference fine-tuning, with the reward model playing a central role in the latter. The full recipe runs through code SFT, general SFT, DPO, and three rounds of RPO. Over 98% of the data for both stages comes from the synthetic data generation pipeline, which covers synthetic prompt generation, response and dialogue generation, quality filtering, and preference ranking; sketches of the DPO objective and the pipeline's filtering loop follow below.

The resulting Nemotron-4-340B-Instruct model performs strongly on instruction-following and multi-turn chat benchmarks as well as in human evaluations. Safety evaluations show a very low rate of unsafe responses, and a security scan with the Garak tool shows good results across multiple categories. The models are intended to be used responsibly and not for generating toxic or harmful content.
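The report does not include reward-model code; as an illustration, the sketch below shows the common recipe of replacing a pretrained decoder's LM head with a regression head over the final hidden state. The `backbone` interface is a hypothetical stand-in, and Nemotron-4-340B-Reward's actual head may differ in detail.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scalar reward head on top of a pretrained decoder backbone.

    A minimal sketch of the usual pattern, not the report's exact
    architecture. `backbone` is any module returning hidden states
    of shape (batch, seq_len, d_model).
    """

    def __init__(self, backbone: nn.Module, d_model: int):
        super().__init__()
        self.backbone = backbone
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)          # (B, T, d_model)
        last = hidden[:, -1, :]                    # last-token summary
        return self.reward_head(last).squeeze(-1)  # (B,) scalar rewards
```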
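DPO itself has a standard closed-form objective; the sketch below computes it from per-sequence log-probabilities under the trainable policy and a frozen reference model. This is the published DPO loss, not code from the report; RPO, the report's reward-aware variant, additionally incorporates the reward model's scores and is not sketched here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Maximize the margin between chosen and rejected log-ratios.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```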
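Finally, the shape of the synthetic data pipeline can be conveyed as a filtering loop: generate prompts, sample several responses per prompt, score them with the reward model, and keep the best and worst as a preference pair. All function names below are hypothetical placeholders, not APIs from the release.

```python
from typing import Callable, List, Tuple

def build_preference_data(
    generate_prompt: Callable[[], str],        # hypothetical: synthetic prompt generator
    generate_response: Callable[[str], str],   # hypothetical: instruct-model sampler
    score: Callable[[str, str], float],        # hypothetical: reward-model scorer
    n_prompts: int = 1000,
    samples_per_prompt: int = 4,
) -> List[Tuple[str, str, str]]:
    """Sketch of a preference-data loop: sample responses for each
    synthetic prompt, rank them with the reward model, and emit the
    top/bottom pair as (prompt, chosen, rejected)."""
    data = []
    for _ in range(n_prompts):
        prompt = generate_prompt()
        responses = [generate_response(prompt) for _ in range(samples_per_prompt)]
        ranked = sorted(responses, key=lambda r: score(prompt, r), reverse=True)
        data.append((prompt, ranked[0], ranked[-1]))  # chosen vs. rejected
    return data
```

The same scoring step doubles as quality filtering for SFT data: responses below a reward threshold are simply dropped rather than paired.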