Nemotron-4 15B Technical Report


27 Feb 2024 | Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro
Nemotron-4 15B is a 15-billion-parameter multilingual language model trained on 8 trillion text tokens. It outperforms existing open models in four of the seven evaluation areas and performs competitively in the remaining three, and it shows the best multilingual capabilities among similarly sized models, surpassing even larger models and models specialized for multilingual tasks. The model is trained on a mix of English, multilingual, and code text and is designed to be the best general-purpose large language model that fits on a single NVIDIA A100 or H100 GPU.

Nemotron-4 15B achieves high accuracy across English, code, and multilingual evaluations. On English tasks it outperforms LLaMA-2 34B and Mistral 7B and is competitive with QWEN 14B and Gemma 7B. On programming languages it outperforms Starcoder and Mistral 7B, especially on low-resource languages, and it is currently the state-of-the-art general-purpose model in its size class on all multilingual benchmarks.

The model uses a standard decoder-only Transformer architecture with causal attention masks and is pre-trained on a dataset of 8 trillion tokens drawn from English, multilingual, and code data. Training used 384 DGX H100 nodes with 8 H100 GPUs each and a combination of tensor and data parallelism, and took approximately 13 calendar days.
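The report characterizes the architecture only at a high level: a standard decoder-only Transformer trained with causal attention masks. As a point of reference, the snippet below is a minimal PyTorch sketch of causal self-attention, the core operation of such a decoder; the layer sizes and implementation details are illustrative assumptions and do not reflect Nemotron-4 15B's actual configuration.

```python
# Minimal sketch of causal (decoder-only) self-attention.
# All hyperparameters below are illustrative placeholders, not
# Nemotron-4 15B's real configuration.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq_len, d_head).
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product scores; the causal mask blocks attention
        # to future positions, which is what "causal attention" means.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        future = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(future, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


# Toy usage with made-up sizes.
layer = CausalSelfAttention(d_model=512, n_heads=8)
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```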
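The reported hardware scale can be made concrete with a little arithmetic. The sketch below computes the total GPU count and one possible tensor-/data-parallel split; the tensor-parallel width of 8 is a hypothetical assumption, since the report states only that tensor and data parallelism were combined.

```python
# Back-of-the-envelope view of the reported training hardware.
nodes = 384
gpus_per_node = 8
total_gpus = nodes * gpus_per_node      # 3072 H100 GPUs in total

# Assumption (not stated in the report): shard each layer across the
# 8 GPUs of one node, i.e. tensor-parallel size 8.
tensor_parallel_size = 8
assert total_gpus % tensor_parallel_size == 0
data_parallel_size = total_gpus // tensor_parallel_size

print(f"total GPUs:             {total_gpus}")
print(f"tensor-parallel size:   {tensor_parallel_size} (assumed)")
print(f"data-parallel replicas: {data_parallel_size}")
```

Under this assumption the model would be replicated 384 times across the cluster, with each replica's layers sharded over the 8 GPUs of a single node.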
Nemotron-4 15B was evaluated on a range of downstream tasks covering commonsense reasoning, math, code, and multilingual benchmarks, where it outperforms many existing models. It attains a highly competitive score on MMLU and the best score among existing models on BBH, and on math and code tasks it outperforms models such as Mistral 7B and LLaMA-2 13B/34B.

On multilingual evaluation, Nemotron-4 15B demonstrates strong performance across four widely studied benchmarks: it outperforms multilingual models such as XGLM and mGPT, achieves the best performance on XCOPA, and also performs well on generation and machine translation tasks.

Overall, Nemotron-4 15B is a strong general-purpose language model with particularly strong multilingual capabilities.