Patrick LeGresley
Nemotron-4 340B Technical Report
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that al…
Nemotron-4 15B Technical Report
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperform…
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success,…
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models o…
Neural ODEs for Image Segmentation with Level Sets
We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from…
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constr…
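The model parallelism this paper introduces splits the transformer MLP block across GPUs: the first weight matrix is partitioned by columns and the second by rows, so each GPU computes an independent partial result and a single all-reduce recovers the full output. A minimal NumPy sketch of that idea, simulating two "ranks" in one process (the matrix shapes and the 2-way split are illustrative assumptions, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GeLU (elementwise, so it commutes
    # with a column-wise split of its input)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Transformer MLP block: Z = GeLU(X @ A) @ B
X = rng.standard_normal((4, 8))    # (tokens, hidden)
A = rng.standard_normal((8, 32))   # hidden -> 4*hidden
B = rng.standard_normal((32, 8))   # 4*hidden -> hidden

Z_serial = gelu(X @ A) @ B

# 2-way tensor parallelism: A split by columns, B by rows.
# Each simulated rank works only on its shard; no communication
# is needed until the final sum.
A0, A1 = np.hsplit(A, 2)
B0, B1 = np.vsplit(B, 2)
Z0 = gelu(X @ A0) @ B0   # partial result on rank 0
Z1 = gelu(X @ A1) @ B1   # partial result on rank 1
Z_parallel = Z0 + Z1     # the sum stands in for the all-reduce

assert np.allclose(Z_serial, Z_parallel)
```

The column-then-row split is what makes a single all-reduce per block sufficient: because GeLU is elementwise, each rank can apply it to its own column shard without synchronizing, and only the final partial products need to be summed.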