Patrick LeGresley
Nemotron-4 340B Technical Report
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that al…
Nemotron-4 15B Technical Report
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperform…
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success,…
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models o…
Neural ODEs for Image Segmentation with Level Sets
We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from…
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constr…
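The model parallelism this paper introduces splits the transformer MLP block across GPUs: the first weight matrix is partitioned by columns and the second by rows, so each GPU computes an independent partial result and a single all-reduce recovers the full output. A minimal NumPy sketch of that idea, simulating two "ranks" in one process (the matrix shapes and the 2-way split are illustrative assumptions, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GeLU (elementwise, so it commutes
    # with a column-wise split of its input)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Transformer MLP block: Z = GeLU(X @ A) @ B
X = rng.standard_normal((4, 8))    # (tokens, hidden)
A = rng.standard_normal((8, 32))   # hidden -> 4*hidden
B = rng.standard_normal((32, 8))   # 4*hidden -> hidden

Z_serial = gelu(X @ A) @ B

# 2-way tensor parallelism: A split by columns, B by rows.
# Each simulated rank works only on its shard; no communication
# is needed until the final sum.
A0, A1 = np.hsplit(A, 2)
B0, B1 = np.vsplit(B, 2)
Z0 = gelu(X @ A0) @ B0   # partial result on rank 0
Z1 = gelu(X @ A1) @ B1   # partial result on rank 1
Z_parallel = Z0 + Z1     # the sum stands in for the all-reduce

assert np.allclose(Z_serial, Z_parallel)
```

The column-then-row split is what makes a single all-reduce per block sufficient: because GeLU is elementwise, each rank can apply it to its own column shard without synchronizing, and only the final partial products need to be summed.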