Ivan Moshkov
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on…
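As a rough illustration of what "hybrid Mamba-Transformer" means here: most layers in the stack are Mamba (state-space) blocks, which decode in linear time, with a small number of self-attention blocks interleaved. The sketch below is purely hypothetical; the interval and placement of attention layers are invented for illustration and are not Nemotron-Nano's published configuration.

```python
# Hypothetical sketch of a hybrid Mamba-Transformer layer layout: mostly
# state-space (Mamba) blocks, with self-attention blocks interleaved at a
# fixed interval. The interval is an assumption made for illustration,
# not the actual Nemotron configuration.
def hybrid_layout(num_layers: int, attention_every: int = 8) -> list[str]:
    """Return a per-layer block type for a toy hybrid stack."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(num_layers)
    ]

if __name__ == "__main__":
    # A 24-layer toy stack: attention at layers 8, 16, 24; Mamba elsewhere.
    print(hybrid_layout(24))
```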
GenSelect: A Generative Approach to Best-of-N
Generative reward models with parallel sampling have enabled effective test-time scaling for reasoning tasks. Current approaches employ pointwise scoring of individual solutions or pairwise comparisons. However, pointwise methods underutil…
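For context, plain Best-of-N with pointwise scoring, the baseline setup this abstract contrasts against, looks roughly like the sketch below: sample N candidate solutions in parallel, score each independently with a reward model, and keep the argmax. The `generate` and `score` callables are placeholders for whatever model-serving API is in use; this is a generic illustration, not the paper's GenSelect method itself.

```python
from typing import Callable

# Minimal sketch of Best-of-N with a pointwise reward model. Each candidate
# is scored in isolation, so no information from comparing candidates against
# each other is used, which is the limitation that pairwise and generative
# selection approaches target.
def best_of_n(
    prompt: str,
    n: int,
    generate: Callable[[str], str],       # samples one candidate solution
    score: Callable[[str, str], float],   # pointwise reward for (prompt, solution)
) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    rewards = [score(prompt, c) for c in candidates]
    best = max(range(n), key=rewards.__getitem__)
    return candidates[best]
```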
The Challenge of Teaching Reasoning to LLMs Without RL or Distillation
Reasoning-capable language models achieve state-of-the-art performance in diverse complex tasks by generating long, explicit Chain-of-Thought (CoT) traces. While recent works show that base models can acquire such reasoning traces via rein…
Llama-Nemotron: Efficient Reasoning Models
We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three siz…
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a larg…
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer…
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
Mathematical reasoning continues to be a critical challenge in large language model (LLM) development, attracting significant interest. However, most of the cutting-edge progress in mathematical reasoning with LLMs has become closed-source…
Nemotron-4 340B Technical Report
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that al…
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Y…