Daria Gitman
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on…
Llama-Nemotron: Efficient Reasoning Models
We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three siz…
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Adapting Large Language Models (LLMs) to novel tasks and enhancing their overall capabilities often requires large, high-quality training datasets. Synthetic data, generated at scale, serves as a valuable alternative when real-world data is s…
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
As inference-time scaling becomes critical for enhanced reasoning capabilities, it becomes increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer…
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Y…