Nikoli Dryden
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Accelerating large language model (LLM) inference is critical for real-world deployments requiring high throughput and low latency. Contextual sparsity, where each token dynamically activates only a small subset of the model parameters, sh…
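The contextual-sparsity idea in this abstract can be illustrated with a short sketch: per token, select a small top-k subset of an MLP's hidden neurons and compute only those. Everything below (sparse_mlp, topk_ratio, the dense scoring pass, the gather-based contraction) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch of contextual sparsity in an MLP block (illustrative only).
import torch

def sparse_mlp(x, w1, b1, w2, topk_ratio=0.1):
    # x: (batch, d_model); w1: (d_ff, d_model); w2: (d_model, d_ff)
    # Score hidden neurons per token. For simplicity this uses the full
    # matmul; a real system would use a cheap predictor so the dense
    # product is avoided.
    scores = x @ w1.t() + b1                     # (batch, d_ff)
    k = max(1, int(topk_ratio * w1.shape[0]))
    top_vals, top_idx = scores.topk(k, dim=-1)   # per-token active neurons
    hidden = torch.relu(top_vals)                # compute only the active subset
    # Gather the matching output-projection rows per token and contract
    # over the k active neurons.
    w2_rows = w2.t()[top_idx]                    # (batch, k, d_model)
    return torch.einsum('bk,bkd->bd', hidden, w2_rows)

x = torch.randn(4, 64)
w1, b1 = torch.randn(256, 64), torch.randn(256)
w2 = torch.randn(64, 256)
print(sparse_mlp(x, w1, b1, w2).shape)  # torch.Size([4, 64])
```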
Lion Cub: Minimizing Communication Overhead in Distributed Lion
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techni…
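Lion's update direction is the elementwise sign of an interpolation between momentum and gradient, i.e., one bit per parameter, which is what makes aggressive communication compression natural for it. The sketch below shows the idea on simulated workers; the helper names (lion_local_direction, majority_vote) and the majority-vote aggregation itself are illustrative assumptions, not necessarily Lion Cub's scheme.

```python
# Why Lion invites compression: its update is a sign vector, so workers can
# exchange 1 bit/parameter and aggregate by majority vote. (Generic sketch;
# Lion Cub's actual communication scheme may differ.)
import torch

def lion_local_direction(grad, momentum, beta1=0.9):
    # Lion update direction: sign of an interpolation of momentum and gradient.
    return torch.sign(beta1 * momentum + (1 - beta1) * grad)

def majority_vote(directions):
    # Aggregate sign vectors from all workers: elementwise majority.
    return torch.sign(torch.stack(directions).sum(dim=0))

grads = [torch.randn(8) for _ in range(4)]    # per-worker gradients
moms = [torch.zeros(8) for _ in range(4)]     # per-worker momenta
dirs = [lion_local_direction(g, m) for g, m in zip(grads, moms)]
update = majority_vote(dirs)                  # the step each worker applies
print(update)
```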
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
Graph Convolutional Networks (GCNs) are crucial across numerous domains, particularly for large-scale graphs. However, training distributed full-batch GCNs on large-scale graphs suffers from inefficient memory access patterns and high com…
Learning to Compose SuperWeights for Neural Parameter Allocation Search
Neural parameter allocation search (NPAS) automates parameter sharing by obtaining weights for a network given an arbitrary, fixed parameter budget. Prior work has two major drawbacks we aim to address. First, there is a disconnect in the …
Cached Operator Reordering: A Unified View for Fast GNN Training
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering. However, the sparse nature of GNN computation poses new challenges …
STen: Productive and Efficient Sparsity in PyTorch
As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage. However, existing frameworks offer poor support for sparsity. Specialized spa…
Spatial Mixture-of-Experts
Many data have an underlying dependence on spatial location; it may be weather on the Earth, a simulation on a mesh, or a registered image. Yet this feature is rarely taken advantage of, and violates common assumptions made by many neural …
Neural Graph Databases
Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large significance of GDBs in both academia and industry, little effort has been made into integrating them …
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts
Post-processing ensemble prediction systems can improve the reliability of weather forecasting, especially for extreme event prediction. In recent years, different machine learning models have been developed to improve the quality of weath…
Clairvoyant prefetching for distributed machine learning I/O
I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments. Indeed, at large scale, I/O takes as much as 85% of training time. Addressing this I/O bottleneck necessitates careful optimizatio…
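The "clairvoyant" part can be made concrete: because training data is read in a seeded pseudorandom order, the entire epoch's access sequence is known before it starts, so a prefetcher can stay ahead of the consumer. The sketch below is a single-process illustration under assumed helper names (epoch_order, run_epoch, depth); NoPFS itself layers distributed, multi-tier caching on top of this idea.

```python
# Sketch of clairvoyant prefetching: with a seeded shuffle, the epoch's
# access order is known up front, so samples can be fetched ahead of use.
import random
from collections import deque

def epoch_order(num_samples, seed):
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)  # deterministic: any node can derive it
    return order

def run_epoch(order, fetch, depth=4):
    prefetched = deque()
    for idx in order[:depth]:                 # warm the pipeline
        prefetched.append(fetch(idx))
    for idx in order[depth:]:
        yield prefetched.popleft()            # consume the oldest prefetch
        prefetched.append(fetch(idx))         # stay `depth` steps ahead
    while prefetched:                         # drain the tail
        yield prefetched.popleft()

samples = {i: f"sample-{i}" for i in range(10)}
for s in run_epoch(epoch_order(10, seed=42), samples.__getitem__):
    print(s)
```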
A Data-Centric Optimization Framework for Machine Learning
Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize performance optimization to patterns in popular networks, they implic…
Co-design Center for Exascale Machine Learning Technologies (ExaLearn)
Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition t…
Near-optimal Prefetching System
Software used in the submission of "Clairvoyant Prefetching for Distributed Machine Learning I/O" by Dryden et al., to appear at Supercomputing 2021. For up-to-date versions, visit https://github.com/spcl/NoPFS.
Learning Combinatorial Node Labeling Algorithms
We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring. We train our model using reinforcement learning, spec…
Motif Prediction with Graph Neural Networks
Link prediction is one of the central problems in graph mining. However, recent studies highlight the importance of higher-order network analysis, where complex structures called motifs are the first-class citizens. We first show that exis…
Deep learning for post-processing ensemble weather forecasts
Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble prediction systems, which consist of many perturbed numerical weather simulations, or …
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as wel…
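As a concrete instance of the pruning techniques the survey covers, the sketch below shows unstructured magnitude pruning, which zeroes the smallest-magnitude fraction of a weight tensor. The helper name magnitude_prune and the threshold-via-kthvalue implementation are illustrative choices, not code from the survey.

```python
# Minimal magnitude pruning: zero out the fraction of weights with the
# smallest absolute value, returning a mask to keep them at zero later.
import torch

def magnitude_prune(weight, sparsity=0.9):
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold   # keep only large-magnitude weights
    weight.mul_(mask)                 # zero the pruned entries in place
    return mask                       # reapply after each update to stay sparse

w = torch.randn(128, 128)
mask = magnitude_prune(w, sparsity=0.9)
print(f"density: {mask.float().mean():.3f}")  # approx. 0.1
```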
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Deep learning-based emerging scientific workflows often require model training with large, high-dimensional samples, which can make t…
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample …
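The group-averaging idea can be sketched as follows: rather than a global all-reduce every step, each worker averages with a small, rotating group, so a straggler delays only its group while rotation still spreads information globally. The schedule below (group_average, the rotation rule, group_size) is an assumed illustration, not the paper's exact wait-avoiding protocol.

```python
# Sketch of rotating group averaging: workers average within small groups
# each step; rotating membership mixes information across the whole job.
# Assumes the worker count is divisible by group_size.
import torch

def group_average(params, step, group_size=2):
    n = len(params)
    shift = step % group_size            # rotate membership each step
    groups = {}
    for rank in range(n):
        gid = (rank + shift) // group_size % (n // group_size)
        groups.setdefault(gid, []).append(rank)
    out = list(params)
    for members in groups.values():
        avg = torch.stack([params[r] for r in members]).mean(dim=0)
        for r in members:
            out[r] = avg                 # everyone in the group gets the mean
    return out

workers = [torch.randn(4) for _ in range(4)]
workers = group_average(workers, step=0)   # groups {0,1} and {2,3}
workers = group_average(workers, step=1)   # rotated: {0,3} and {1,2}
```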
Data Movement Is All You Need: A Case Study of Transformer Networks
Transformer neural networks have become widely used for language modeling and sequence learning tasks, and are one of the most important machine learning workloads today. Training one is a very compute-intensive task, often taking days or …
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Transformers are one of the most important machine learning workloads today. Training one is a very compute-intensive task, often taking days or weeks, and significant attention has been given to optimizing transformers. Despite this, exis…
Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning
We present Shapeshifter Networks (SSNs), a flexible neural network framework that improves performance and reduces memory requirements on a diverse set of scenarios over standard neural networks. Our approach is based on the observation th…
Neural Parameter Allocation Search
Training neural networks requires increasing amounts of memory. Parameter sharing can reduce memory and communication costs, but existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that f…
DiHydrogen
DiHydrogen is the second version of the Hydrogen fork of the well-known distributed linear algebra library, Elemental. DiHydrogen is a GPU-accelerated distributed multilinear algebra interface with a particular emphasis on the needs of the…
Predicting Weather Uncertainty with Deep Convnets
Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations. To provide accurate estimation, dozens of such computation…