Nikoli Dryden
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Accelerating large language model (LLM) inference is critical for real-world deployments requiring high throughput and low latency. Contextual sparsity, where each token dynamically activates only a small subset of the model parameters, sh…
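The contextual-sparsity idea in this abstract can be illustrated with a short sketch: per token, select a small top-k subset of an MLP's hidden neurons and compute only those. Everything below (sparse_mlp, topk_ratio, the dense scoring pass, the gather-based contraction) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch of contextual sparsity in an MLP block (illustrative only).
import torch

def sparse_mlp(x, w1, b1, w2, topk_ratio=0.1):
    # x: (batch, d_model); w1: (d_ff, d_model); w2: (d_model, d_ff)
    # Score hidden neurons per token. For simplicity this uses the full
    # matmul; a real system would use a cheap predictor so the dense
    # product is avoided.
    scores = x @ w1.t() + b1                     # (batch, d_ff)
    k = max(1, int(topk_ratio * w1.shape[0]))
    top_vals, top_idx = scores.topk(k, dim=-1)   # per-token active neurons
    hidden = torch.relu(top_vals)                # compute only the active subset
    # Gather the matching output-projection rows per token and contract
    # over the k active neurons.
    w2_rows = w2.t()[top_idx]                    # (batch, k, d_model)
    return torch.einsum('bk,bkd->bd', hidden, w2_rows)

x = torch.randn(4, 64)
w1, b1 = torch.randn(256, 64), torch.randn(256)
w2 = torch.randn(64, 256)
print(sparse_mlp(x, w1, b1, w2).shape)  # torch.Size([4, 64])
```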
Lion Cub: Minimizing Communication Overhead in Distributed Lion
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techni…
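Lion's update direction is the elementwise sign of an interpolation between momentum and gradient, i.e., one bit per parameter, which is what makes aggressive communication compression natural for it. The sketch below shows the idea on simulated workers; the helper names (lion_local_direction, majority_vote) and the majority-vote aggregation itself are illustrative assumptions, not necessarily Lion Cub's scheme.

```python
# Why Lion invites compression: its update is a sign vector, so workers can
# exchange 1 bit/parameter and aggregate by majority vote. (Generic sketch;
# Lion Cub's actual communication scheme may differ.)
import torch

def lion_local_direction(grad, momentum, beta1=0.9):
    # Lion update direction: sign of an interpolation of momentum and gradient.
    return torch.sign(beta1 * momentum + (1 - beta1) * grad)

def majority_vote(directions):
    # Aggregate sign vectors from all workers: elementwise majority.
    return torch.sign(torch.stack(directions).sum(dim=0))

grads = [torch.randn(8) for _ in range(4)]    # per-worker gradients
moms = [torch.zeros(8) for _ in range(4)]     # per-worker momenta
dirs = [lion_local_direction(g, m) for g, m in zip(grads, moms)]
update = majority_vote(dirs)                  # the step each worker applies
print(update)
```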
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
Graph Convolutional Networks (GCNs) are crucial across numerous domains, particularly for large-scale graphs. However, training distributed full-batch GCNs on large-scale graphs suffers from inefficient memory access patterns and high com…
Learning to Compose SuperWeights for Neural Parameter Allocation Search
Neural parameter allocation search (NPAS) automates parameter sharing by obtaining weights for a network given an arbitrary, fixed parameter budget. Prior work has two major drawbacks we aim to address. First, there is a disconnect in the …
Cached Operator Reordering: A Unified View for Fast GNN Training
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering. However, the sparse nature of GNN computation poses new challenges …
STen: Productive and Efficient Sparsity in PyTorch
As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage. However, existing frameworks offer poor support for sparsity. Specialized spa…
Spatial Mixture-of-Experts
Many data have an underlying dependence on spatial location; it may be weather on the Earth, a simulation on a mesh, or a registered image. Yet this feature is rarely taken advantage of, and violates common assumptions made by many neural …
Neural Graph Databases
Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large significance of GDBs in both academia and industry, little effort has been made into integrating them …
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts
Post-processing ensemble prediction systems can improve the reliability of weather forecasting, especially for extreme event prediction. In recent years, different machine learning models have been developed to improve the quality of weath…
Clairvoyant prefetching for distributed machine learning I/O
I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments. Indeed, at large scale, I/O takes as much as 85% of training time. Addressing this I/O bottleneck necessitates careful optimizatio…
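The "clairvoyant" part can be made concrete: because training data is read in a seeded pseudorandom order, the entire epoch's access sequence is known before it starts, so a prefetcher can stay ahead of the consumer. The sketch below is a single-process illustration under assumed helper names (epoch_order, run_epoch, depth); NoPFS itself layers distributed, multi-tier caching on top of this idea.

```python
# Sketch of clairvoyant prefetching: with a seeded shuffle, the epoch's
# access order is known up front, so samples can be fetched ahead of use.
import random
from collections import deque

def epoch_order(num_samples, seed):
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)  # deterministic: any node can derive it
    return order

def run_epoch(order, fetch, depth=4):
    prefetched = deque()
    for idx in order[:depth]:                 # warm the pipeline
        prefetched.append(fetch(idx))
    for idx in order[depth:]:
        yield prefetched.popleft()            # consume the oldest prefetch
        prefetched.append(fetch(idx))         # stay `depth` steps ahead
    while prefetched:                         # drain the tail
        yield prefetched.popleft()

samples = {i: f"sample-{i}" for i in range(10)}
for s in run_epoch(epoch_order(10, seed=42), samples.__getitem__):
    print(s)
```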
A Data-Centric Optimization Framework for Machine Learning
Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize performance optimization to patterns in popular networks, they implic…
Co-design Center for Exascale Machine Learning Technologies (ExaLearn)
Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition t…
Near-optimal Prefetching System
Software used in the submission of "Clairvoyant Prefetching for Distributed Machine Learning I/O" by Dryden et al., to appear at Supercomputing 2021. For up-to-date versions, visit https://github.com/spcl/NoPFS.
Learning Combinatorial Node Labeling Algorithms
We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring. We train our model using reinforcement learning, spec…
Motif Prediction with Graph Neural Networks
Link prediction is one of the central problems in graph mining. However, recent studies highlight the importance of higher-order network analysis, where complex structures called motifs are the first-class citizens. We first show that exis…
Deep learning for post-processing ensemble weather forecasts
Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble prediction systems, which consist of many perturbed numerical weather simulations, or …
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as wel…
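As a concrete instance of the pruning techniques the survey covers, the sketch below shows unstructured magnitude pruning, which zeroes the smallest-magnitude fraction of a weight tensor. The helper name magnitude_prune and the threshold-via-kthvalue implementation are illustrative choices, not code from the survey.

```python
# Minimal magnitude pruning: zero out the fraction of weights with the
# smallest absolute value, returning a mask to keep them at zero later.
import torch

def magnitude_prune(weight, sparsity=0.9):
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold   # keep only large-magnitude weights
    weight.mul_(mask)                 # zero the pruned entries in place
    return mask                       # reapply after each update to stay sparse

w = torch.randn(128, 128)
mask = magnitude_prune(w, sparsity=0.9)
print(f"density: {mask.float().mean():.3f}")  # approx. 0.1
```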
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Deep learning-based emerging scientific workflows often require model training with large, high-dimensional samples, which can make t…
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample …
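The group-averaging idea can be sketched as follows: rather than a global all-reduce every step, each worker averages with a small, rotating group, so a straggler delays only its group while rotation still spreads information globally. The schedule below (group_average, the rotation rule, group_size) is an assumed illustration, not the paper's exact wait-avoiding protocol.

```python
# Sketch of rotating group averaging: workers average within small groups
# each step; rotating membership mixes information across the whole job.
# Assumes the worker count is divisible by group_size.
import torch

def group_average(params, step, group_size=2):
    n = len(params)
    shift = step % group_size            # rotate membership each step
    groups = {}
    for rank in range(n):
        gid = (rank + shift) // group_size % (n // group_size)
        groups.setdefault(gid, []).append(rank)
    out = list(params)
    for members in groups.values():
        avg = torch.stack([params[r] for r in members]).mean(dim=0)
        for r in members:
            out[r] = avg                 # everyone in the group gets the mean
    return out

workers = [torch.randn(4) for _ in range(4)]
workers = group_average(workers, step=0)   # groups {0,1} and {2,3}
workers = group_average(workers, step=1)   # rotated: {0,3} and {1,2}
```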
Data Movement Is All You Need: A Case Study of Transformer Networks
Transformer neural networks have become widely used for language modeling and sequence learning tasks, and are one of the most important machine learning workloads today. Training one is a very compute-intensive task, often taking days or …
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Transformers are one of the most important machine learning workloads today. Training one is a very compute-intensive task, often taking days or weeks, and significant attention has been given to optimizing transformers. Despite this, exis…
Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning
We present Shapeshifter Networks (SSNs), a flexible neural network framework that improves performance and reduces memory requirements on a diverse set of scenarios over standard neural networks. Our approach is based on the observation th…
Neural Parameter Allocation Search
Training neural networks requires increasing amounts of memory. Parameter sharing can reduce memory and communication costs, but existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that f…
DiHydrogen
DiHydrogen is the second version of the Hydrogen fork of the well-known distributed linear algebra library, Elemental. DiHydrogen is a GPU-accelerated distributed multilinear algebra interface with a particular emphasis on the needs of the…
Predicting Weather Uncertainty with Deep Convnets
Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations. To provide accurate estimation, dozens of such computation…