Sam Adé Jacobs
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on e…
Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism
Deep neural network (DNN) training continues to scale rapidly in terms of model size, data volume, and sequence length, to the point where multiple machines are required to fit large models for training. Different distributed and parallel …
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, systems work on accelerating LLM training has focused on the fi…
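The four dimensions named above determine both compute and activation cost, and self-attention grows quadratically with sequence length, so optimizing only batch size, hidden dimension, and layer count leaves long-sequence training unaddressed. A rough back-of-the-envelope sketch using standard dense-attention cost estimates (illustrative constants, not figures from the paper):

```python
# Rough cost model for one transformer layer with dense self-attention.
# Constants are the usual big-O estimates, not numbers from the paper.
def attention_flops(batch, hidden, seq_len):
    # QK^T and the attention-weighted V matmul each cost ~2 * b * s^2 * h FLOPs.
    return 4 * batch * seq_len ** 2 * hidden

def mlp_flops(batch, hidden, seq_len, expansion=4):
    # Two projections of size h x (expansion * h) applied to every token.
    return 2 * 2 * batch * seq_len * hidden * (expansion * hidden)

for s in (4_096, 65_536, 1_048_576):
    att, mlp = attention_flops(1, 4096, s), mlp_flops(1, 4096, s)
    print(f"seq={s:>9,}  attention/MLP FLOP ratio = {att / mlp:.2f}")
# The ratio grows linearly with sequence length: at ultra-long sequences
# attention dominates, which motivates parallelizing over the sequence dimension.
```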
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale …
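ZeRO-style training all-gathers sharded weights and reduce-scatters gradients every step, so on low-bandwidth interconnects the bytes on the wire dominate. The sketch below shows the generic idea of block-wise low-precision quantization of a tensor before communication; it is an illustration of the concept only, not ZeRO++'s actual kernels or quantization scheme.

```python
import numpy as np

def quantize_blockwise(x, block=256):
    """Quantize a float32 vector to int8 with one scale per block (~4x fewer bytes)."""
    pad = (-len(x)) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32), len(x)

def dequantize_blockwise(q, scales, n):
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

# Toy "shard": quantize before sending, dequantize on receipt.
shard = np.random.randn(10_000).astype(np.float32)
q, s, n = quantize_blockwise(shard)
restored = dequantize_blockwise(q, s, n)
sent_bytes = q.nbytes + s.nbytes
print(f"bytes: {shard.nbytes} -> {sent_bytes}, max abs error {np.abs(shard - restored).max():.4f}")
```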
A flexible proton beam imaging energy spectrometer (PROBIES) for high repetition rate or single-shot high energy density (HED) experiments (invited)
The PROBIES diagnostic is a new, highly flexible, imaging and energy spectrometer designed for laser-accelerated protons. The diagnostic can detect low-mode spatial variations in the proton beam profile while resolving multiple energies on…
Scalable Composition and Analysis Techniques for Massive Scientific Workflows
Composite science workflows are gaining traction to manage the combined effects of (1) extreme hardware heterogeneity in new High Performance Computing (HPC) systems and (2) growing software complexity – effects necessitated by the converg…
Learning Interpretable Models Through Multi-Objective Neural Architecture Search
Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. …
Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models
We improved the quality and reduced the time to produce machine learned models for use in small molecule antiviral design. Our globally asynchronous multi-level parallel training approach strong scales to all of Sierra with up to 97.7% eff…
Merlin: Enabling Machine Learning-Ready HPC Ensembles
With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows…
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train tradition…
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications
With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural ne…
Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency
Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to mo…
Towards Scalable Parallel Training of Deep Neural Networks
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets la…
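The tournament idea can be illustrated with a toy simulation: independent replicas train on different data shards, periodically pair off, evaluate on held-out data, and the lower-scoring replica adopts the winner's weights. A minimal toy sketch of that exchange pattern (hypothetical scoring and update functions; not the LTFB implementation):

```python
import random

def tournament_round(replicas, evaluate):
    """Pair replicas at random; in each pair the lower-scoring model copies the winner."""
    order = random.sample(range(len(replicas)), len(replicas))
    for a, b in zip(order[::2], order[1::2]):
        winner, loser = (a, b) if evaluate(replicas[a]) >= evaluate(replicas[b]) else (b, a)
        replicas[loser] = dict(replicas[winner])  # loser adopts winner's weights

# Toy demo: each "model" is a single scalar weight and the score is just its value.
replicas = [{"w": random.random()} for _ in range(8)]
for _ in range(3):
    for r in replicas:              # each replica "trains" on its own data shard
        r["w"] += random.uniform(-0.05, 0.1)
    tournament_round(replicas, evaluate=lambda r: r["w"])
print(sorted(round(r["w"], 3) for r in replicas))
```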
Communication Quantization for Data-Parallel Training of Deep Neural Networks
We study data-parallel training of deep neural networks on high-performance computing infrastructure. The key problem with scaling data-parallel training is avoiding severe communication/computation imbalance. We explore quantizing gradien…
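One widely used way to shrink gradient traffic in data-parallel training is aggressive quantization with error feedback: each worker sends a one-bit (sign) version of its gradient plus a scale, and keeps the quantization residual to fold into the next step's gradient. The sketch below is a generic illustration of that idea, not the specific adaptive scheme studied in the paper.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize gradient + carried residual to sign * mean magnitude; return new residual."""
    g = grad + residual                      # error feedback: re-inject past quantization error
    scale = np.abs(g).mean()
    q = np.where(g >= 0, scale, -scale)      # effectively 1 bit per element plus one scale
    return q, g - q                          # new residual = quantization error

rng = np.random.default_rng(0)
grad = rng.normal(size=1_000_000).astype(np.float32)
residual = np.zeros_like(grad)
q, residual = one_bit_quantize(grad, residual)
# ~32x fewer payload bits per element; the error is carried forward, not discarded.
print("bits/element: 1 (+ one scale) | mean |residual| =", float(np.abs(residual).mean()))
```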