Sam Adé Jacobs
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on e…
Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism
Deep neural network (DNN) training continues to scale rapidly in terms of model size, data volume, and sequence length, to the point where multiple machines are required to fit large models for training. Different distributed and parallel …
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, systems work on accelerating LLM training has focused on the fi…
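The four dimensions named above determine both compute and activation cost, and self-attention grows quadratically with sequence length, so optimizing only batch size, hidden dimension, and layer count leaves long-sequence training unaddressed. A rough back-of-the-envelope sketch using standard dense-attention cost estimates (illustrative constants, not figures from the paper):

```python
# Rough cost model for one transformer layer with dense self-attention.
# Constants are the usual big-O estimates, not numbers from the paper.
def attention_flops(batch, hidden, seq_len):
    # QK^T and the attention-weighted V matmul each cost ~2 * b * s^2 * h FLOPs.
    return 4 * batch * seq_len ** 2 * hidden

def mlp_flops(batch, hidden, seq_len, expansion=4):
    # Two projections of size h x (expansion * h) applied to every token.
    return 2 * 2 * batch * seq_len * hidden * (expansion * hidden)

for s in (4_096, 65_536, 1_048_576):
    att, mlp = attention_flops(1, 4096, s), mlp_flops(1, 4096, s)
    print(f"seq={s:>9,}  attention/MLP FLOP ratio = {att / mlp:.2f}")
# The ratio grows linearly with sequence length: at ultra-long sequences
# attention dominates, which motivates parallelizing over the sequence dimension.
```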
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale …
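ZeRO-style training all-gathers sharded weights and reduce-scatters gradients every step, so on low-bandwidth interconnects the bytes on the wire dominate. The sketch below shows the generic idea of block-wise low-precision quantization of a tensor before communication; it is an illustration of the concept only, not ZeRO++'s actual kernels or quantization scheme.

```python
import numpy as np

def quantize_blockwise(x, block=256):
    """Quantize a float32 vector to int8 with one scale per block (~4x fewer bytes)."""
    pad = (-len(x)) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32), len(x)

def dequantize_blockwise(q, scales, n):
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

# Toy "shard": quantize before sending, dequantize on receipt.
shard = np.random.randn(10_000).astype(np.float32)
q, s, n = quantize_blockwise(shard)
restored = dequantize_blockwise(q, s, n)
sent_bytes = q.nbytes + s.nbytes
print(f"bytes: {shard.nbytes} -> {sent_bytes}, max abs error {np.abs(shard - restored).max():.4f}")
```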
A flexible proton beam imaging energy spectrometer (PROBIES) for high repetition rate or single-shot high energy density (HED) experiments (invited)
The PROBIES diagnostic is a new, highly flexible, imaging and energy spectrometer designed for laser-accelerated protons. The diagnostic can detect low-mode spatial variations in the proton beam profile while resolving multiple energies on…
Scalable Composition and Analysis Techniques for Massive Scientific Workflows
Composite science workflows are gaining traction to manage the combined effects of (1) extreme hardware heterogeneity in new High Performance Computing (HPC) systems and (2) growing software complexity – effects necessitated by the converg…
Learning Interpretable Models Through Multi-Objective Neural Architecture Search
Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. …
Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models
We improved the quality and reduced the time to produce machine learned models for use in small molecule antiviral design. Our globally asynchronous multi-level parallel training approach strong scales to all of Sierra with up to 97.7% eff…
Merlin: Enabling Machine Learning-Ready HPC Ensembles
With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows…
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train tradition…
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications
With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural ne…
Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency
Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to mo…
Towards Scalable Parallel Training of Deep Neural Networks
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets la…
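The tournament idea can be illustrated with a toy simulation: independent replicas train on different data shards, periodically pair off, evaluate on held-out data, and the lower-scoring replica adopts the winner's weights. A minimal toy sketch of that exchange pattern (hypothetical scoring and update functions; not the LTFB implementation):

```python
import random

def tournament_round(replicas, evaluate):
    """Pair replicas at random; in each pair the lower-scoring model copies the winner."""
    order = random.sample(range(len(replicas)), len(replicas))
    for a, b in zip(order[::2], order[1::2]):
        winner, loser = (a, b) if evaluate(replicas[a]) >= evaluate(replicas[b]) else (b, a)
        replicas[loser] = dict(replicas[winner])  # loser adopts winner's weights

# Toy demo: each "model" is a single scalar weight and the score is just its value.
replicas = [{"w": random.random()} for _ in range(8)]
for _ in range(3):
    for r in replicas:              # each replica "trains" on its own data shard
        r["w"] += random.uniform(-0.05, 0.1)
    tournament_round(replicas, evaluate=lambda r: r["w"])
print(sorted(round(r["w"], 3) for r in replicas))
```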
Communication Quantization for Data-Parallel Training of Deep Neural Networks
We study data-parallel training of deep neural networks on high-performance computing infrastructure. The key problem with scaling data-parallel training is avoiding severe communication/computation imbalance. We explore quantizing gradien…
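One widely used way to shrink gradient traffic in data-parallel training is aggressive quantization with error feedback: each worker sends a one-bit (sign) version of its gradient plus a scale, and keeps the quantization residual to fold into the next step's gradient. The sketch below is a generic illustration of that idea, not the specific adaptive scheme studied in the paper.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize gradient + carried residual to sign * mean magnitude; return new residual."""
    g = grad + residual                      # error feedback: re-inject past quantization error
    scale = np.abs(g).mean()
    q = np.where(g >= 0, scale, -scale)      # effectively 1 bit per element plus one scale
    return q, g - q                          # new residual = quantization error

rng = np.random.default_rng(0)
grad = rng.normal(size=1_000_000).astype(np.float32)
residual = np.zeros_like(grad)
q, residual = one_bit_quantize(grad, residual)
# ~32x fewer payload bits per element; the error is carried forward, not discarded.
print("bits/element: 1 (+ one scale) | mean |residual| =", float(np.abs(residual).mean()))
```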