Zachary DeVito
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow. Despite decades of research on infrastructure failures, …
Is Flash Attention Stable?
Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of inst…
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and …
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datac…
A Theory on Adam Instability in Large-Scale Machine Learning
We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We o…
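For reference, the optimizer the abstract attributes the divergence to is Adam in its standard form (as given by Kingma and Ba; this is the textbook update, not a reproduction of the paper's analysis). With gradient g_t, decay rates β₁, β₂, step size α, and stabilizer ε:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t,\qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
```
```latex
\hat{m}_t = \frac{m_t}{1-\beta_1^t},\qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t},\qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

The per-coordinate ratio m̂_t/√v̂_t is the quantity most analyses of Adam's stability focus on, since it can spike when v̂_t is small.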
Torch.fx: Practical Program Capture and Transformation for Deep Learning in Python
Modern deep learning frameworks provide imperative, eager execution programming interfaces embedded in Python to provide a productive development experience. However, deep learning practitioners sometimes need to capture and transform prog…
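The capture step described here can be sketched with torch.fx's public API. This is a minimal illustration, not an example from the paper; the module `M` is made up for demonstration:

```python
import torch
import torch.fx


class M(torch.nn.Module):
    def forward(self, x):
        # An ordinary eager-mode computation.
        return torch.relu(x) + 1.0


# symbolic_trace records the operations into a Graph, producing a
# GraphModule whose IR can be inspected and transformed.
gm = torch.fx.symbolic_trace(M())
print(gm.graph)  # human-readable IR: placeholder -> relu -> add -> output
```

The traced `GraphModule` remains callable like the original module, so transformations (operator fusion, quantization rewrites, and so on) can be applied to the graph and executed directly.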
Using Python for Model Inference in Deep Learning
Python has become the de facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are use…
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Bienvenidos to the proceedings of the fifth edition of the workshop on computational approaches for linguistic code-switching (CALCS-2021)! Code-switching is this very interesting phenomenon where multilingual speakers communicate by moving…
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style …
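The imperative, Pythonic style the abstract describes can be seen in a few lines of ordinary PyTorch (a minimal sketch using standard public APIs, not code from the paper):

```python
import torch

# Eager execution: each line runs immediately, like normal Python.
x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()  # y = sum of squares, built with plain operators

# Autograd records the operations above and differentiates them:
# dy/dx = 2x, populated into x.grad by backward().
y.backward()
```

Because execution is eager, standard Python tooling (debuggers, print statements, control flow) works unchanged, which is the usability half of the usability/speed tradeoff the abstract refers to.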
The Next 700 Accelerated Layers
Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such …
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ran…
Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging
Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but th…