TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
2024-10-09 • Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purand...
The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome …
One-Stop Shop
PyTorch
TorchBench: Benchmarking PyTorch with High API Surface Coverage
2023-04-27 • Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu
Deep learning (DL) has been a revolutionary technique in various domains. To facilitate model development and deployment, many deep learning frameworks have been proposed, among which PyTorch is one of the most popular. The performance of the ecosystem around PyTorch is critically important, since it determines the cost of training models and the response time of model inference. In this paper, we propose TorchBench, a novel benchmark suite to study the performance of the PyTorch software stack. Unlike existing …
PyTorch
Benchmarking
Using Python for Model Inference in Deep Learning
2021-04-01 • Zachary DeVito, Jason Ansel, Will Constable, Michael Suo, Ailing Zhang, Kim Hazelwood
Python has become the de facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are used for inference, they are typically extracted from Python as TensorFlow graphs or TorchScript programs in order to meet performance and packaging constraints. The extraction process can be time-consuming, impeding fast prototyping. We show how it is possible to meet these performa…
Artificial Intelligence
Deep Learning
Computer Science
Archaeology
Machine Learning
Theoretical Computer Science
History
Database
Programming Language
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
2018-01-24 • Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leon...
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f…
Intermediate-Range Ballistic Missile
Intel 8085
Intel Core 2
West Texas Intermediate
Intel Core
Intermediate Filament
Intel 8088
Intermediate Frequency
Intel Atom