Michael Gschwind
AI: It's All About Inference Now
As the scaling of pretraining is reaching a plateau of diminishing returns, model inference is quickly becoming an important driver for model performance. Today, test-time compute scaling offers a new, exciting avenue to increase model per…
Multi-petascale highly efficient parallel supercomputer
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The S…
Matrix multiplication operations using pair-wise load and splat operations
Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and sp…
Sustainable AI: Environmental Implications, Challenges and Opportunities
This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model …
First-Generation Inference Accelerator Deployment at Facebook
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and n…
Steering Committee