Basil Hosmer
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only have its applications broadened to various sectors, but it also poses new system design and optimization opportunities. The technology is ca…
Is Flash Attention Stable?
Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of inst…
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an …
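The abstract describes layer dropout with rates that increase with depth. A minimal sketch of that idea, assuming a simple linear rate schedule and treating layers as plain callables (the function names and the schedule are illustrative, not the paper's implementation):

```python
import random

def layer_dropout_rates(num_layers, max_rate=0.2):
    # Linearly increasing drop probability: earliest layer ~0, deepest = max_rate.
    if num_layers == 1:
        return [0.0]
    return [max_rate * i / (num_layers - 1) for i in range(num_layers)]

def forward_with_layer_dropout(x, layers, rates, rng):
    # During training, skip each layer with its depth-dependent probability;
    # deeper layers are skipped more often, so earlier layers learn
    # representations that later layers (or an early exit) can decode.
    for layer, p in zip(layers, rates):
        if rng.random() >= p:  # keep the layer with probability 1 - p
            x = layer(x)
    return x
```

At inference time the schedule is simply ignored (all layers run, or the model exits early), which is what makes a dropout-style training scheme attractive here.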
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and …
Gradient Descent: The Ultimate Optimizer
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model para…
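The abstract refers to optimizing the step size alongside the model parameters. A minimal sketch of one classic variant of this idea, hypergradient descent on a 1D quadratic, where the step size alpha gets its own gradient-style update from the correlation of successive gradients (this is an illustrative variant, not necessarily the paper's exact method):

```python
def hypergradient_descent(grad, x, alpha, beta=1e-4, steps=100):
    # Minimize f via gradient descent on x while adapting the step size alpha.
    # The derivative of the loss w.r.t. alpha is -g_t . g_{t-1}, so descending
    # it means *increasing* alpha when successive gradients agree.
    g_prev = grad(x)
    for _ in range(steps):
        g = grad(x)
        alpha += beta * g * g_prev  # hypergradient update on the step size
        x -= alpha * g              # ordinary update on the parameter
        g_prev = g
    return x, alpha
```

On f(x) = x^2 (gradient 2x), starting from x = 5 with a deliberately small alpha = 0.01, the adapted step size grows and the iterate converges faster than it would with the fixed initial step size.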