Ram Pasunuru
Byte Latent Transformer: Patches Scale Better Than Tokens
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encod…
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only have its applications broadened to various sectors, but it also poses new system design and optimization opportunities. The technology is ca…
The ART of LLM Refinement: Ask, Refine, and Trust
In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect an…
Augmented Language Models: a Survey
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists in c…