Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
October 2024 • Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, F.W. Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advances in mixture-of-experts (MoE) models, speculative decoding, and early-exit strategies leverage the insight that computational demands can vary significantly with the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential…
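To make the idea of per-token adaptive computation concrete, here is a minimal PyTorch sketch (not the paper's actual framework): a hypothetical router inside a feed-forward layer sends "easy" tokens through a cheap path and "hard" tokens through an expensive one. All names (`AdaptiveFFN`, `router`, `small`, `large`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFFN(nn.Module):
    """Toy adaptive FFN: a per-token router picks a small or large path.

    Hypothetical sketch of variable per-token compute; not the Duo-LLM code.
    """

    def __init__(self, d_model: int, d_small: int, d_large: int):
        super().__init__()
        self.router = nn.Linear(d_model, 2)  # scores for the 2 paths
        self.small = nn.Sequential(
            nn.Linear(d_model, d_small), nn.GELU(), nn.Linear(d_small, d_model)
        )
        self.large = nn.Sequential(
            nn.Linear(d_model, d_large), nn.GELU(), nn.Linear(d_large, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); hard per-token routing decision.
        choice = self.router(x).argmax(dim=-1)  # (batch, seq), 0=small, 1=large
        out = self.small(x)                     # cheap path for every token
        hard = choice == 1                      # tokens routed to the big path
        if hard.any():
            out[hard] = self.large(x[hard])     # overwrite with expensive path
        return out

if __name__ == "__main__":
    ffn = AdaptiveFFN(d_model=64, d_small=32, d_large=256)
    tokens = torch.randn(2, 10, 64)
    print(ffn(tokens).shape)  # torch.Size([2, 10, 64])
```

In a real system the router would be trained (e.g., with a load-balancing or budget loss, as in MoE models), and the cheap path would be skipped for tokens routed to the large one; the sketch trades that efficiency for simplicity.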