Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
October 2024 • Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, F.W. Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advances in mixture-of-experts (MoE) models, speculative decoding, and early-exit strategies leverage the insight that computational demands can vary significantly with the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential…
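To make the idea of per-token adaptive computation concrete, here is a minimal PyTorch sketch (not the paper's actual framework): a hypothetical router inside a feed-forward layer sends "easy" tokens through a cheap path and "hard" tokens through an expensive one. All names (`AdaptiveFFN`, `router`, `small`, `large`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFFN(nn.Module):
    """Toy adaptive FFN: a per-token router picks a small or large path.

    Hypothetical sketch of variable per-token compute; not the Duo-LLM code.
    """

    def __init__(self, d_model: int, d_small: int, d_large: int):
        super().__init__()
        self.router = nn.Linear(d_model, 2)  # scores for the 2 paths
        self.small = nn.Sequential(
            nn.Linear(d_model, d_small), nn.GELU(), nn.Linear(d_small, d_model)
        )
        self.large = nn.Sequential(
            nn.Linear(d_model, d_large), nn.GELU(), nn.Linear(d_large, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); hard per-token routing decision.
        choice = self.router(x).argmax(dim=-1)  # (batch, seq), 0=small, 1=large
        out = self.small(x)                     # cheap path for every token
        hard = choice == 1                      # tokens routed to the big path
        if hard.any():
            out[hard] = self.large(x[hard])     # overwrite with expensive path
        return out

if __name__ == "__main__":
    ffn = AdaptiveFFN(d_model=64, d_small=32, d_large=256)
    tokens = torch.randn(2, 10, 64)
    print(ffn(tokens).shape)  # torch.Size([2, 10, 64])
```

In a real system the router would be trained (e.g., with a load-balancing or budget loss, as in MoE models), and the cheap path would be skipped for tokens routed to the large one; the sketch trades that efficiency for simplicity.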