S. T. Lai
AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
Modern large language model (LLM) applications exhibit diverse service-level objectives (SLOs), from low-latency requirements in interactive coding assistants to more relaxed constraints in data wrangling tasks. Existing LLM serving system…