Zenodo (CERN European Organization for Nuclear Research)
AI-Driven Predictive Load Orchestration for Distributed LLM Inference
December 2025 • Revista, Zen, IA, 10
This paper presents a novel framework for AI-driven predictive load orchestration tailored to distributed Large Language Model (LLM) inference. As LLMs grow in size and complexity, deploying them across distributed computing environments becomes essential for meeting high-throughput, low-latency requirements. Traditional load-balancing techniques often struggle with the dynamic and heterogeneous computational demands of LLM inference, leading to suboptimal resource utilization and increased respo…