AI-Driven Predictive Load Orchestration for Distributed LLM Inference
2025 · Open Access
· DOI: https://doi.org/10.5281/zenodo.17828172
This paper presents a novel framework for AI-driven predictive load orchestration specifically tailored for distributed Large Language Model (LLM) inference. As LLMs scale in size and complexity, deploying them across distributed computing environments becomes essential for meeting high throughput and low latency requirements. Traditional load balancing techniques often struggle with the dynamic and heterogeneous computational demands of LLM inference, leading to suboptimal resource utilization and increased response times. Our proposed approach leverages advanced artificial intelligence techniques, including machine learning for demand forecasting and reinforcement learning for dynamic resource allocation, to predict future inference loads and intelligently orchestrate computational resources across a cluster. We detail a methodology encompassing real-time telemetry collection, predictive modeling of token generation rates and model-specific computational requirements, and a policy-driven orchestration engine. This framework aims to minimize inference latency, maximize GPU and CPU utilization, and ensure service reliability under fluctuating workloads. The paper discusses the architectural components, algorithmic considerations, and potential benefits of such an AI-driven system, highlighting its potential to significantly enhance the efficiency and scalability of large-scale LLM deployments.
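
The forecast-then-route idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the forecasting model or the orchestration policy, so everything here is an assumption: an exponentially weighted moving average (EWMA) stands in for the demand forecaster, a greedy "lowest predicted utilization" rule stands in for the reinforcement-learning policy, and the names `NodeForecast`, `pick_node`, `gpu-a`, and `gpu-b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeForecast:
    """Hypothetical per-node state: capacity plus a smoothed demand estimate."""

    capacity_tokens_per_s: float
    alpha: float = 0.5  # EWMA smoothing factor (assumed value)
    predicted_tokens_per_s: float = 0.0

    def observe(self, tokens_per_s: float) -> None:
        # EWMA update: new = alpha * sample + (1 - alpha) * old.
        self.predicted_tokens_per_s = (
            self.alpha * tokens_per_s
            + (1.0 - self.alpha) * self.predicted_tokens_per_s
        )

    def predicted_utilization(self) -> float:
        # Fraction of the node's capacity the forecast demand would consume.
        return self.predicted_tokens_per_s / self.capacity_tokens_per_s


def pick_node(nodes: dict[str, NodeForecast]) -> str:
    """Greedy stand-in policy: route to the lowest predicted utilization."""
    return min(nodes, key=lambda name: nodes[name].predicted_utilization())


# Illustrative telemetry samples (made up for the example).
nodes = {
    "gpu-a": NodeForecast(capacity_tokens_per_s=1000.0),
    "gpu-b": NodeForecast(capacity_tokens_per_s=2000.0),
}
for sample in (400.0, 600.0):
    nodes["gpu-a"].observe(sample)
for sample in (1800.0, 1600.0):
    nodes["gpu-b"].observe(sample)

target = pick_node(nodes)
```

With these samples the smoothed estimates are 400 and 1250 tokens per second, so the predicted utilizations are 0.4 and 0.625 and the next request would go to `gpu-a`. A learned forecaster or policy would replace `observe` and `pick_node` without changing the surrounding structure.
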
Related Topics
- Type: article
- Landing Page: https://doi.org/10.5281/zenodo.17828172
- OA Status: green
- OpenAlex ID: https://openalex.org/W7109035557
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7109035557 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.5281/zenodo.17828172 (Digital Object Identifier)
- Title: AI-Driven Predictive Load Orchestration for Distributed LLM Inference (work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-12-05 (full publication date if available)
- Authors: Revista, Zen; IA, 10 (list of authors in order)
- Landing page: https://doi.org/10.5281/zenodo.17828172 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://doi.org/10.5281/zenodo.17828172 (direct OA link when available)
- Concepts: Orchestration, Computer science, Scalability, Distributed computing, Inference, Security token, Reliability (semiconductor), Latency (audio), Reinforcement learning, Resource (disambiguation), Machine learning, Throughput, Load balancing (electrical power), Artificial intelligence, Predictive modelling, Computational model, Big data, Service (business), Computational resource, Scale (ratio), Robustness (evolution), Resource allocation (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W7109035557 |
| doi | https://doi.org/10.5281/zenodo.17828172 |
| ids.doi | https://doi.org/10.5281/zenodo.17828172 |
| ids.openalex | https://openalex.org/W7109035557 |
| fwci | 0.0 |
| type | article |
| title | AI-Driven Predictive Load Orchestration for Distributed LLM Inference |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T14347 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.16177503764629364 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Big Data and Digital Economy |
| topics[1].id | https://openalex.org/T10101 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.11658641695976257 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Cloud Computing and Resource Management |
| topics[2].id | https://openalex.org/T12127 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.0705324038863182 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Software System Performance and Reliability |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C199168358 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7981753945350647 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3367000 |
| concepts[0].display_name | Orchestration |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7831945419311523 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C48044578 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7179195284843445 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[2].display_name | Scalability |
| concepts[3].id | https://openalex.org/C120314980 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6260199546813965 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q180634 |
| concepts[3].display_name | Distributed computing |
| concepts[4].id | https://openalex.org/C2776214188 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6117295026779175 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[4].display_name | Inference |
| concepts[5].id | https://openalex.org/C48145219 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5075799226760864 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1335365 |
| concepts[5].display_name | Security token |
| concepts[6].id | https://openalex.org/C43214815 |
| concepts[6].level | 3 |
| concepts[6].score | 0.41454461216926575 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7310987 |
| concepts[6].display_name | Reliability (semiconductor) |
| concepts[7].id | https://openalex.org/C82876162 |
| concepts[7].level | 2 |
| concepts[7].score | 0.411941796541214 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q17096504 |
| concepts[7].display_name | Latency (audio) |
| concepts[8].id | https://openalex.org/C97541855 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4007797837257385 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[8].display_name | Reinforcement learning |
| concepts[9].id | https://openalex.org/C206345919 |
| concepts[9].level | 2 |
| concepts[9].score | 0.3761606812477112 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q20380951 |
| concepts[9].display_name | Resource (disambiguation) |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3544296324253082 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C157764524 |
| concepts[11].level | 3 |
| concepts[11].score | 0.3470892012119293 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1383412 |
| concepts[11].display_name | Throughput |
| concepts[12].id | https://openalex.org/C138959212 |
| concepts[12].level | 3 |
| concepts[12].score | 0.3408467769622803 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1806783 |
| concepts[12].display_name | Load balancing (electrical power) |
| concepts[13].id | https://openalex.org/C154945302 |
| concepts[13].level | 1 |
| concepts[13].score | 0.33885160088539124 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[13].display_name | Artificial intelligence |
| concepts[14].id | https://openalex.org/C45804977 |
| concepts[14].level | 2 |
| concepts[14].score | 0.32131820917129517 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7239673 |
| concepts[14].display_name | Predictive modelling |
| concepts[15].id | https://openalex.org/C66024118 |
| concepts[15].level | 2 |
| concepts[15].score | 0.3185058832168579 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q1122506 |
| concepts[15].display_name | Computational model |
| concepts[16].id | https://openalex.org/C75684735 |
| concepts[16].level | 2 |
| concepts[16].score | 0.31374597549438477 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q858810 |
| concepts[16].display_name | Big data |
| concepts[17].id | https://openalex.org/C2780378061 |
| concepts[17].level | 2 |
| concepts[17].score | 0.2967299222946167 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q25351891 |
| concepts[17].display_name | Service (business) |
| concepts[18].id | https://openalex.org/C127964446 |
| concepts[18].level | 3 |
| concepts[18].score | 0.29065990447998047 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q1092142 |
| concepts[18].display_name | Computational resource |
| concepts[19].id | https://openalex.org/C2778755073 |
| concepts[19].level | 2 |
| concepts[19].score | 0.27934154868125916 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[19].display_name | Scale (ratio) |
| concepts[20].id | https://openalex.org/C63479239 |
| concepts[20].level | 3 |
| concepts[20].score | 0.273384153842926 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[20].display_name | Robustness (evolution) |
| concepts[21].id | https://openalex.org/C29202148 |
| concepts[21].level | 2 |
| concepts[21].score | 0.2553485333919525 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q287260 |
| concepts[21].display_name | Resource allocation |
| keywords[0].id | https://openalex.org/keywords/orchestration |
| keywords[0].score | 0.7981753945350647 |
| keywords[0].display_name | Orchestration |
| keywords[1].id | https://openalex.org/keywords/scalability |
| keywords[1].score | 0.7179195284843445 |
| keywords[1].display_name | Scalability |
| keywords[2].id | https://openalex.org/keywords/inference |
| keywords[2].score | 0.6117295026779175 |
| keywords[2].display_name | Inference |
| keywords[3].id | https://openalex.org/keywords/security-token |
| keywords[3].score | 0.5075799226760864 |
| keywords[3].display_name | Security token |
| keywords[4].id | https://openalex.org/keywords/reliability |
| keywords[4].score | 0.41454461216926575 |
| keywords[4].display_name | Reliability (semiconductor) |
| keywords[5].id | https://openalex.org/keywords/latency |
| keywords[5].score | 0.411941796541214 |
| keywords[5].display_name | Latency (audio) |
| keywords[6].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[6].score | 0.4007797837257385 |
| keywords[6].display_name | Reinforcement learning |
| keywords[7].id | https://openalex.org/keywords/resource |
| keywords[7].score | 0.3761606812477112 |
| keywords[7].display_name | Resource (disambiguation) |
| language | |
| locations[0].id | doi:10.5281/zenodo.17828172 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400562 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| locations[0].source.host_organization | https://openalex.org/I67311998 |
| locations[0].source.host_organization_name | European Organization for Nuclear Research |
| locations[0].source.host_organization_lineage | https://openalex.org/I67311998 |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article-journal |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.5281/zenodo.17828172 |
| indexed_in | datacite |
| authorships[0].author.id | |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Revista, Zen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Revista, Zen |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | IA, 10 |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | IA, 10 |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.5281/zenodo.17828172 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-06T00:00:00 |
| display_name | AI-Driven Predictive Load Orchestration for Distributed LLM Inference |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-06T23:14:57.273132 |
| primary_topic.id | https://openalex.org/T14347 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.16177503764629364 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Big Data and Digital Economy |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.5281/zenodo.17828172 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400562 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| best_oa_location.source.host_organization | https://openalex.org/I67311998 |
| best_oa_location.source.host_organization_name | European Organization for Nuclear Research |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article-journal |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.5281/zenodo.17828172 |
| primary_location.id | doi:10.5281/zenodo.17828172 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400562 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| primary_location.source.host_organization | https://openalex.org/I67311998 |
| primary_location.source.host_organization_name | European Organization for Nuclear Research |
| primary_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article-journal |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.5281/zenodo.17828172 |
| publication_date | 2025-12-05 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 100, 104, 122 |
| abstract_inverted_index.As | 20 |
| abstract_inverted_index.We | 102 |
| abstract_inverted_index.an | 158 |
| abstract_inverted_index.in | 23 |
| abstract_inverted_index.of | 56, 112, 156, 171 |
| abstract_inverted_index.to | 60, 89, 129, 164 |
| abstract_inverted_index.CPU | 136 |
| abstract_inverted_index.GPU | 134 |
| abstract_inverted_index.LLM | 57, 173 |
| abstract_inverted_index.Our | 68 |
| abstract_inverted_index.The | 145 |
| abstract_inverted_index.and | 25, 39, 52, 64, 82, 94, 116, 121, 135, 138, 153, 169 |
| abstract_inverted_index.for | 6, 13, 35, 79, 85 |
| abstract_inverted_index.its | 162 |
| abstract_inverted_index.low | 40 |
| abstract_inverted_index.the | 50, 148, 167 |
| abstract_inverted_index.LLMs | 21 |
| abstract_inverted_index.This | 0, 126 |
| abstract_inverted_index.aims | 128 |
| abstract_inverted_index.high | 37 |
| abstract_inverted_index.load | 9, 44 |
| abstract_inverted_index.size | 24 |
| abstract_inverted_index.such | 157 |
| abstract_inverted_index.them | 28 |
| abstract_inverted_index.with | 49 |
| abstract_inverted_index.(LLM) | 18 |
| abstract_inverted_index.Large | 15 |
| abstract_inverted_index.Model | 17 |
| abstract_inverted_index.loads | 93 |
| abstract_inverted_index.model | 117 |
| abstract_inverted_index.novel | 4 |
| abstract_inverted_index.often | 47 |
| abstract_inverted_index.paper | 1, 146 |
| abstract_inverted_index.rates | 115 |
| abstract_inverted_index.scale | 22 |
| abstract_inverted_index.token | 113 |
| abstract_inverted_index.under | 142 |
| abstract_inverted_index.across | 29, 99 |
| abstract_inverted_index.demand | 80 |
| abstract_inverted_index.detail | 103 |
| abstract_inverted_index.ensure | 139 |
| abstract_inverted_index.future | 91 |
| abstract_inverted_index.times. | 67 |
| abstract_inverted_index.becomes | 33 |
| abstract_inverted_index.demands | 55 |
| abstract_inverted_index.dynamic | 51, 86 |
| abstract_inverted_index.engine. | 125 |
| abstract_inverted_index.enhance | 166 |
| abstract_inverted_index.latency | 41 |
| abstract_inverted_index.leading | 59 |
| abstract_inverted_index.machine | 77 |
| abstract_inverted_index.meeting | 36 |
| abstract_inverted_index.predict | 90 |
| abstract_inverted_index.service | 140 |
| abstract_inverted_index.system, | 160 |
| abstract_inverted_index.Language | 16 |
| abstract_inverted_index.advanced | 72 |
| abstract_inverted_index.approach | 70 |
| abstract_inverted_index.benefits | 155 |
| abstract_inverted_index.cluster. | 101 |
| abstract_inverted_index.latency, | 132 |
| abstract_inverted_index.learning | 78, 84 |
| abstract_inverted_index.maximize | 133 |
| abstract_inverted_index.minimize | 130 |
| abstract_inverted_index.modeling | 111 |
| abstract_inverted_index.presents | 2 |
| abstract_inverted_index.proposed | 69 |
| abstract_inverted_index.resource | 62, 87 |
| abstract_inverted_index.response | 66 |
| abstract_inverted_index.specific | 118 |
| abstract_inverted_index.struggle | 48 |
| abstract_inverted_index.tailored | 12 |
| abstract_inverted_index.AI-driven | 7, 159 |
| abstract_inverted_index.balancing | 45 |
| abstract_inverted_index.computing | 31 |
| abstract_inverted_index.deploying | 27 |
| abstract_inverted_index.discusses | 147 |
| abstract_inverted_index.essential | 34 |
| abstract_inverted_index.framework | 5, 127 |
| abstract_inverted_index.including | 76 |
| abstract_inverted_index.increased | 65 |
| abstract_inverted_index.inference | 92, 131 |
| abstract_inverted_index.leverages | 71 |
| abstract_inverted_index.potential | 154, 163 |
| abstract_inverted_index.real-time | 107 |
| abstract_inverted_index.resources | 98 |
| abstract_inverted_index.telemetry | 108 |
| abstract_inverted_index.artificial | 73 |
| abstract_inverted_index.efficiency | 168 |
| abstract_inverted_index.generation | 114 |
| abstract_inverted_index.inference, | 58 |
| abstract_inverted_index.inference. | 19 |
| abstract_inverted_index.predictive | 8, 110 |
| abstract_inverted_index.suboptimal | 61 |
| abstract_inverted_index.techniques | 46 |
| abstract_inverted_index.throughput | 38 |
| abstract_inverted_index.workloads. | 144 |
| abstract_inverted_index.Traditional | 43 |
| abstract_inverted_index.algorithmic | 151 |
| abstract_inverted_index.allocation, | 88 |
| abstract_inverted_index.collection, | 109 |
| abstract_inverted_index.complexity, | 26 |
| abstract_inverted_index.components, | 150 |
| abstract_inverted_index.distributed | 14, 30 |
| abstract_inverted_index.fluctuating | 143 |
| abstract_inverted_index.forecasting | 81 |
| abstract_inverted_index.large-scale | 172 |
| abstract_inverted_index.methodology | 105 |
| abstract_inverted_index.orchestrate | 96 |
| abstract_inverted_index.reliability | 141 |
| abstract_inverted_index.scalability | 170 |
| abstract_inverted_index.techniques, | 75 |
| abstract_inverted_index.utilization | 63 |
| abstract_inverted_index.deployments. | 174 |
| abstract_inverted_index.encompassing | 106 |
| abstract_inverted_index.environments | 32 |
| abstract_inverted_index.highlighting | 161 |
| abstract_inverted_index.intelligence | 74 |
| abstract_inverted_index.specifically | 11 |
| abstract_inverted_index.utilization, | 137 |
| abstract_inverted_index.architectural | 149 |
| abstract_inverted_index.computational | 54, 97, 119 |
| abstract_inverted_index.heterogeneous | 53 |
| abstract_inverted_index.intelligently | 95 |
| abstract_inverted_index.orchestration | 10, 124 |
| abstract_inverted_index.policy-driven | 123 |
| abstract_inverted_index.reinforcement | 83 |
| abstract_inverted_index.requirements, | 120 |
| abstract_inverted_index.requirements. | 42 |
| abstract_inverted_index.significantly | 165 |
| abstract_inverted_index.considerations, | 152 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile.value | 0.9092483 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
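
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt holding positions 0 through 5 from the payload, not the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Place each word at every position it occupies, then join in order."""
    positions: dict[int, str] = {}
    for word, indexes in inverted_index.items():
        for i in indexes:
            positions[i] = word
    # Sorting the position keys restores the original word order.
    return " ".join(positions[i] for i in sorted(positions))


# Excerpt of the payload above, restricted to positions 0..5.
sample = {
    "a": [3],
    "This": [0],
    "novel": [4],
    "paper": [1],
    "presents": [2],
    "framework": [5],
}

abstract_start = rebuild_abstract(sample)
print(abstract_start)  # prints "This paper presents a novel framework"
```

Applied to the complete set of rows, the same function would yield the full abstract text, since every position from 0 to 174 appears exactly once in the index.
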