Beyond Containers: A Serverless and Adaptive Framework for High-Throughput Model Serving
2025 · Open Access · DOI: https://doi.org/10.5281/zenodo.17818571
This paper introduces a novel framework for high-throughput model serving that moves beyond traditional container-based deployments. We propose a serverless and adaptive architecture that dynamically scales resources based on real-time demand, optimizing for both latency and cost efficiency. Our framework leverages function-as-a-service (FaaS) platforms to provide fine-grained resource allocation and auto-scaling capabilities. Furthermore, we introduce an adaptive routing mechanism that intelligently distributes requests across available function instances, taking into account factors such as instance load and network proximity. We evaluate the performance of our framework through extensive experiments using a variety of machine learning models and realistic workload scenarios. The results demonstrate that our approach significantly outperforms container-based deployments in terms of throughput, latency, and resource utilization, particularly under highly variable load conditions. We also present a cost analysis, showing that our serverless framework can achieve substantial cost savings compared to traditional methods. This research contributes to the growing field of serverless machine learning and provides a practical solution for deploying and scaling machine learning models in production environments.
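The abstract names instance load and network proximity as routing inputs but gives no scoring rule. The sketch below is one way such a router could be written, not the authors' method. The `Instance` fields, the 0.7 load weight, and the 200 ms round-trip cap are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical function instance; every field here is an assumption."""
    name: str
    load: float    # fraction of capacity in use: 0.0 (idle) to 1.0 (saturated)
    rtt_ms: float  # round-trip time to the caller, used as a proximity proxy


def score(inst: Instance, load_weight: float = 0.7, max_rtt_ms: float = 200.0) -> float:
    """Lower is better: weighted sum of load and RTT normalized to [0, 1]."""
    proximity = min(inst.rtt_ms / max_rtt_ms, 1.0)
    return load_weight * inst.load + (1.0 - load_weight) * proximity


def route(instances: list[Instance]) -> Instance:
    """Pick the instance with the lowest combined score."""
    if not instances:
        raise ValueError("no instances available")
    return min(instances, key=score)


# Illustrative instances only; these values are not measurements from the paper.
instances = [
    Instance("near-busy", load=0.9, rtt_ms=10.0),
    Instance("far-idle", load=0.1, rtt_ms=150.0),
    Instance("mid", load=0.4, rtt_ms=40.0),
]
chosen = route(instances)
print(chosen.name)
```

With these weights a distant idle instance is preferred over a nearby saturated one. Lowering `load_weight` shifts the choice toward proximity.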
- Type: article
- Landing Page: https://doi.org/10.5281/zenodo.17818571
- OA Status: green
- OpenAlex ID: https://openalex.org/W7108625119
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7108625119 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.5281/zenodo.17818571 (Digital Object Identifier)
- Title: Beyond Containers: A Serverless and Adaptive Framework for High-Throughput Model Serving (work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-12-04 (full publication date if available)
- Authors: Revista, Zen; IA, 10 (list of authors in order)
- Landing page: https://doi.org/10.5281/zenodo.17818571 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://doi.org/10.5281/zenodo.17818571 (direct OA link when available)
- Concepts: Computer science, Workload, Distributed computing, Variety (cybernetics), Field (mathematics), Resource (disambiguation), Resource allocation, Latency (audio), Function (biology), Artificial intelligence, Scheduling (production processes), Routing (electronic design automation), Machine learning, Resource management (computing), Key (lock), Load balancing (electrical power), Adaptive system, Activity-based costing, Resource efficiency (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W7108625119 |
| doi | https://doi.org/10.5281/zenodo.17818571 |
| ids.doi | https://doi.org/10.5281/zenodo.17818571 |
| ids.openalex | https://openalex.org/W7108625119 |
| fwci | 0.0 |
| type | article |
| title | Beyond Containers: A Serverless and Adaptive Framework for High-Throughput Model Serving |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10101 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.407719224691391 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Cloud Computing and Resource Management |
| topics[1].id | https://openalex.org/T12127 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.3787726163864136 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Software System Performance and Reliability |
| topics[2].id | https://openalex.org/T10273 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.03485862910747528 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | IoT and Edge/Fog Computing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.829902708530426 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2778476105 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5843750834465027 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q628539 |
| concepts[1].display_name | Workload |
| concepts[2].id | https://openalex.org/C120314980 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5168944001197815 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q180634 |
| concepts[2].display_name | Distributed computing |
| concepts[3].id | https://openalex.org/C136197465 |
| concepts[3].level | 2 |
| concepts[3].score | 0.44933056831359863 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1729295 |
| concepts[3].display_name | Variety (cybernetics) |
| concepts[4].id | https://openalex.org/C9652623 |
| concepts[4].level | 2 |
| concepts[4].score | 0.44665542244911194 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q190109 |
| concepts[4].display_name | Field (mathematics) |
| concepts[5].id | https://openalex.org/C206345919 |
| concepts[5].level | 2 |
| concepts[5].score | 0.43106576800346375 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q20380951 |
| concepts[5].display_name | Resource (disambiguation) |
| concepts[6].id | https://openalex.org/C29202148 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4252927601337433 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q287260 |
| concepts[6].display_name | Resource allocation |
| concepts[7].id | https://openalex.org/C82876162 |
| concepts[7].level | 2 |
| concepts[7].score | 0.425041526556015 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q17096504 |
| concepts[7].display_name | Latency (audio) |
| concepts[8].id | https://openalex.org/C14036430 |
| concepts[8].level | 2 |
| concepts[8].score | 0.3975290060043335 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3736076 |
| concepts[8].display_name | Function (biology) |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.38026663661003113 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C206729178 |
| concepts[10].level | 2 |
| concepts[10].score | 0.34069401025772095 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2271896 |
| concepts[10].display_name | Scheduling (production processes) |
| concepts[11].id | https://openalex.org/C74172769 |
| concepts[11].level | 2 |
| concepts[11].score | 0.3390486240386963 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1446839 |
| concepts[11].display_name | Routing (electronic design automation) |
| concepts[12].id | https://openalex.org/C119857082 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3321021795272827 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[12].display_name | Machine learning |
| concepts[13].id | https://openalex.org/C2780609101 |
| concepts[13].level | 2 |
| concepts[13].score | 0.29119715094566345 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q17156588 |
| concepts[13].display_name | Resource management (computing) |
| concepts[14].id | https://openalex.org/C26517878 |
| concepts[14].level | 2 |
| concepts[14].score | 0.2890984117984772 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[14].display_name | Key (lock) |
| concepts[15].id | https://openalex.org/C138959212 |
| concepts[15].level | 3 |
| concepts[15].score | 0.2881604731082916 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q1806783 |
| concepts[15].display_name | Load balancing (electrical power) |
| concepts[16].id | https://openalex.org/C52970973 |
| concepts[16].level | 2 |
| concepts[16].score | 0.2779183089733124 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q2497134 |
| concepts[16].display_name | Adaptive system |
| concepts[17].id | https://openalex.org/C164624739 |
| concepts[17].level | 2 |
| concepts[17].score | 0.2708047330379486 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q754331 |
| concepts[17].display_name | Activity-based costing |
| concepts[18].id | https://openalex.org/C2777958785 |
| concepts[18].level | 2 |
| concepts[18].score | 0.25214314460754395 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q17120940 |
| concepts[18].display_name | Resource efficiency |
| keywords[0].id | https://openalex.org/keywords/workload |
| keywords[0].score | 0.5843750834465027 |
| keywords[0].display_name | Workload |
| keywords[1].id | https://openalex.org/keywords/variety |
| keywords[1].score | 0.44933056831359863 |
| keywords[1].display_name | Variety (cybernetics) |
| keywords[2].id | https://openalex.org/keywords/field |
| keywords[2].score | 0.44665542244911194 |
| keywords[2].display_name | Field (mathematics) |
| keywords[3].id | https://openalex.org/keywords/resource |
| keywords[3].score | 0.43106576800346375 |
| keywords[3].display_name | Resource (disambiguation) |
| keywords[4].id | https://openalex.org/keywords/resource-allocation |
| keywords[4].score | 0.4252927601337433 |
| keywords[4].display_name | Resource allocation |
| keywords[5].id | https://openalex.org/keywords/latency |
| keywords[5].score | 0.425041526556015 |
| keywords[5].display_name | Latency (audio) |
| keywords[6].id | https://openalex.org/keywords/function |
| keywords[6].score | 0.3975290060043335 |
| keywords[6].display_name | Function (biology) |
| language | |
| locations[0].id | doi:10.5281/zenodo.17818571 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400562 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| locations[0].source.host_organization | https://openalex.org/I67311998 |
| locations[0].source.host_organization_name | European Organization for Nuclear Research |
| locations[0].source.host_organization_lineage | https://openalex.org/I67311998 |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article-journal |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.5281/zenodo.17818571 |
| indexed_in | datacite |
| authorships[0].author.id | |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Revista, Zen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Revista, Zen |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | IA, 10 |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | IA, 10 |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.5281/zenodo.17818571 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-05T00:00:00 |
| display_name | Beyond Containers: A Serverless and Adaptive Framework for High-Throughput Model Serving |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-05T23:25:22.460635 |
| primary_topic.id | https://openalex.org/T10101 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.407719224691391 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Cloud Computing and Resource Management |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.5281/zenodo.17818571 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400562 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| best_oa_location.source.host_organization | https://openalex.org/I67311998 |
| best_oa_location.source.host_organization_name | European Organization for Nuclear Research |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article-journal |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.5281/zenodo.17818571 |
| primary_location.id | doi:10.5281/zenodo.17818571 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400562 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| primary_location.source.host_organization | https://openalex.org/I67311998 |
| primary_location.source.host_organization_name | European Organization for Nuclear Research |
| primary_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article-journal |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.5281/zenodo.17818571 |
| publication_date | 2025-12-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 18, 89, 126, 156 |
| abstract_inverted_index.We | 16, 78, 123 |
| abstract_inverted_index.an | 55 |
| abstract_inverted_index.as | 72 |
| abstract_inverted_index.in | 109, 166 |
| abstract_inverted_index.of | 82, 91, 111, 150 |
| abstract_inverted_index.on | 28 |
| abstract_inverted_index.to | 44, 140, 146 |
| abstract_inverted_index.we | 53 |
| abstract_inverted_index.Our | 38 |
| abstract_inverted_index.The | 99 |
| abstract_inverted_index.and | 20, 35, 49, 75, 95, 114, 154, 161 |
| abstract_inverted_index.can | 134 |
| abstract_inverted_index.for | 6, 32, 159 |
| abstract_inverted_index.our | 83, 103, 131 |
| abstract_inverted_index.the | 80, 147 |
| abstract_inverted_index.This | 0, 143 |
| abstract_inverted_index.also | 124 |
| abstract_inverted_index.both | 33 |
| abstract_inverted_index.cost | 36, 127, 137 |
| abstract_inverted_index.into | 68 |
| abstract_inverted_index.load | 74, 121 |
| abstract_inverted_index.such | 71 |
| abstract_inverted_index.that | 10, 23, 59, 102, 130 |
| abstract_inverted_index.based | 27 |
| abstract_inverted_index.field | 149 |
| abstract_inverted_index.model | 8 |
| abstract_inverted_index.moves | 11 |
| abstract_inverted_index.novel | 4 |
| abstract_inverted_index.paper | 1 |
| abstract_inverted_index.terms | 110 |
| abstract_inverted_index.under | 118 |
| abstract_inverted_index.using | 88 |
| abstract_inverted_index.(FaaS) | 42 |
| abstract_inverted_index.across | 63 |
| abstract_inverted_index.beyond | 12 |
| abstract_inverted_index.highly | 119 |
| abstract_inverted_index.models | 94, 165 |
| abstract_inverted_index.scales | 25 |
| abstract_inverted_index.taking | 67 |
| abstract_inverted_index.account | 69 |
| abstract_inverted_index.achieve | 135 |
| abstract_inverted_index.demand, | 30 |
| abstract_inverted_index.factors | 70 |
| abstract_inverted_index.growing | 148 |
| abstract_inverted_index.latency | 34 |
| abstract_inverted_index.machine | 92, 152, 163 |
| abstract_inverted_index.network | 76 |
| abstract_inverted_index.present | 125 |
| abstract_inverted_index.propose | 17 |
| abstract_inverted_index.provide | 45 |
| abstract_inverted_index.results | 100 |
| abstract_inverted_index.routing | 57 |
| abstract_inverted_index.savings | 138 |
| abstract_inverted_index.scaling | 162 |
| abstract_inverted_index.serving | 9 |
| abstract_inverted_index.showing | 129 |
| abstract_inverted_index.through | 85 |
| abstract_inverted_index.variety | 90 |
| abstract_inverted_index.adaptive | 21, 56 |
| abstract_inverted_index.approach | 104 |
| abstract_inverted_index.compared | 139 |
| abstract_inverted_index.evaluate | 79 |
| abstract_inverted_index.function | 65 |
| abstract_inverted_index.instance | 73 |
| abstract_inverted_index.latency, | 113 |
| abstract_inverted_index.learning | 93, 153, 164 |
| abstract_inverted_index.methods. | 142 |
| abstract_inverted_index.provides | 155 |
| abstract_inverted_index.requests | 62 |
| abstract_inverted_index.research | 144 |
| abstract_inverted_index.resource | 47, 115 |
| abstract_inverted_index.solution | 158 |
| abstract_inverted_index.variable | 120 |
| abstract_inverted_index.workload | 97 |
| abstract_inverted_index.analysis, | 128 |
| abstract_inverted_index.available | 64 |
| abstract_inverted_index.deploying | 160 |
| abstract_inverted_index.extensive | 86 |
| abstract_inverted_index.framework | 5, 39, 84, 133 |
| abstract_inverted_index.introduce | 54 |
| abstract_inverted_index.leverages | 40 |
| abstract_inverted_index.mechanism | 58 |
| abstract_inverted_index.platforms | 43 |
| abstract_inverted_index.practical | 157 |
| abstract_inverted_index.real-time | 29 |
| abstract_inverted_index.realistic | 96 |
| abstract_inverted_index.resources | 26 |
| abstract_inverted_index.allocation | 48 |
| abstract_inverted_index.instances, | 66 |
| abstract_inverted_index.introduces | 2 |
| abstract_inverted_index.optimizing | 31 |
| abstract_inverted_index.production | 167 |
| abstract_inverted_index.proximity. | 77 |
| abstract_inverted_index.scenarios. | 98 |
| abstract_inverted_index.serverless | 19, 132, 151 |
| abstract_inverted_index.conditions. | 122 |
| abstract_inverted_index.contributes | 145 |
| abstract_inverted_index.demonstrate | 101 |
| abstract_inverted_index.deployments | 108 |
| abstract_inverted_index.distributes | 61 |
| abstract_inverted_index.dynamically | 24 |
| abstract_inverted_index.efficiency. | 37 |
| abstract_inverted_index.experiments | 87 |
| abstract_inverted_index.outperforms | 106 |
| abstract_inverted_index.performance | 81 |
| abstract_inverted_index.substantial | 136 |
| abstract_inverted_index.throughput, | 112 |
| abstract_inverted_index.traditional | 13, 141 |
| abstract_inverted_index.Furthermore, | 52 |
| abstract_inverted_index.architecture | 22 |
| abstract_inverted_index.auto-scaling | 50 |
| abstract_inverted_index.deployments. | 15 |
| abstract_inverted_index.fine-grained | 46 |
| abstract_inverted_index.particularly | 117 |
| abstract_inverted_index.utilization, | 116 |
| abstract_inverted_index.capabilities. | 51 |
| abstract_inverted_index.environments. | 168 |
| abstract_inverted_index.intelligently | 60 |
| abstract_inverted_index.significantly | 105 |
| abstract_inverted_index.container-based | 14, 107 |
| abstract_inverted_index.high-throughput | 7 |
| abstract_inverted_index.function-as-a-service | 41 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile.value | 0.90875927 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
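
The `abstract_inverted_index.*` rows in the payload map each word to the positions where it occurs in the abstract. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is deliberately truncated to the first six positions of the payload.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a word -> positions mapping by sorting on position."""
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated sample: only positions 0-5 from the payload rows are included.
sample = {
    "This": [0],
    "paper": [1],
    "introduces": [2],
    "a": [3],
    "novel": [4],
    "framework": [5],
}
text = rebuild_abstract(sample)
print(text)
```

Feeding the full set of rows through the same function should reproduce the abstract shown at the top of the record, since each position is assigned to exactly one word.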