Collaborative Inference for Large Models with Task Offloading and Early Exiting
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2412.08284
In 5G smart cities, edge computing is employed to provide nearby computing services for end devices, and the large-scale models (e.g., GPT and LLaMA) can be deployed at the network edge to boost the service quality. However, due to the constraints of memory size and computing capacity, it is difficult to run these large-scale models on a single edge node. To meet the resource constraints, a large-scale model can be partitioned into multiple sub-models and deployed across multiple edge nodes. Then tasks are offloaded to the edge nodes for collaborative inference. Additionally, we incorporate the early exit mechanism to further accelerate inference. However, the heterogeneous system and dynamic environment will significantly affect the inference efficiency. To address these challenges, we theoretically analyze the coupled relationship between task offloading strategy and confidence thresholds, and develop a distributed algorithm, termed DTO-EE, based on the coupled relationship and convex optimization. DTO-EE enables each edge node to jointly optimize its offloading strategy and the confidence threshold, so as to achieve a promising trade-off between response delay and inference accuracy. The experimental results show that DTO-EE can reduce the average response delay by 21%-41% and improve the inference accuracy by 1%-4%, compared to the baselines.
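
The early-exit mechanism mentioned in the abstract can be illustrated with a small sketch. This is not the DTO-EE algorithm (the abstract does not give its details); it only shows, under toy assumptions, how per-node confidence thresholds decide where a task stops. The names `Stage`, `run_with_early_exit`, `toy_head`, the node names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """Hypothetical stand-in for one sub-model hosted on one edge node."""
    name: str
    forward: Callable[[float], float]                # toy hidden-state transform
    exit_head: Callable[[float], Tuple[str, float]]  # returns (label, confidence)
    threshold: float                                 # confidence needed to exit here


def run_with_early_exit(x: float, stages: List[Stage]) -> Tuple[str, str, float]:
    """Run x through the stages in order and stop at the first stage whose
    exit head is confident enough. The last stage always returns a result."""
    hidden = x
    for i, stage in enumerate(stages):
        hidden = stage.forward(hidden)
        label, conf = stage.exit_head(hidden)
        is_last = i == len(stages) - 1
        if conf >= stage.threshold or is_last:
            return stage.name, label, conf
    raise ValueError("stages must not be empty")


def toy_head(h: float) -> Tuple[str, float]:
    """Toy exit head (assumption): confidence grows with |h|, capped at 1."""
    return ("pos" if h >= 0 else "neg", min(1.0, abs(h) / 4.0))


# Three toy stages; each adds 1 to the hidden value (purely illustrative).
STAGES = [
    Stage("node-1", lambda h: h + 1.0, toy_head, threshold=0.9),
    Stage("node-2", lambda h: h + 1.0, toy_head, threshold=0.7),
    Stage("node-3", lambda h: h + 1.0, toy_head, threshold=0.5),
]

if __name__ == "__main__":
    for x in (3.0, 2.0, -1.0):
        print(x, run_with_early_exit(x, STAGES))
```

With these toy values, an input of 3.0 exits at node-1, 2.0 exits at node-2, and -1.0 travels to the last node. Lowering a threshold makes tasks exit earlier (shorter delay) at the cost of accepting less confident predictions, which is the delay/accuracy trade-off the abstract describes.
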
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.08284, https://arxiv.org/pdf/2412.08284
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405301319
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405301319 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.08284 (Digital Object Identifier)
- Title: Collaborative Inference for Large Models with Task Offloading and Early Exiting (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-11 (full publication date if available)
- Authors: Z.S. Xie, Xu Yang, Hongli Xu, Yunming Liao, Zhiyuan Yao (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.08284 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.08284 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.08284 (direct OA link when available)
- Concepts: Inference, Computer science, Task (project management), Human–computer interaction, Artificial intelligence, Engineering, Systems engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4405301319 |
| doi | https://doi.org/10.48550/arxiv.2412.08284 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.08284 |
| ids.openalex | https://openalex.org/W4405301319 |
| fwci | |
| type | preprint |
| title | Collaborative Inference for Large Models with Task Offloading and Early Exiting |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10715 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9821000099182129 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1705 |
| topics[0].subfield.display_name | Computer Networks and Communications |
| topics[0].display_name | Distributed and Parallel Computing Systems |
| topics[1].id | https://openalex.org/T13553 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.98089998960495 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Age of Information Optimization |
| topics[2].id | https://openalex.org/T11986 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9783999919891357 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1802 |
| topics[2].subfield.display_name | Information Systems and Management |
| topics[2].display_name | Scientific Computing and Data Management |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776214188 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7208348512649536 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[0].display_name | Inference |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6718129515647888 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2780451532 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6543973088264465 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[2].display_name | Task (project management) |
| concepts[3].id | https://openalex.org/C107457646 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3643531799316406 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[3].display_name | Human–computer interaction |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.24254992604255676 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C127413603 |
| concepts[5].level | 0 |
| concepts[5].score | 0.12096428871154785 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[5].display_name | Engineering |
| concepts[6].id | https://openalex.org/C201995342 |
| concepts[6].level | 1 |
| concepts[6].score | 0.09825330972671509 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[6].display_name | Systems engineering |
| keywords[0].id | https://openalex.org/keywords/inference |
| keywords[0].score | 0.7208348512649536 |
| keywords[0].display_name | Inference |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6718129515647888 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/task |
| keywords[2].score | 0.6543973088264465 |
| keywords[2].display_name | Task (project management) |
| keywords[3].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[3].score | 0.3643531799316406 |
| keywords[3].display_name | Human–computer interaction |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.24254992604255676 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/engineering |
| keywords[5].score | 0.12096428871154785 |
| keywords[5].display_name | Engineering |
| keywords[6].id | https://openalex.org/keywords/systems-engineering |
| keywords[6].score | 0.09825330972671509 |
| keywords[6].display_name | Systems engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.08284 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.08284 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.08284 |
| locations[1].id | doi:10.48550/arxiv.2412.08284 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.08284 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5113418862 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Z.S. Xie |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xie, Zuan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5007963696 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0553-4581 |
| authorships[1].author.display_name | Xu Yang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Yang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5063184427 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3831-4577 |
| authorships[2].author.display_name | Hongli Xu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xu, Hongli |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5062964635 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5065-2600 |
| authorships[3].author.display_name | Yunming Liao |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liao, Yunming |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5006328558 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-9671-4208 |
| authorships[4].author.display_name | Zhiyuan Yao |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Yao, Zhiyuan |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.08284 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Collaborative Inference for Large Models with Task Offloading and Early Exiting |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10715 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9821000099182129 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1705 |
| primary_topic.subfield.display_name | Computer Networks and Communications |
| primary_topic.display_name | Distributed and Parallel Computing Systems |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W3196817267, https://openalex.org/W1976600725 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.08284 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.08284 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.08284 |
| primary_location.id | pmh:oai:arXiv.org:2412.08284 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.08284 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.08284 |
| publication_date | 2024-12-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 56, 65, 134, 166 |
| abstract_inverted_index.5G | 1 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.To | 60, 115 |
| abstract_inverted_index.as | 163 |
| abstract_inverted_index.at | 27 |
| abstract_inverted_index.be | 25, 69 |
| abstract_inverted_index.by | 187, 194 |
| abstract_inverted_index.is | 6, 48 |
| abstract_inverted_index.it | 47 |
| abstract_inverted_index.of | 41 |
| abstract_inverted_index.on | 55, 140 |
| abstract_inverted_index.so | 162 |
| abstract_inverted_index.to | 8, 31, 38, 50, 84, 98, 152, 164, 197 |
| abstract_inverted_index.we | 92, 119 |
| abstract_inverted_index.GPT | 21 |
| abstract_inverted_index.The | 175 |
| abstract_inverted_index.and | 16, 22, 44, 74, 106, 129, 132, 144, 158, 172, 189 |
| abstract_inverted_index.are | 82 |
| abstract_inverted_index.can | 24, 68, 181 |
| abstract_inverted_index.due | 37 |
| abstract_inverted_index.end | 14 |
| abstract_inverted_index.for | 13, 88 |
| abstract_inverted_index.its | 155 |
| abstract_inverted_index.run | 51 |
| abstract_inverted_index.the | 17, 28, 33, 39, 62, 85, 94, 103, 112, 122, 141, 159, 183, 191, 198 |
| abstract_inverted_index.Then | 80 |
| abstract_inverted_index.each | 149 |
| abstract_inverted_index.edge | 4, 30, 58, 78, 86, 150 |
| abstract_inverted_index.exit | 96 |
| abstract_inverted_index.into | 71 |
| abstract_inverted_index.meet | 61 |
| abstract_inverted_index.node | 151 |
| abstract_inverted_index.show | 178 |
| abstract_inverted_index.size | 43 |
| abstract_inverted_index.task | 126 |
| abstract_inverted_index.that | 179 |
| abstract_inverted_index.will | 109 |
| abstract_inverted_index.based | 139 |
| abstract_inverted_index.boost | 32 |
| abstract_inverted_index.delay | 171, 186 |
| abstract_inverted_index.early | 95 |
| abstract_inverted_index.model | 67 |
| abstract_inverted_index.node. | 59 |
| abstract_inverted_index.nodes | 87 |
| abstract_inverted_index.smart | 2 |
| abstract_inverted_index.tasks | 81 |
| abstract_inverted_index.these | 52, 117 |
| abstract_inverted_index.(e.g., | 20 |
| abstract_inverted_index.1%-4%, | 195 |
| abstract_inverted_index.DTO-EE | 147, 180 |
| abstract_inverted_index.LLaMA) | 23 |
| abstract_inverted_index.across | 76 |
| abstract_inverted_index.affect | 111 |
| abstract_inverted_index.convex | 145 |
| abstract_inverted_index.memory | 42 |
| abstract_inverted_index.models | 19, 54 |
| abstract_inverted_index.nearby | 10 |
| abstract_inverted_index.nodes. | 79 |
| abstract_inverted_index.reduce | 182 |
| abstract_inverted_index.single | 57 |
| abstract_inverted_index.system | 105 |
| abstract_inverted_index.termed | 137 |
| abstract_inverted_index.21%-41% | 188 |
| abstract_inverted_index.DTO-EE, | 138 |
| abstract_inverted_index.achieve | 165 |
| abstract_inverted_index.address | 116 |
| abstract_inverted_index.analyze | 121 |
| abstract_inverted_index.average | 184 |
| abstract_inverted_index.between | 125, 169 |
| abstract_inverted_index.cities, | 3 |
| abstract_inverted_index.coupled | 123, 142 |
| abstract_inverted_index.develop | 133 |
| abstract_inverted_index.dynamic | 107 |
| abstract_inverted_index.enables | 148 |
| abstract_inverted_index.further | 99 |
| abstract_inverted_index.improve | 190 |
| abstract_inverted_index.jointly | 153 |
| abstract_inverted_index.network | 29 |
| abstract_inverted_index.provide | 9 |
| abstract_inverted_index.results | 177 |
| abstract_inverted_index.service | 34 |
| abstract_inverted_index.However, | 36, 102 |
| abstract_inverted_index.accuracy | 193 |
| abstract_inverted_index.compared | 196 |
| abstract_inverted_index.deployed | 26, 75 |
| abstract_inverted_index.devices, | 15 |
| abstract_inverted_index.employed | 7 |
| abstract_inverted_index.multiple | 72, 77 |
| abstract_inverted_index.optimize | 154 |
| abstract_inverted_index.quality. | 35 |
| abstract_inverted_index.resource | 63 |
| abstract_inverted_index.response | 170, 185 |
| abstract_inverted_index.services | 12 |
| abstract_inverted_index.strategy | 128, 157 |
| abstract_inverted_index.accuracy. | 174 |
| abstract_inverted_index.capacity, | 46 |
| abstract_inverted_index.computing | 5, 11, 45 |
| abstract_inverted_index.difficult | 49 |
| abstract_inverted_index.inference | 113, 173, 192 |
| abstract_inverted_index.mechanism | 97 |
| abstract_inverted_index.offloaded | 83 |
| abstract_inverted_index.promising | 167 |
| abstract_inverted_index.trade-off | 168 |
| abstract_inverted_index.accelerate | 100 |
| abstract_inverted_index.algorithm, | 136 |
| abstract_inverted_index.baselines. | 199 |
| abstract_inverted_index.confidence | 130, 160 |
| abstract_inverted_index.inference. | 90, 101 |
| abstract_inverted_index.offloading | 127, 156 |
| abstract_inverted_index.sub-models | 73 |
| abstract_inverted_index.threshold, | 161 |
| abstract_inverted_index.challenges, | 118 |
| abstract_inverted_index.constraints | 40 |
| abstract_inverted_index.distributed | 135 |
| abstract_inverted_index.efficiency. | 114 |
| abstract_inverted_index.environment | 108 |
| abstract_inverted_index.incorporate | 93 |
| abstract_inverted_index.large-scale | 18, 53, 66 |
| abstract_inverted_index.partitioned | 70 |
| abstract_inverted_index.thresholds, | 131 |
| abstract_inverted_index.constraints, | 64 |
| abstract_inverted_index.experimental | 176 |
| abstract_inverted_index.relationship | 124, 143 |
| abstract_inverted_index.Additionally, | 91 |
| abstract_inverted_index.collaborative | 89 |
| abstract_inverted_index.heterogeneous | 104 |
| abstract_inverted_index.optimization. | 146 |
| abstract_inverted_index.significantly | 110 |
| abstract_inverted_index.theoretically | 120 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
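
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. A minimal Python sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical and not part of any OpenAlex client, and the sample dictionary is trimmed to positions 0 through 7 of the rows above for brevity.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> positions map by placing each word
    at every position it occupies and joining in position order."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Sample taken from the table rows above, trimmed to positions 0..7
# (the full index lists further positions for "edge", "computing", "is").
SAMPLE = {
    "In": [0],
    "5G": [1],
    "smart": [2],
    "cities,": [3],
    "edge": [4],
    "computing": [5],
    "is": [6],
    "employed": [7],
}

if __name__ == "__main__":
    print(rebuild_abstract(SAMPLE))
    # prints: In 5G smart cities, edge computing is employed
```

Because words are joined in sorted position order, missing positions are skipped silently, so a partial index yields a partial abstract with gaps and no error.
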