Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2405.15842
The rapid advancement of large language models (LLMs) has significantly improved code completion tasks, yet the trade-off between accuracy and computational cost remains a critical challenge. While using larger models and incorporating inference-time self-testing algorithms can significantly improve output accuracy, they incur substantial computational expenses at the same time. Furthermore, servers in real-world scenarios usually have a dynamic preference on the cost-accuracy tradeoff, depending on the budget, bandwidth, the concurrent user volume, and users' sensitivity to wrong answers. In this work, we introduce a novel framework combining model cascading and inference-time self-feedback algorithms to find multiple near-optimal self-testing options on the cost-accuracy tradeoff in LLM-based code generation. Our approach leverages self-generated tests to both enhance accuracy and evaluate model cascading decisions. As a blackbox inference-time method, it requires no access to internal model parameters. We further propose a threshold-based algorithm to determine when to deploy larger models and a heuristic to optimize the number of solutions, test cases, and test lines generated per model, based on budget constraints. Experimental results show that our cascading approach reduces costs by an average of 26%, and up to 70% in the best case, across various model families and datasets, while maintaining or improving accuracy in natural language generation tasks compared to both random and optimal single-model self-testing schemes. To our knowledge, this is the first work to provide a series of choices for optimizing the cost-accuracy trade-off in LLM code generation with self-testing.
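The threshold-based cascading idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Stage` class, the `pass_rate` and `run_cascade` functions, the per-sample cost model, and the toy "double the input" task are all hypothetical names and assumptions introduced here for illustration. Each stage stands in for one model that produces candidate solutions and self-generated tests; the best candidate is accepted if its pass rate on those tests meets the threshold, otherwise the next (larger) stage is tried.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a solution is a candidate function, a test is a check on it.
Solution = Callable[[int], int]
Test = Callable[[Solution], bool]


@dataclass
class Stage:
    """One model in the cascade (all fields are illustrative assumptions)."""
    name: str
    cost_per_sample: float          # assumed flat cost per generated solution or test
    n_solutions: int                # how many solutions to request from this model
    n_tests: int                    # how many self-generated tests to request
    generate_solutions: Callable[[int], List[Solution]]
    generate_tests: Callable[[int], List[Test]]


def pass_rate(solution: Solution, tests: Sequence[Test]) -> float:
    """Fraction of tests the solution passes; a crashing test counts as a failure."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        try:
            passed += bool(test(solution))
        except Exception:
            pass
    return passed / len(tests)


def run_cascade(stages: Sequence[Stage], threshold: float) -> Tuple[Solution, str, float]:
    """Try stages from cheapest to most expensive; escalate while the score is too low.

    Assumes every stage returns at least one solution.
    """
    if not stages:
        raise ValueError("need at least one stage")
    total_cost = 0.0
    for index, stage in enumerate(stages):
        solutions = stage.generate_solutions(stage.n_solutions)
        tests = stage.generate_tests(stage.n_tests)
        total_cost += stage.cost_per_sample * (len(solutions) + len(tests))
        best = max(solutions, key=lambda s: pass_rate(s, tests))
        score = pass_rate(best, tests)
        # Accept if confident enough, or if there is no larger model left to try.
        if score >= threshold or index == len(stages) - 1:
            return best, stage.name, total_cost
    raise AssertionError("unreachable")


# Toy task: "return twice the input". The generators below are stand-ins for LLM calls.
def double_tests(n: int) -> List[Test]:
    return [lambda f: f(2) == 4, lambda f: f(3) == 6][:n]


small = Stage("small", 1.0, 2, 2, lambda n: [lambda x: x + 1, lambda x: x * x][:n], double_tests)
large = Stage("large", 10.0, 1, 2, lambda n: [lambda x: 2 * x][:n], double_tests)

_, accepted_by, cost = run_cascade([small, large], threshold=1.0)
print(accepted_by, cost)  # large 34.0
```

In this toy setup, lowering the threshold to 0.5 makes the cascade stop at the small stage for a cost of 4.0, but it accepts a candidate that passes only one of the two tests. That is the cost-accuracy trade-off the threshold is meant to control.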
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2405.15842, https://arxiv.org/pdf/2405.15842
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4399114839
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4399114839 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2405.15842 (Digital Object Identifier)
- Title: Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-05-24 (full publication date if available)
- Authors: Boyuan Chen, Mingzhi Zhu, Brendan Dolan-Gavitt, Muhammad Shafique, Siddharth Garg (list of authors in order)
- Landing page: https://arxiv.org/abs/2405.15842 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2405.15842 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2405.15842 (direct OA link when available)
- Concepts: Code (set theory), Computer science, Inference, Programming language, Artificial intelligence, Set (abstract data type) (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4399114839 |
| doi | https://doi.org/10.48550/arxiv.2405.15842 |
| ids.doi | https://doi.org/10.48550/arxiv.2405.15842 |
| ids.openalex | https://openalex.org/W4399114839 |
| fwci | |
| type | preprint |
| title | Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9693999886512756 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T11450 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9384999871253967 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1712 |
| topics[1].subfield.display_name | Software |
| topics[1].display_name | Model-Driven Software Engineering Techniques |
| topics[2].id | https://openalex.org/T10260 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9153000116348267 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Software Engineering Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776760102 |
| concepts[0].level | 3 |
| concepts[0].score | 0.6917015314102173 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[0].display_name | Code (set theory) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6009762287139893 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2776214188 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5974776148796082 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[2].display_name | Inference |
| concepts[3].id | https://openalex.org/C199360897 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4665752351284027 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[3].display_name | Programming language |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.12733551859855652 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C177264268 |
| concepts[5].level | 2 |
| concepts[5].score | 0.054772377014160156 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[5].display_name | Set (abstract data type) |
| keywords[0].id | https://openalex.org/keywords/code |
| keywords[0].score | 0.6917015314102173 |
| keywords[0].display_name | Code (set theory) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6009762287139893 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/inference |
| keywords[2].score | 0.5974776148796082 |
| keywords[2].display_name | Inference |
| keywords[3].id | https://openalex.org/keywords/programming-language |
| keywords[3].score | 0.4665752351284027 |
| keywords[3].display_name | Programming language |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.12733551859855652 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/set |
| keywords[5].score | 0.054772377014160156 |
| keywords[5].display_name | Set (abstract data type) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2405.15842 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2405.15842 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2405.15842 |
| locations[1].id | doi:10.48550/arxiv.2405.15842 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2405.15842 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102924114 |
| authorships[0].author.orcid | https://orcid.org/0009-0006-9645-4526 |
| authorships[0].author.display_name | Boyuan Chen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Chen, Boyuan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5066681834 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Mingzhi Zhu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhu, Mingzhi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5060815601 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8867-4282 |
| authorships[2].author.display_name | Brendan Dolan-Gavitt |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Dolan-Gavitt, Brendan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5005190949 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-2607-8135 |
| authorships[3].author.display_name | Muhammad Shafique |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Shafique, Muhammad |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5010950688 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6158-9512 |
| authorships[4].author.display_name | Siddharth Garg |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Garg, Siddharth |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2405.15842 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-05-29T00:00:00 |
| display_name | Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9693999886512756 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2405.15842 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2405.15842 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2405.15842 |
| primary_location.id | pmh:oai:arXiv.org:2405.15842 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2405.15842 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2405.15842 |
| publication_date | 2024-05-24 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 23, 56, 83, 122, 137, 148, 225 |
| abstract_inverted_index.As | 121 |
| abstract_inverted_index.In | 78 |
| abstract_inverted_index.To | 215 |
| abstract_inverted_index.We | 134 |
| abstract_inverted_index.an | 178 |
| abstract_inverted_index.at | 45 |
| abstract_inverted_index.by | 177 |
| abstract_inverted_index.in | 51, 103, 186, 201, 234 |
| abstract_inverted_index.is | 219 |
| abstract_inverted_index.it | 126 |
| abstract_inverted_index.no | 128 |
| abstract_inverted_index.of | 3, 154, 180, 227 |
| abstract_inverted_index.on | 59, 64, 99, 165 |
| abstract_inverted_index.or | 198 |
| abstract_inverted_index.to | 75, 93, 112, 130, 140, 143, 150, 184, 207, 223 |
| abstract_inverted_index.up | 183 |
| abstract_inverted_index.we | 81 |
| abstract_inverted_index.70% | 185 |
| abstract_inverted_index.LLM | 235 |
| abstract_inverted_index.Our | 107 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.and | 19, 30, 72, 89, 116, 147, 158, 182, 194, 210 |
| abstract_inverted_index.can | 35 |
| abstract_inverted_index.for | 229 |
| abstract_inverted_index.has | 8 |
| abstract_inverted_index.our | 172, 216 |
| abstract_inverted_index.per | 162 |
| abstract_inverted_index.the | 15, 46, 60, 65, 68, 100, 152, 187, 220, 231 |
| abstract_inverted_index.yet | 14 |
| abstract_inverted_index.26%, | 181 |
| abstract_inverted_index.best | 188 |
| abstract_inverted_index.both | 113, 208 |
| abstract_inverted_index.code | 11, 105, 236 |
| abstract_inverted_index.cost | 21 |
| abstract_inverted_index.find | 94 |
| abstract_inverted_index.have | 55 |
| abstract_inverted_index.same | 47 |
| abstract_inverted_index.show | 170 |
| abstract_inverted_index.test | 156, 159 |
| abstract_inverted_index.that | 171 |
| abstract_inverted_index.they | 40 |
| abstract_inverted_index.this | 79, 218 |
| abstract_inverted_index.user | 70 |
| abstract_inverted_index.when | 142 |
| abstract_inverted_index.with | 238 |
| abstract_inverted_index.work | 222 |
| abstract_inverted_index.While | 26 |
| abstract_inverted_index.based | 164 |
| abstract_inverted_index.case, | 189 |
| abstract_inverted_index.costs | 176 |
| abstract_inverted_index.first | 221 |
| abstract_inverted_index.incur | 41 |
| abstract_inverted_index.large | 4 |
| abstract_inverted_index.lines | 160 |
| abstract_inverted_index.model | 87, 118, 132, 192 |
| abstract_inverted_index.novel | 84 |
| abstract_inverted_index.rapid | 1 |
| abstract_inverted_index.tasks | 205 |
| abstract_inverted_index.tests | 111 |
| abstract_inverted_index.time. | 48 |
| abstract_inverted_index.using | 27 |
| abstract_inverted_index.while | 196 |
| abstract_inverted_index.work, | 80 |
| abstract_inverted_index.wrong | 76 |
| abstract_inverted_index.(LLMs) | 7 |
| abstract_inverted_index.access | 129 |
| abstract_inverted_index.across | 190 |
| abstract_inverted_index.budget | 166 |
| abstract_inverted_index.cases, | 157 |
| abstract_inverted_index.deploy | 144 |
| abstract_inverted_index.larger | 28, 145 |
| abstract_inverted_index.model, | 163 |
| abstract_inverted_index.models | 6, 29, 146 |
| abstract_inverted_index.number | 153 |
| abstract_inverted_index.output | 38 |
| abstract_inverted_index.random | 209 |
| abstract_inverted_index.series | 226 |
| abstract_inverted_index.tasks, | 13 |
| abstract_inverted_index.users' | 73 |
| abstract_inverted_index.average | 179 |
| abstract_inverted_index.between | 17 |
| abstract_inverted_index.budget, | 66 |
| abstract_inverted_index.choices | 228 |
| abstract_inverted_index.dynamic | 57 |
| abstract_inverted_index.enhance | 114 |
| abstract_inverted_index.further | 135 |
| abstract_inverted_index.improve | 37 |
| abstract_inverted_index.method, | 125 |
| abstract_inverted_index.natural | 202 |
| abstract_inverted_index.optimal | 211 |
| abstract_inverted_index.options | 98 |
| abstract_inverted_index.propose | 136 |
| abstract_inverted_index.provide | 224 |
| abstract_inverted_index.reduces | 175 |
| abstract_inverted_index.remains | 22 |
| abstract_inverted_index.results | 169 |
| abstract_inverted_index.servers | 50 |
| abstract_inverted_index.usually | 54 |
| abstract_inverted_index.various | 191 |
| abstract_inverted_index.volume, | 71 |
| abstract_inverted_index.accuracy | 18, 115, 200 |
| abstract_inverted_index.answers. | 77 |
| abstract_inverted_index.approach | 108, 174 |
| abstract_inverted_index.blackbox | 123 |
| abstract_inverted_index.compared | 206 |
| abstract_inverted_index.critical | 24 |
| abstract_inverted_index.evaluate | 117 |
| abstract_inverted_index.expenses | 44 |
| abstract_inverted_index.families | 193 |
| abstract_inverted_index.improved | 10 |
| abstract_inverted_index.internal | 131 |
| abstract_inverted_index.language | 5, 203 |
| abstract_inverted_index.multiple | 95 |
| abstract_inverted_index.optimize | 151 |
| abstract_inverted_index.requires | 127 |
| abstract_inverted_index.schemes. | 214 |
| abstract_inverted_index.tradeoff | 102 |
| abstract_inverted_index.LLM-based | 104 |
| abstract_inverted_index.accuracy, | 39 |
| abstract_inverted_index.algorithm | 139 |
| abstract_inverted_index.cascading | 88, 119, 173 |
| abstract_inverted_index.combining | 86 |
| abstract_inverted_index.datasets, | 195 |
| abstract_inverted_index.depending | 63 |
| abstract_inverted_index.determine | 141 |
| abstract_inverted_index.framework | 85 |
| abstract_inverted_index.generated | 161 |
| abstract_inverted_index.heuristic | 149 |
| abstract_inverted_index.improving | 199 |
| abstract_inverted_index.introduce | 82 |
| abstract_inverted_index.leverages | 109 |
| abstract_inverted_index.scenarios | 53 |
| abstract_inverted_index.trade-off | 16, 233 |
| abstract_inverted_index.tradeoff, | 62 |
| abstract_inverted_index.algorithms | 34, 92 |
| abstract_inverted_index.bandwidth, | 67 |
| abstract_inverted_index.challenge. | 25 |
| abstract_inverted_index.completion | 12 |
| abstract_inverted_index.concurrent | 69 |
| abstract_inverted_index.decisions. | 120 |
| abstract_inverted_index.generation | 204, 237 |
| abstract_inverted_index.knowledge, | 217 |
| abstract_inverted_index.optimizing | 230 |
| abstract_inverted_index.preference | 58 |
| abstract_inverted_index.real-world | 52 |
| abstract_inverted_index.solutions, | 155 |
| abstract_inverted_index.advancement | 2 |
| abstract_inverted_index.generation. | 106 |
| abstract_inverted_index.maintaining | 197 |
| abstract_inverted_index.parameters. | 133 |
| abstract_inverted_index.sensitivity | 74 |
| abstract_inverted_index.substantial | 42 |
| abstract_inverted_index.Experimental | 168 |
| abstract_inverted_index.Furthermore, | 49 |
| abstract_inverted_index.constraints. | 167 |
| abstract_inverted_index.near-optimal | 96 |
| abstract_inverted_index.self-testing | 33, 97, 213 |
| abstract_inverted_index.single-model | 212 |
| abstract_inverted_index.computational | 20, 43 |
| abstract_inverted_index.cost-accuracy | 61, 101, 232 |
| abstract_inverted_index.incorporating | 31 |
| abstract_inverted_index.self-feedback | 91 |
| abstract_inverted_index.self-testing. | 239 |
| abstract_inverted_index.significantly | 9, 36 |
| abstract_inverted_index.inference-time | 32, 90, 124 |
| abstract_inverted_index.self-generated | 110 |
| abstract_inverted_index.threshold-based | 138 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
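
The `abstract_inverted_index.*` rows in the payload map each word of the abstract to the zero-based positions where it occurs. A minimal sketch of turning such an index back into running text follows; the function name `rebuild_abstract` is a hypothetical helper, and the `sample` dictionary is an excerpt restricted to positions 0 through 7 of the rows above, not the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place each word at every position it is listed for, then join in order."""
    positions: Dict[int, str] = {}
    for word, indexes in inverted_index.items():
        for index in indexes:
            positions[index] = word
    return " ".join(positions[index] for index in sorted(positions))


# Excerpt of the index above, limited to positions 0-7 for brevity.
sample = {
    "The": [0],
    "rapid": [1],
    "advancement": [2],
    "of": [3],
    "large": [4],
    "language": [5],
    "models": [6],
    "(LLMs)": [7],
}

print(rebuild_abstract(sample))  # The rapid advancement of large language models (LLMs)
```

Applied to the complete set of rows, the same procedure should reproduce the abstract text shown at the top of this record, since punctuation is stored as part of each token.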