ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.17264
CodeLLMs have demonstrated remarkable advancements in software engineering tasks. However, while these models can generate functionally correct code, they often produce code that is inefficient in terms of runtime. This inefficiency is particularly problematic in resource-constrained environments, where it harms software performance and sustainability. Existing approaches to optimizing the efficiency of CodeLLM-generated code, such as SOAP and PIE, have limitations. SOAP requires a compatible execution environment and predefined test cases for iterative code modification, while PIE focuses on instruction tuning, which improves efficiency but compromises correctness. These shortcomings highlight the need for a fine-tuning framework that optimizes both efficiency and correctness without relying on predefined test cases or specific execution environments. To bridge this gap, we introduce ACECode, a reinforcement learning-based fine-tuning framework that aligns CodeLLMs with the dual objectives of efficiency and correctness. ACECode combines three key steps: (1) generating code with an actor CodeLLM, (2) calculating a training-free reward signal derived from code execution feedback for each generated code sample, and (3) optimizing the CodeLLM via the Proximal Policy Optimization (PPO) algorithm. This reward signal enables joint assessment of efficiency and correctness without manual labeling. We evaluate ACECode by fine-tuning four state-of-the-art (SOTA) CodeLLMs and comparing their code with three baselines: original, instruction-tuned, and PIE-tuned CodeLLMs. Extensive experimental results suggest that ACECode significantly improves the efficiency and correctness of generated code against all baselines for all CodeLLMs. Specifically, CodeLLMs fine-tuned with ACECode improve pass@1 by 1.84% to 14.51% and reduce runtime in 65% to 72% of cases compared to the original CodeLLMs.
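The abstract does not state the reward formula, so the following is only a minimal sketch of how a reward derived from execution feedback might score correctness and efficiency together. The function name `execution_reward`, the penalty values, the baseline-runtime comparison, and the clipping of the speed bonus are all hypothetical choices made for illustration, not ACECode's actual design.

```python
def execution_reward(compiled: bool, passed: bool, runtime_s: float,
                     baseline_runtime_s: float, max_bonus: float = 1.0) -> float:
    """Hypothetical reward: correctness gates the score, speed adds a bounded bonus."""
    if baseline_runtime_s <= 0:
        raise ValueError("baseline_runtime_s must be positive")
    if not compiled:
        return -1.0  # assumed penalty for code that fails to run at all
    if not passed:
        return -0.5  # assumed smaller penalty for code that runs but is wrong
    # Relative speedup over the baseline, clipped to the range [0, max_bonus]
    # so that slower-but-correct code still earns the base reward of 1.0.
    speedup = (baseline_runtime_s - runtime_s) / baseline_runtime_s
    bonus = min(max(speedup, 0.0), max_bonus)
    return 1.0 + bonus


# Example: correct code that halves the baseline runtime.
reward = execution_reward(compiled=True, passed=True,
                          runtime_s=0.5, baseline_runtime_s=1.0)
```

Gating on correctness first reflects the abstract's concern that an efficiency-only objective can compromise correctness; in a PPO loop, a scalar like this would be the per-sample reward fed to the policy update.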
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.17264, https://arxiv.org/pdf/2412.17264
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405767809
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405767809 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.17264 (Digital Object Identifier)
- Title: ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-12-23 (Full publication date if available)
- Authors: Chengran Yang, Hong Jin Kang, Jieke Shi, David Lo (List of authors in order)
- Landing page: https://arxiv.org/abs/2412.17264 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.17264 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.17264 (Direct OA link when available)
- Concepts: Correctness, Code (set theory), Computer science, Reinforcement learning, Programming language, Artificial intelligence, Set (abstract data type) (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4405767809 |
| doi | https://doi.org/10.48550/arxiv.2412.17264 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.17264 |
| ids.openalex | https://openalex.org/W4405767809 |
| fwci | |
| type | preprint |
| title | ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10260 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9842000007629395 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Software Engineering Research |
| topics[1].id | https://openalex.org/T12127 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9546999931335449 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Software System Performance and Reliability |
| topics[2].id | https://openalex.org/T12423 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9279999732971191 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1712 |
| topics[2].subfield.display_name | Software |
| topics[2].display_name | Software Reliability and Analysis Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C55439883 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8612520694732666 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q360812 |
| concepts[0].display_name | Correctness |
| concepts[1].id | https://openalex.org/C2776760102 |
| concepts[1].level | 3 |
| concepts[1].score | 0.6696608662605286 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[1].display_name | Code (set theory) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6388097405433655 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C97541855 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6168087124824524 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[3].display_name | Reinforcement learning |
| concepts[4].id | https://openalex.org/C199360897 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5377694368362427 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[4].display_name | Programming language |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3002881407737732 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C177264268 |
| concepts[6].level | 2 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[6].display_name | Set (abstract data type) |
| keywords[0].id | https://openalex.org/keywords/correctness |
| keywords[0].score | 0.8612520694732666 |
| keywords[0].display_name | Correctness |
| keywords[1].id | https://openalex.org/keywords/code |
| keywords[1].score | 0.6696608662605286 |
| keywords[1].display_name | Code (set theory) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6388097405433655 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[3].score | 0.6168087124824524 |
| keywords[3].display_name | Reinforcement learning |
| keywords[4].id | https://openalex.org/keywords/programming-language |
| keywords[4].score | 0.5377694368362427 |
| keywords[4].display_name | Programming language |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.3002881407737732 |
| keywords[5].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.17264 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.17264 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.17264 |
| locations[1].id | doi:10.48550/arxiv.2412.17264 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.17264 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5037441723 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4577-5590 |
| authorships[0].author.display_name | Chengran Yang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yang, Chengran |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5027335548 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7335-7295 |
| authorships[1].author.display_name | Hong Jin Kang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kang, Hong Jin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5002667771 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0799-5018 |
| authorships[2].author.display_name | Jieke Shi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Shi, Jieke |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5081036622 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4367-7201 |
| authorships[3].author.display_name | David Lo |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Lo, David |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.17264 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10260 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9842000007629395 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Software Engineering Research |
| related_works | https://openalex.org/W3008339103, https://openalex.org/W1667647204, https://openalex.org/W2404647514, https://openalex.org/W4247536566, https://openalex.org/W4241418540, https://openalex.org/W2018477250, https://openalex.org/W3119814709, https://openalex.org/W1508895727, https://openalex.org/W2725786787, https://openalex.org/W1590965489 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.17264 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.17264 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.17264 |
| primary_location.id | pmh:oai:arXiv.org:2412.17264 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.17264 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.17264 |
| publication_date | 2024-12-23 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 59, 88, 114, 143 |
| abstract_inverted_index.To | 107 |
| abstract_inverted_index.We | 180 |
| abstract_inverted_index.an | 138 |
| abstract_inverted_index.by | 183, 229 |
| abstract_inverted_index.in | 5, 25, 34, 236 |
| abstract_inverted_index.is | 23, 31 |
| abstract_inverted_index.of | 27, 125, 173, 213, 240 |
| abstract_inverted_index.on | 74, 99 |
| abstract_inverted_index.or | 103 |
| abstract_inverted_index.to | 231, 238, 243 |
| abstract_inverted_index.we | 111 |
| abstract_inverted_index.(1) | 134 |
| abstract_inverted_index.(2) | 141 |
| abstract_inverted_index.(3) | 157 |
| abstract_inverted_index.65% | 237 |
| abstract_inverted_index.72% | 239 |
| abstract_inverted_index.PIE | 53, 72 |
| abstract_inverted_index.all | 217, 220 |
| abstract_inverted_index.and | 40, 52, 63, 95, 127, 156, 175, 189, 198, 211, 233 |
| abstract_inverted_index.but | 79 |
| abstract_inverted_index.can | 13 |
| abstract_inverted_index.for | 44, 48, 67, 87, 152, 219 |
| abstract_inverted_index.key | 132 |
| abstract_inverted_index.the | 85, 159, 209 |
| abstract_inverted_index.via | 161 |
| abstract_inverted_index.SOAP | 51, 57 |
| abstract_inverted_index.SOTA | 186 |
| abstract_inverted_index.This | 29, 167 |
| abstract_inverted_index.both | 93 |
| abstract_inverted_index.code | 21, 46, 69, 136, 149, 192, 215 |
| abstract_inverted_index.dual | 123 |
| abstract_inverted_index.each | 153 |
| abstract_inverted_index.four | 185 |
| abstract_inverted_index.from | 148 |
| abstract_inverted_index.gap, | 110 |
| abstract_inverted_index.have | 1 |
| abstract_inverted_index.like | 50 |
| abstract_inverted_index.need | 86 |
| abstract_inverted_index.test | 65, 101 |
| abstract_inverted_index.that | 22, 91, 119, 205 |
| abstract_inverted_index.they | 18 |
| abstract_inverted_index.this | 109 |
| abstract_inverted_index.with | 122, 137, 193, 225 |
| abstract_inverted_index.(PPO) | 165 |
| abstract_inverted_index.1.84% | 230 |
| abstract_inverted_index.These | 82 |
| abstract_inverted_index.actor | 139 |
| abstract_inverted_index.cases | 66, 102, 241 |
| abstract_inverted_index.code, | 17, 155 |
| abstract_inverted_index.joint | 171 |
| abstract_inverted_index.often | 19 |
| abstract_inverted_index.terms | 26 |
| abstract_inverted_index.their | 191 |
| abstract_inverted_index.these | 11 |
| abstract_inverted_index.three | 131, 194 |
| abstract_inverted_index.while | 10, 71 |
| abstract_inverted_index.14.51% | 232 |
| abstract_inverted_index.Policy | 163 |
| abstract_inverted_index.aligns | 120 |
| abstract_inverted_index.bridge | 108 |
| abstract_inverted_index.manual | 178 |
| abstract_inverted_index.models | 12 |
| abstract_inverted_index.pass@1 | 228 |
| abstract_inverted_index.reduce | 234 |
| abstract_inverted_index.reward | 145, 168 |
| abstract_inverted_index.signal | 146, 169 |
| abstract_inverted_index.steps: | 133 |
| abstract_inverted_index.tasks. | 8 |
| abstract_inverted_index.ACECode | 129, 182, 226 |
| abstract_inverted_index.CodeLLM | 160 |
| abstract_inverted_index.\tool{} | 206 |
| abstract_inverted_index.against | 216 |
| abstract_inverted_index.certain | 55 |
| abstract_inverted_index.correct | 16 |
| abstract_inverted_index.derived | 147 |
| abstract_inverted_index.enables | 170 |
| abstract_inverted_index.exhibit | 54 |
| abstract_inverted_index.focuses | 73 |
| abstract_inverted_index.improve | 227 |
| abstract_inverted_index.produce | 20 |
| abstract_inverted_index.relying | 98 |
| abstract_inverted_index.results | 203 |
| abstract_inverted_index.runtime | 235 |
| abstract_inverted_index.suggest | 204 |
| abstract_inverted_index.tuning, | 76 |
| abstract_inverted_index.without | 97, 177 |
| abstract_inverted_index.ACECode, | 113 |
| abstract_inverted_index.CodeLLM, | 140 |
| abstract_inverted_index.CodeLLMs | 0, 49, 121, 188, 223 |
| abstract_inverted_index.Existing | 42 |
| abstract_inverted_index.However, | 9 |
| abstract_inverted_index.Proximal | 162 |
| abstract_inverted_index.combines | 130 |
| abstract_inverted_index.compared | 242 |
| abstract_inverted_index.evaluate | 181 |
| abstract_inverted_index.feedback | 151 |
| abstract_inverted_index.generate | 14 |
| abstract_inverted_index.improves | 208 |
| abstract_inverted_index.original | 244 |
| abstract_inverted_index.requires | 58 |
| abstract_inverted_index.runtime. | 28 |
| abstract_inverted_index.software | 6, 38 |
| abstract_inverted_index.specific | 104 |
| abstract_inverted_index.CodeLLMs. | 200, 221, 245 |
| abstract_inverted_index.Extensive | 201 |
| abstract_inverted_index.PIE-tuned | 199 |
| abstract_inverted_index.baselines | 218 |
| abstract_inverted_index.comparing | 190 |
| abstract_inverted_index.execution | 61, 105, 150 |
| abstract_inverted_index.framework | 90, 118 |
| abstract_inverted_index.generated | 154, 214 |
| abstract_inverted_index.highlight | 84 |
| abstract_inverted_index.impacting | 37 |
| abstract_inverted_index.improving | 77 |
| abstract_inverted_index.introduce | 112 |
| abstract_inverted_index.iterative | 68 |
| abstract_inverted_index.labeling. | 179 |
| abstract_inverted_index.optimizes | 92 |
| abstract_inverted_index.original, | 196 |
| abstract_inverted_index.algorithm. | 166 |
| abstract_inverted_index.approaches | 43 |
| abstract_inverted_index.assessment | 172 |
| abstract_inverted_index.baselines: | 195 |
| abstract_inverted_index.compatible | 60 |
| abstract_inverted_index.efficiency | 47, 78, 94, 126, 174, 210 |
| abstract_inverted_index.experiment | 202 |
| abstract_inverted_index.fine-tuned | 224 |
| abstract_inverted_index.generating | 135 |
| abstract_inverted_index.objectives | 124 |
| abstract_inverted_index.optimizing | 45, 158 |
| abstract_inverted_index.predefined | 64, 100 |
| abstract_inverted_index.remarkable | 3 |
| abstract_inverted_index.calculating | 142 |
| abstract_inverted_index.correctness | 96, 176, 212 |
| abstract_inverted_index.engineering | 7 |
| abstract_inverted_index.environment | 62 |
| abstract_inverted_index.fine-tuning | 89, 117, 184 |
| abstract_inverted_index.inefficient | 24 |
| abstract_inverted_index.instruction | 75 |
| abstract_inverted_index.performance | 39 |
| abstract_inverted_index.problematic | 33 |
| abstract_inverted_index.Optimization | 164 |
| abstract_inverted_index.advancements | 4 |
| abstract_inverted_index.compromising | 80 |
| abstract_inverted_index.correctness. | 81, 128 |
| abstract_inverted_index.demonstrated | 2 |
| abstract_inverted_index.functionally | 15 |
| abstract_inverted_index.inefficiency | 30 |
| abstract_inverted_index.limitations. | 56 |
| abstract_inverted_index.particularly | 32 |
| abstract_inverted_index.shortcomings | 83 |
| abstract_inverted_index.Specifically, | 222 |
| abstract_inverted_index.environments, | 36 |
| abstract_inverted_index.environments. | 106 |
| abstract_inverted_index.modification, | 70 |
| abstract_inverted_index.reinforcement | 115 |
| abstract_inverted_index.significantly | 207 |
| abstract_inverted_index.training-free | 144 |
| abstract_inverted_index.learning-based | 116 |
| abstract_inverted_index.sustainability. | 41 |
| abstract_inverted_index.(state-of-the-art) | 187 |
| abstract_inverted_index.instruction-tuned, | 197 |
| abstract_inverted_index.resource-constrained | 35 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
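The `abstract_inverted_index` rows in the table map each token to the zero-based positions where it occurs in the abstract. A minimal sketch of how the plain-text abstract can be rebuilt from that mapping follows. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary is only the subset of the rows above covering positions 0 to 8 (the first sentence), not the full payload.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Invert a token -> positions mapping back into running text."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the table above: only the entries for positions 0 through 8.
sample_index = {
    "CodeLLMs": [0],
    "have": [1],
    "demonstrated": [2],
    "remarkable": [3],
    "advancements": [4],
    "in": [5],
    "software": [6],
    "engineering": [7],
    "tasks.": [8],
}

first_sentence = rebuild_abstract(sample_index)
```

Because punctuation stays attached to its token (for example `tasks.`), joining on spaces is enough to recover the original sentence boundaries.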