ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models

Chengran Yang, Hong Jin Kang, Jieke Shi, David Lo

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.17264

CodeLLMs have demonstrated remarkable advancements in software engineering tasks. However, while these models can generate functionally correct code, they often produce code that is inefficient in terms of runtime. This inefficiency is particularly problematic in resource-constrained environments, impacting software performance and sustainability. Existing approaches for optimizing the efficiency of code generated by CodeLLMs, such as SOAP and PIE, exhibit certain limitations. SOAP requires a compatible execution environment and predefined test cases for iterative code modification, while PIE focuses on instruction tuning, improving efficiency but compromising correctness. These shortcomings highlight the need for a fine-tuning framework that optimizes both efficiency and correctness without relying on predefined test cases or specific execution environments. To bridge this gap, we introduce ACECode, a reinforcement learning-based fine-tuning framework that aligns CodeLLMs with the dual objectives of efficiency and correctness. ACECode combines three key steps: (1) generating code with an actor CodeLLM, (2) calculating a training-free reward signal derived from code execution feedback for each generated code sample, and (3) optimizing the CodeLLM via the Proximal Policy Optimization (PPO) algorithm. This reward signal enables joint assessment of efficiency and correctness without manual labeling. We evaluate ACECode by fine-tuning four SOTA (state-of-the-art) CodeLLMs and comparing their code with three baselines: original, instruction-tuned, and PIE-tuned CodeLLMs. Extensive experimental results suggest that ACECode significantly improves the efficiency and correctness of generated code against all baselines for all CodeLLMs. Specifically, CodeLLMs fine-tuned with ACECode improve pass@1 by 1.84% to 14.51% and reduce runtime in 65% to 72% of cases compared to original CodeLLMs.
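
To make step (2) concrete, the following is a minimal sketch of what a reward combining correctness and efficiency from execution feedback might look like. The abstract does not give the reward formula, so everything here is an assumption for illustration: the function name `step_reward`, its parameters, the penalty values, and the speedup-based bonus are hypothetical and are not taken from the paper.

```python
def step_reward(compiled: bool, passed: bool,
                runtime_s: float, reference_runtime_s: float) -> float:
    """Hypothetical scalar reward for one generated program.

    All constants below are illustrative assumptions, not ACECode's values.
    """
    if not compiled:
        # Code that does not even run gets the strongest penalty.
        return -1.0
    if not passed:
        # Code that runs but gives wrong output gets a milder penalty.
        return -0.5
    # Correct code earns a base reward of 1.0 plus an efficiency bonus:
    # the speedup over a reference runtime, clipped to the range [0, 1].
    speedup = reference_runtime_s / max(runtime_s, 1e-9)
    bonus = min(max(speedup - 1.0, 0.0), 1.0)
    return 1.0 + bonus


if __name__ == "__main__":
    print(step_reward(False, False, 1.0, 1.0))  # does not run
    print(step_reward(True, False, 1.0, 1.0))   # runs, wrong output
    print(step_reward(True, True, 2.0, 1.0))    # correct, slower than reference
    print(step_reward(True, True, 0.5, 1.0))    # correct, twice as fast
```

In a PPO loop, a scalar of this kind would be attached to each sampled completion before the policy update; ordering the cases so that correctness is rewarded before speed is one plausible way to avoid trading correctness for efficiency.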

Related Topics

Computer Science

Reinforcement Learning

Programming Language

Artificial Intelligence

Concepts

Correctness, Code (set theory), Computer science, Reinforcement learning, Programming language, Artificial intelligence, Set (abstract data type)

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2412.17264
PDF: https://arxiv.org/pdf/2412.17264
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4405767809

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4405767809
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2412.17264
Digital Object Identifier

Title: ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2024
Year of publication

Publication date: 2024-12-23
Full publication date if available

Authors: Chengran Yang, Hong Jin Kang, Jieke Shi, David Lo
List of authors in order

Landing page: https://arxiv.org/abs/2412.17264
Publisher landing page

PDF URL: https://arxiv.org/pdf/2412.17264
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2412.17264
Direct OA link when available

Concepts: Correctness, Code (set theory), Computer science, Reinforcement learning, Programming language, Artificial intelligence, Set (abstract data type)
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4405767809
doi	https://doi.org/10.48550/arxiv.2412.17264
ids.doi	https://doi.org/10.48550/arxiv.2412.17264
ids.openalex	https://openalex.org/W4405767809
fwci
type	preprint
title	ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10260
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9842000007629395
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1710
topics[0].subfield.display_name	Information Systems
topics[0].display_name	Software Engineering Research
topics[1].id	https://openalex.org/T12127
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9546999931335449
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1705
topics[1].subfield.display_name	Computer Networks and Communications
topics[1].display_name	Software System Performance and Reliability
topics[2].id	https://openalex.org/T12423
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9279999732971191
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1712
topics[2].subfield.display_name	Software
topics[2].display_name	Software Reliability and Analysis Research
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C55439883
concepts[0].level	2
concepts[0].score	0.8612520694732666
concepts[0].wikidata	https://www.wikidata.org/wiki/Q360812
concepts[0].display_name	Correctness
concepts[1].id	https://openalex.org/C2776760102
concepts[1].level	3
concepts[1].score	0.6696608662605286
concepts[1].wikidata	https://www.wikidata.org/wiki/Q5139990
concepts[1].display_name	Code (set theory)
concepts[2].id	https://openalex.org/C41008148
concepts[2].level	0
concepts[2].score	0.6388097405433655
concepts[2].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[2].display_name	Computer science
concepts[3].id	https://openalex.org/C97541855
concepts[3].level	2
concepts[3].score	0.6168087124824524
concepts[3].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[3].display_name	Reinforcement learning
concepts[4].id	https://openalex.org/C199360897
concepts[4].level	1
concepts[4].score	0.5377694368362427
concepts[4].wikidata	https://www.wikidata.org/wiki/Q9143
concepts[4].display_name	Programming language
concepts[5].id	https://openalex.org/C154945302
concepts[5].level	1
concepts[5].score	0.3002881407737732
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[5].display_name	Artificial intelligence
concepts[6].id	https://openalex.org/C177264268
concepts[6].level	2
concepts[6].score	0.0
concepts[6].wikidata	https://www.wikidata.org/wiki/Q1514741
concepts[6].display_name	Set (abstract data type)
keywords[0].id	https://openalex.org/keywords/correctness
keywords[0].score	0.8612520694732666
keywords[0].display_name	Correctness
keywords[1].id	https://openalex.org/keywords/code
keywords[1].score	0.6696608662605286
keywords[1].display_name	Code (set theory)
keywords[2].id	https://openalex.org/keywords/computer-science
keywords[2].score	0.6388097405433655
keywords[2].display_name	Computer science
keywords[3].id	https://openalex.org/keywords/reinforcement-learning
keywords[3].score	0.6168087124824524
keywords[3].display_name	Reinforcement learning
keywords[4].id	https://openalex.org/keywords/programming-language
keywords[4].score	0.5377694368362427
keywords[4].display_name	Programming language
keywords[5].id	https://openalex.org/keywords/artificial-intelligence
keywords[5].score	0.3002881407737732
keywords[5].display_name	Artificial intelligence
language	en
locations[0].id	pmh:oai:arXiv.org:2412.17264
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2412.17264
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2412.17264
locations[1].id	doi:10.48550/arxiv.2412.17264
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2412.17264
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5037441723
authorships[0].author.orcid	https://orcid.org/0000-0003-4577-5590
authorships[0].author.display_name	Chengran Yang
authorships[0].author_position	first
authorships[0].raw_author_name	Yang, Chengran
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5027335548
authorships[1].author.orcid	https://orcid.org/0000-0001-7335-7295
authorships[1].author.display_name	Hong Jin Kang
authorships[1].author_position	middle
authorships[1].raw_author_name	Kang, Hong Jin
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5002667771
authorships[2].author.orcid	https://orcid.org/0000-0002-0799-5018
authorships[2].author.display_name	Jieke Shi
authorships[2].author_position	middle
authorships[2].raw_author_name	Shi, Jieke
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5081036622
authorships[3].author.orcid	https://orcid.org/0000-0002-4367-7201
authorships[3].author.display_name	David Lo
authorships[3].author_position	last
authorships[3].raw_author_name	Lo, David
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2412.17264
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10260
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9842000007629395
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1710
primary_topic.subfield.display_name	Information Systems
primary_topic.display_name	Software Engineering Research
related_works	https://openalex.org/W3008339103, https://openalex.org/W1667647204, https://openalex.org/W2404647514, https://openalex.org/W4247536566, https://openalex.org/W4241418540, https://openalex.org/W2018477250, https://openalex.org/W3119814709, https://openalex.org/W1508895727, https://openalex.org/W2725786787, https://openalex.org/W1590965489
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2412.17264
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2412.17264
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2412.17264
primary_location.id	pmh:oai:arXiv.org:2412.17264
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2412.17264
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2412.17264
publication_date	2024-12-23
publication_year	2024
referenced_works_count	0
abstract_inverted_index.a	59, 88, 114, 143
abstract_inverted_index.To	107
abstract_inverted_index.We	180
abstract_inverted_index.an	138
abstract_inverted_index.by	183, 229
abstract_inverted_index.in	5, 25, 34, 236
abstract_inverted_index.is	23, 31
abstract_inverted_index.of	27, 125, 173, 213, 240
abstract_inverted_index.on	74, 99
abstract_inverted_index.or	103
abstract_inverted_index.to	231, 238, 243
abstract_inverted_index.we	111
abstract_inverted_index.(1)	134
abstract_inverted_index.(2)	141
abstract_inverted_index.(3)	157
abstract_inverted_index.65%	237
abstract_inverted_index.72%	239
abstract_inverted_index.PIE	53, 72
abstract_inverted_index.all	217, 220
abstract_inverted_index.and	40, 52, 63, 95, 127, 156, 175, 189, 198, 211, 233
abstract_inverted_index.but	79
abstract_inverted_index.can	13
abstract_inverted_index.for	44, 48, 67, 87, 152, 219
abstract_inverted_index.key	132
abstract_inverted_index.the	85, 159, 209
abstract_inverted_index.via	161
abstract_inverted_index.SOAP	51, 57
abstract_inverted_index.SOTA	186
abstract_inverted_index.This	29, 167
abstract_inverted_index.both	93
abstract_inverted_index.code	21, 46, 69, 136, 149, 192, 215
abstract_inverted_index.dual	123
abstract_inverted_index.each	153
abstract_inverted_index.four	185
abstract_inverted_index.from	148
abstract_inverted_index.gap,	110
abstract_inverted_index.have	1
abstract_inverted_index.like	50
abstract_inverted_index.need	86
abstract_inverted_index.test	65, 101
abstract_inverted_index.that	22, 91, 119, 205
abstract_inverted_index.they	18
abstract_inverted_index.this	109
abstract_inverted_index.with	122, 137, 193, 225
abstract_inverted_index.(PPO)	165
abstract_inverted_index.1.84%	230
abstract_inverted_index.These	82
abstract_inverted_index.actor	139
abstract_inverted_index.cases	66, 102, 241
abstract_inverted_index.code,	17, 155
abstract_inverted_index.joint	171
abstract_inverted_index.often	19
abstract_inverted_index.terms	26
abstract_inverted_index.their	191
abstract_inverted_index.these	11
abstract_inverted_index.three	131, 194
abstract_inverted_index.while	10, 71
abstract_inverted_index.14.51%	232
abstract_inverted_index.Policy	163
abstract_inverted_index.aligns	120
abstract_inverted_index.bridge	108
abstract_inverted_index.manual	178
abstract_inverted_index.models	12
abstract_inverted_index.pass@1	228
abstract_inverted_index.reduce	234
abstract_inverted_index.reward	145, 168
abstract_inverted_index.signal	146, 169
abstract_inverted_index.steps:	133
abstract_inverted_index.tasks.	8
abstract_inverted_index.ACECode	129, 182, 226
abstract_inverted_index.CodeLLM	160
abstract_inverted_index.\tool{}	206
abstract_inverted_index.against	216
abstract_inverted_index.certain	55
abstract_inverted_index.correct	16
abstract_inverted_index.derived	147
abstract_inverted_index.enables	170
abstract_inverted_index.exhibit	54
abstract_inverted_index.focuses	73
abstract_inverted_index.improve	227
abstract_inverted_index.produce	20
abstract_inverted_index.relying	98
abstract_inverted_index.results	203
abstract_inverted_index.runtime	235
abstract_inverted_index.suggest	204
abstract_inverted_index.tuning,	76
abstract_inverted_index.without	97, 177
abstract_inverted_index.ACECode,	113
abstract_inverted_index.CodeLLM,	140
abstract_inverted_index.CodeLLMs	0, 49, 121, 188, 223
abstract_inverted_index.Existing	42
abstract_inverted_index.However,	9
abstract_inverted_index.Proximal	162
abstract_inverted_index.combines	130
abstract_inverted_index.compared	242
abstract_inverted_index.evaluate	181
abstract_inverted_index.feedback	151
abstract_inverted_index.generate	14
abstract_inverted_index.improves	208
abstract_inverted_index.original	244
abstract_inverted_index.requires	58
abstract_inverted_index.runtime.	28
abstract_inverted_index.software	6, 38
abstract_inverted_index.specific	104
abstract_inverted_index.CodeLLMs.	200, 221, 245
abstract_inverted_index.Extensive	201
abstract_inverted_index.PIE-tuned	199
abstract_inverted_index.baselines	218
abstract_inverted_index.comparing	190
abstract_inverted_index.execution	61, 105, 150
abstract_inverted_index.framework	90, 118
abstract_inverted_index.generated	154, 214
abstract_inverted_index.highlight	84
abstract_inverted_index.impacting	37
abstract_inverted_index.improving	77
abstract_inverted_index.introduce	112
abstract_inverted_index.iterative	68
abstract_inverted_index.labeling.	179
abstract_inverted_index.optimizes	92
abstract_inverted_index.original,	196
abstract_inverted_index.algorithm.	166
abstract_inverted_index.approaches	43
abstract_inverted_index.assessment	172
abstract_inverted_index.baselines:	195
abstract_inverted_index.compatible	60
abstract_inverted_index.efficiency	47, 78, 94, 126, 174, 210
abstract_inverted_index.experiment	202
abstract_inverted_index.fine-tuned	224
abstract_inverted_index.generating	135
abstract_inverted_index.objectives	124
abstract_inverted_index.optimizing	45, 158
abstract_inverted_index.predefined	64, 100
abstract_inverted_index.remarkable	3
abstract_inverted_index.calculating	142
abstract_inverted_index.correctness	96, 176, 212
abstract_inverted_index.engineering	7
abstract_inverted_index.environment	62
abstract_inverted_index.fine-tuning	89, 117, 184
abstract_inverted_index.inefficient	24
abstract_inverted_index.instruction	75
abstract_inverted_index.performance	39
abstract_inverted_index.problematic	33
abstract_inverted_index.Optimization	164
abstract_inverted_index.advancements	4
abstract_inverted_index.compromising	80
abstract_inverted_index.correctness.	81, 128
abstract_inverted_index.demonstrated	2
abstract_inverted_index.functionally	15
abstract_inverted_index.inefficiency	30
abstract_inverted_index.limitations.	56
abstract_inverted_index.particularly	32
abstract_inverted_index.shortcomings	83
abstract_inverted_index.Specifically,	222
abstract_inverted_index.environments,	36
abstract_inverted_index.environments.	106
abstract_inverted_index.modification,	70
abstract_inverted_index.reinforcement	115
abstract_inverted_index.significantly	207
abstract_inverted_index.training-free	144
abstract_inverted_index.learning-based	116
abstract_inverted_index.sustainability.	41
abstract_inverted_index.(state-of-the-art)	187
abstract_inverted_index.instruction-tuned,	197
abstract_inverted_index.resource-constrained	35
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
citation_normalized_percentile
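
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch, assuming the rows have already been parsed into a Python dict, the plain-text abstract can be rebuilt by placing each token at its positions and joining in order. The function name `rebuild_abstract` and the `sample` dict are illustrative; `sample` is only an excerpt holding the first nine positions from the payload, not the full index.

```python
def rebuild_abstract(inverted_index: dict) -> str:
    """Rebuild text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join the tokens in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload: only positions 0-8 are included here.
sample = {
    "CodeLLMs": [0],
    "have": [1],
    "demonstrated": [2],
    "remarkable": [3],
    "advancements": [4],
    "in": [5],
    "software": [6],
    "engineering": [7],
    "tasks.": [8],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
    # prints: CodeLLMs have demonstrated remarkable advancements in software engineering tasks.
```

Applied to the full index, the same function should reproduce the abstract token for token, including raw LaTeX remnants such as `\tool{}` at position 206.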