SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, Sida I. Wang

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.18449

The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focus on applying RL to competitive coding and math problems, this paper introduces SWE-RL, the first approach to scale RL-based LLM reasoning for real-world software engineering. Leveraging a lightweight rule-based reward (e.g., the similarity score between ground-truth and LLM-generated solutions), SWE-RL enables LLMs to autonomously recover a developer's reasoning processes and solutions by learning from extensive open-source software evolution data -- the record of a software's entire lifecycle, including its code snapshots, code changes, and events such as issues and pull requests. Trained on top of Llama 3, our resulting reasoning model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified -- a human-verified collection of real-world GitHub issues. To our knowledge, this is the best performance reported for medium-sized (<100B) LLMs to date, even comparable to leading proprietary LLMs like GPT-4o. Surprisingly, despite performing RL solely on software evolution data, Llama3-SWE-RL has even emerged with generalized reasoning skills. For example, it shows improved results on five out-of-domain tasks, namely, function coding, library use, code reasoning, mathematics, and general language understanding, whereas a supervised-finetuning baseline even leads to performance degradation on average. Overall, SWE-RL opens up a new direction to improve the reasoning capabilities of LLMs through reinforcement learning on massive software engineering data.
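The abstract describes the reward only as lightweight and rule-based, giving "the similarity score between ground-truth and LLM-generated solutions" as an example. The following is a minimal sketch of one such reward, not the authors' implementation. It assumes the similarity is Python's `difflib.SequenceMatcher` ratio over patch text, and it assumes a hypothetical fixed penalty when no patch can be extracted from the model output. The function name `similarity_reward` and the parameter `malformed_penalty` are illustrative.

```python
import difflib
from typing import Optional


def similarity_reward(
    predicted_patch: Optional[str],
    oracle_patch: str,
    malformed_penalty: float = -1.0,  # assumption: penalty value is illustrative
) -> float:
    """Score a generated patch against the ground-truth patch.

    Returns the penalty when no patch was extracted (predicted_patch is None),
    otherwise a similarity in [0, 1] from difflib's SequenceMatcher.
    """
    if predicted_patch is None:
        return malformed_penalty
    matcher = difflib.SequenceMatcher(None, predicted_patch, oracle_patch)
    return matcher.ratio()
```

A reward of this shape needs no test execution or learned reward model, which is what makes it cheap to compute over large volumes of pull-request data.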

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2502.18449
PDF: https://arxiv.org/pdf/2502.18449
OA Status: green
OpenAlex ID: https://openalex.org/W4415189338

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4415189338 (canonical identifier for this work in OpenAlex)
DOI: https://doi.org/10.48550/arxiv.2502.18449 (Digital Object Identifier)
Title: SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (work title)
Type: preprint (OpenAlex work type)
Language: en (primary language)
Publication year: 2025 (year of publication)
Publication date: 2025-02-25 (full publication date)
Authors: Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, Sida I. Wang (listed in order)
Landing page: https://arxiv.org/abs/2502.18449 (publisher landing page)
PDF URL: https://arxiv.org/pdf/2502.18449 (direct link to full-text PDF)
Open access: Yes (a free full text is available)
OA status: green (open access status per OpenAlex)
OA URL: https://arxiv.org/pdf/2502.18449 (direct OA link)
Cited by: 0 (total citation count in OpenAlex)

Full payload

id	https://openalex.org/W4415189338
doi	https://doi.org/10.48550/arxiv.2502.18449
ids.doi	https://doi.org/10.48550/arxiv.2502.18449
ids.openalex	https://openalex.org/W4415189338
fwci
type	preprint
title	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10260
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9269000291824341
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1710
topics[0].subfield.display_name	Information Systems
topics[0].display_name	Software Engineering Research
topics[1].id	https://openalex.org/T10456
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9143999814987183
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Multi-Agent Systems and Negotiation
is_xpac	False
apc_list
apc_paid
language	en
locations[0].id	pmh:oai:arXiv.org:2502.18449
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2502.18449
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2502.18449
locations[1].id	doi:10.48550/arxiv.2502.18449
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2502.18449
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5058312928
authorships[0].author.orcid	https://orcid.org/0000-0002-4391-3753
authorships[0].author.display_name	Yuxiang Wei
authorships[0].author_position	first
authorships[0].raw_author_name	Wei, Yuxiang
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5086412493
authorships[1].author.orcid
authorships[1].author.display_name	Olivier Duchenne
authorships[1].author_position	middle
authorships[1].raw_author_name	Duchenne, Olivier
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5020182193
authorships[2].author.orcid
authorships[2].author.display_name	Jade Copet
authorships[2].author_position	middle
authorships[2].raw_author_name	Copet, Jade
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5120002517
authorships[3].author.orcid
authorships[3].author.display_name	Quentin Carbonneaux
authorships[3].author_position	middle
authorships[3].raw_author_name	Carbonneaux, Quentin
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5043546718
authorships[4].author.orcid	https://orcid.org/0000-0001-5175-2702
authorships[4].author.display_name	Lingming Zhang
authorships[4].author_position	middle
authorships[4].raw_author_name	Zhang, Lingming
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5003637850
authorships[5].author.orcid	https://orcid.org/0000-0002-5327-2558
authorships[5].author.display_name	Daniel Fried
authorships[5].author_position	middle
authorships[5].raw_author_name	Fried, Daniel
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5016803317
authorships[6].author.orcid
authorships[6].author.display_name	Gabriel Synnaeve
authorships[6].author_position	middle
authorships[6].raw_author_name	Synnaeve, Gabriel
authorships[6].is_corresponding	False
authorships[7].author.id	https://openalex.org/A5101832593
authorships[7].author.orcid	https://orcid.org/0000-0002-8950-4277
authorships[7].author.display_name	Rishabh Singh
authorships[7].author_position	middle
authorships[7].raw_author_name	Singh, Rishabh
authorships[7].is_corresponding	False
authorships[8].author.id	https://openalex.org/A5084195779
authorships[8].author.orcid
authorships[8].author.display_name	Sida I. Wang
authorships[8].author_position	last
authorships[8].raw_author_name	Wang, Sida I.
authorships[8].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2502.18449
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-15T00:00:00
display_name	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10260
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9269000291824341
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1710
primary_topic.subfield.display_name	Information Systems
primary_topic.display_name	Software Engineering Research
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2502.18449
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2502.18449
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2502.18449
primary_location.id	pmh:oai:arXiv.org:2502.18449
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2502.18449
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2502.18449
publication_date	2025-02-25
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	58, 77, 95, 125, 133, 203, 217
abstract_inverted_index.--	91, 132
abstract_inverted_index.3,	118
abstract_inverted_index.RL	34, 166
abstract_inverted_index.To	140
abstract_inverted_index.as	108
abstract_inverted_index.by	83
abstract_inverted_index.in	13
abstract_inverted_index.is	144
abstract_inverted_index.it	182
abstract_inverted_index.of	9, 19, 94, 116, 136, 225
abstract_inverted_index.on	32, 114, 129, 168, 186, 211, 230
abstract_inverted_index.to	35, 48, 74, 153, 157, 208, 220
abstract_inverted_index.up	216
abstract_inverted_index.For	180
abstract_inverted_index.LLM	51
abstract_inverted_index.The	0
abstract_inverted_index.and	26, 38, 68, 81, 105, 110, 198
abstract_inverted_index.for	53, 149
abstract_inverted_index.has	4, 173
abstract_inverted_index.its	100
abstract_inverted_index.new	218
abstract_inverted_index.our	119, 141
abstract_inverted_index.the	6, 15, 45, 63, 92, 145, 222
abstract_inverted_index.top	115
abstract_inverted_index.(RL)	12
abstract_inverted_index.LLMs	73, 152, 160, 226
abstract_inverted_index.best	146
abstract_inverted_index.code	101, 103, 195
abstract_inverted_index.data	90
abstract_inverted_index.even	155, 174, 206
abstract_inverted_index.five	187
abstract_inverted_index.from	85
abstract_inverted_index.like	161
abstract_inverted_index.math	39
abstract_inverted_index.pull	111
abstract_inverted_index.rate	128
abstract_inverted_index.such	107
abstract_inverted_index.this	41, 143
abstract_inverted_index.use,	194
abstract_inverted_index.with	176
abstract_inverted_index.work	29
abstract_inverted_index.41.0%	126
abstract_inverted_index.Llama	117
abstract_inverted_index.While	24
abstract_inverted_index.data,	171
abstract_inverted_index.data.	234
abstract_inverted_index.date,	154
abstract_inverted_index.first	46
abstract_inverted_index.focus	31
abstract_inverted_index.large	20
abstract_inverted_index.leads	207
abstract_inverted_index.opens	215
abstract_inverted_index.other	27
abstract_inverted_index.paper	42
abstract_inverted_index.scale	49
abstract_inverted_index.score	65
abstract_inverted_index.shows	183
abstract_inverted_index.solve	127
abstract_inverted_index.(e.g.,	62
abstract_inverted_index.GitHub	138
abstract_inverted_index.SWE-RL	71, 214
abstract_inverted_index.coding	37
abstract_inverted_index.entire	97
abstract_inverted_index.events	106
abstract_inverted_index.issues	109
abstract_inverted_index.model,	122
abstract_inverted_index.models	22
abstract_inverted_index.recent	1
abstract_inverted_index.record	93
abstract_inverted_index.reward	61
abstract_inverted_index.solely	167
abstract_inverted_index.tasks,	189
abstract_inverted_index.(LLMs).	23
abstract_inverted_index.GPT-4o.	162
abstract_inverted_index.SWE-RL,	44
abstract_inverted_index.Trained	113
abstract_inverted_index.between	66
abstract_inverted_index.coding,	192
abstract_inverted_index.despite	164
abstract_inverted_index.emerged	175
abstract_inverted_index.enables	72
abstract_inverted_index.general	16, 199
abstract_inverted_index.immense	7
abstract_inverted_index.improve	221
abstract_inverted_index.issues.	139
abstract_inverted_index.leading	158
abstract_inverted_index.library	193
abstract_inverted_index.massive	231
abstract_inverted_index.namely,	190
abstract_inverted_index.recover	76
abstract_inverted_index.release	3
abstract_inverted_index.results	185
abstract_inverted_index.skills.	179
abstract_inverted_index.through	227
abstract_inverted_index.whereas	202
abstract_inverted_index.Overall,	213
abstract_inverted_index.RL-based	50
abstract_inverted_index.Verified	131
abstract_inverted_index.achieves	124
abstract_inverted_index.applying	33
abstract_inverted_index.approach	47
abstract_inverted_index.average.	212
abstract_inverted_index.baseline	205
abstract_inverted_index.changes,	104
abstract_inverted_index.example,	181
abstract_inverted_index.function	191
abstract_inverted_index.improved	184
abstract_inverted_index.language	21, 200
abstract_inverted_index.learning	11, 84, 229
abstract_inverted_index.reported	148
abstract_inverted_index.software	55, 88, 169, 232
abstract_inverted_index.SWE-bench	130
abstract_inverted_index.direction	219
abstract_inverted_index.enhancing	14
abstract_inverted_index.evolution	89, 170
abstract_inverted_index.extensive	86
abstract_inverted_index.follow-up	28
abstract_inverted_index.including	99
abstract_inverted_index.potential	8
abstract_inverted_index.primarily	30
abstract_inverted_index.problems,	40
abstract_inverted_index.processes	80
abstract_inverted_index.reasoning	17, 52, 79, 121, 178, 223
abstract_inverted_index.requests.	112
abstract_inverted_index.resulting	120
abstract_inverted_index.solutions	82
abstract_inverted_index.(<100B)	151
abstract_inverted_index.Leveraging	57
abstract_inverted_index.collection	135
abstract_inverted_index.comparable	156
abstract_inverted_index.introduces	43
abstract_inverted_index.knowledge,	142
abstract_inverted_index.lifecycle,	98
abstract_inverted_index.performing	165
abstract_inverted_index.real-world	54, 137
abstract_inverted_index.reasoning,	196
abstract_inverted_index.rule-based	60
abstract_inverted_index.similarity	64
abstract_inverted_index.snapshots,	102
abstract_inverted_index.software's	96
abstract_inverted_index.DeepSeek-R1	2, 25
abstract_inverted_index.competitive	36
abstract_inverted_index.degradation	210
abstract_inverted_index.developer's	78
abstract_inverted_index.engineering	233
abstract_inverted_index.generalized	177
abstract_inverted_index.lightweight	59
abstract_inverted_index.open-source	87
abstract_inverted_index.performance	147, 209
abstract_inverted_index.proprietary	159
abstract_inverted_index.solutions),	70
abstract_inverted_index.autonomously	75
abstract_inverted_index.capabilities	18, 224
abstract_inverted_index.demonstrated	5
abstract_inverted_index.engineering.	56
abstract_inverted_index.ground-truth	67
abstract_inverted_index.mathematics,	197
abstract_inverted_index.medium-sized	150
abstract_inverted_index.LLM-generated	69
abstract_inverted_index.Llama3-SWE-RL	172
abstract_inverted_index.Surprisingly,	163
abstract_inverted_index.out-of-domain	188
abstract_inverted_index.reinforcement	10, 228
abstract_inverted_index.human-verified	134
abstract_inverted_index.understanding,	201
abstract_inverted_index.Llama3-SWE-RL-70B,	123
abstract_inverted_index.supervised-finetuning	204
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	9
citation_normalized_percentile
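
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs (for example, `DeepSeek-R1` at positions 2 and 25). The sketch below shows how the plain-text abstract could be rebuilt from such rows. It assumes each row is a key and a comma-separated position list separated by a tab, as in this dump. The function name `rebuild_abstract` is illustrative and not part of any OpenAlex client.

```python
from typing import Dict, Iterable

PREFIX = "abstract_inverted_index."


def rebuild_abstract(rows: Iterable[str]) -> str:
    """Rebuild abstract text from flattened 'key<TAB>positions' payload rows."""
    token_at: Dict[int, str] = {}
    for row in rows:
        key, _, value = row.partition("\t")
        # Skip rows from other payload fields and rows with no positions.
        if not key.startswith(PREFIX) or not value.strip():
            continue
        token = key[len(PREFIX):]
        for position in value.split(","):
            token_at[int(position)] = token
    # Emit tokens in position order, joined by single spaces.
    return " ".join(token_at[i] for i in sorted(token_at))


rows = [
    "abstract_inverted_index.recent\t1",
    "abstract_inverted_index.The\t0",
    "abstract_inverted_index.release\t3",
    "abstract_inverted_index.DeepSeek-R1\t2",
    "type\tpreprint",
]
print(rebuild_abstract(rows))  # The recent DeepSeek-R1 release
```

Punctuation stays attached to its token (for example `data,` and `data.` are separate keys), so the rebuilt text matches the original abstract word for word apart from whitespace.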