Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2506.05760
Recent advances in Large Language Models (LLMs) have enabled strong performance in long-form writing, yet existing supervised fine-tuning (SFT) approaches suffer from limitations such as data saturation and a learning capacity bounded by teacher signals. In this work, we present Writing-RL, an Adaptive Curriculum Reinforcement Learning framework that advances long-form writing capabilities beyond SFT. The framework consists of three key components: a Margin-aware Data Selection strategy that prioritizes samples with high learning potential; a Pairwise Comparison Reward mechanism that provides discriminative learning signals in the absence of verifiable rewards; and a Dynamic Reference Scheduling approach, which plays a particularly critical role by adaptively adjusting task difficulty based on evolving model performance. Experiments on 7B-scale writer models show that our RL framework substantially improves long-form writing performance over strong SFT baselines. Furthermore, we observe that models trained with long-output RL generalize surprisingly well to long-input reasoning tasks, which may offer a promising perspective for rethinking long-context training.
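The abstract names the three components without giving their definitions. The Python sketch below shows one possible reading of how they could fit together in a training loop. Every concrete choice in it is a hypothetical assumption, not the authors' implementation: a per-prompt ladder of references ordered from weakest to strongest, a margin defined as the score gap between the best reference and the policy, a binary win/loss reward from a judge, and a rule that moves to a stronger reference after each win. The length-based judge is only a stand-in for an LLM judge.

```python
# Hypothetical sketch: the reference ladder, margin definition, judge interface,
# and scheduling rule are illustrative assumptions, not the Writing-RL code.
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A judge returns True when it prefers `candidate` over `reference`
# (in practice an LLM judge; here any callable with this signature).
Judge = Callable[[str, str], bool]


@dataclass
class Sample:
    prompt: str
    references: List[str]  # assumed ordered from weakest to strongest
    level: int = 0         # index of the reference the policy is currently compared against


def margin(policy_score: float, best_ref_score: float) -> float:
    """Assumed learning-potential margin: how far the policy lags the best reference."""
    return best_ref_score - policy_score


def select_by_margin(samples: Sequence[Sample], policy_scores: Sequence[float],
                     ref_scores: Sequence[float], k: int) -> List[Sample]:
    """Margin-aware selection (sketch): keep up to k samples with the largest positive margin."""
    margins = [margin(p, r) for p, r in zip(policy_scores, ref_scores)]
    ranked = sorted(range(len(samples)), key=lambda i: margins[i], reverse=True)
    return [samples[i] for i in ranked[:k] if margins[i] > 0]


def pairwise_reward(judge: Judge, output: str, sample: Sample) -> float:
    """Pairwise comparison reward (sketch): 1.0 if the output beats the current reference, else 0.0."""
    return 1.0 if judge(output, sample.references[sample.level]) else 0.0


def schedule_reference(sample: Sample, reward: float) -> None:
    """Dynamic reference scheduling (sketch): after a win, compare against the next, stronger reference."""
    if reward > 0 and sample.level < len(sample.references) - 1:
        sample.level += 1


if __name__ == "__main__":
    # Toy judge (assumption): prefer the longer text, standing in for an LLM judge.
    def longer(candidate: str, reference: str) -> bool:
        return len(candidate) > len(reference)

    s = Sample("Write an essay", ["ab", "abcd", "abcdef"])
    for step in range(2):
        r = pairwise_reward(longer, "abc", s)
        print(step, r, s.level)  # prints "0 1.0 0", then "1 0.0 1"
        schedule_reference(s, r)
```

In this toy setting, once the output beats the easier reference the sample is scheduled against a stronger one, so the same output stops earning reward. This mirrors the difficulty adjustment the abstract attributes to Dynamic Reference Scheduling, but only under the assumptions stated above.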
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2506.05760 (PDF: https://arxiv.org/pdf/2506.05760)
- OA Status: green
- OpenAlex ID: https://openalex.org/W4417097313
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4417097313 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2506.05760 (Digital Object Identifier)
- Title: Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-06-06 (full publication date if available)
- Authors: Xuanyu Lei, Yuning Wu, Kaiming Liu, Weizhou Shen, Ming Yan, Zhang Ji, Yang Liu (list of authors in order)
- Landing page: https://arxiv.org/abs/2506.05760 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2506.05760 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2506.05760 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4417097313 |
| doi | https://doi.org/10.48550/arxiv.2506.05760 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.05760 |
| ids.openalex | https://openalex.org/W4417097313 |
| fwci | |
| type | preprint |
| title | Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.05760 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.05760 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.05760 |
| locations[1].id | doi:10.48550/arxiv.2506.05760 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.05760 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102586784 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Xuanyu Lei |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lei, Xuanyu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5040175121 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2699-4790 |
| authorships[1].author.display_name | Yuning Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Yuning |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5084824459 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6006-3239 |
| authorships[2].author.display_name | Kaiming Liu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Liu, Kaiming |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5024669409 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-9180-0043 |
| authorships[3].author.display_name | Weizhou Shen |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Shen, Weizhou |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5064805061 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3772-5238 |
| authorships[4].author.display_name | Ming Yan |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Yan, Ming |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100705331 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-2161-3431 |
| authorships[5].author.display_name | Zhang Ji |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhang, Ji |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5101203810 |
| authorships[6].author.orcid | https://orcid.org/0009-0004-7163-5086 |
| authorships[6].author.display_name | Yang Liu |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Liu, Yang |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.05760 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-07T21:05:23.491565 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.05760 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.05760 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.05760 |
| primary_location.id | pmh:oai:arXiv.org:2506.05760 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.05760 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.05760 |
| publication_date | 2025-06-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 94, 145 |
| abstract_inverted_index.In | 35 |
| abstract_inverted_index.RL | 116, 135 |
| abstract_inverted_index.an | 41 |
| abstract_inverted_index.as | 24 |
| abstract_inverted_index.by | 32, 98 |
| abstract_inverted_index.in | 2, 11, 81 |
| abstract_inverted_index.of | 57, 84 |
| abstract_inverted_index.on | 104, 109 |
| abstract_inverted_index.to | 47, 139 |
| abstract_inverted_index.we | 38, 128 |
| abstract_inverted_index.SFT | 125 |
| abstract_inverted_index.The | 54 |
| abstract_inverted_index.and | 27, 87 |
| abstract_inverted_index.for | 148 |
| abstract_inverted_index.key | 59 |
| abstract_inverted_index.our | 115 |
| abstract_inverted_index.the | 82 |
| abstract_inverted_index.yet | 14 |
| abstract_inverted_index.Data | 62 |
| abstract_inverted_index.SFT. | 53 |
| abstract_inverted_index.data | 25 |
| abstract_inverted_index.from | 21 |
| abstract_inverted_index.have | 7 |
| abstract_inverted_index.high | 69 |
| abstract_inverted_index.over | 123 |
| abstract_inverted_index.role | 97 |
| abstract_inverted_index.show | 113 |
| abstract_inverted_index.such | 23 |
| abstract_inverted_index.task | 101 |
| abstract_inverted_index.that | 65, 76, 114, 130 |
| abstract_inverted_index.this | 36 |
| abstract_inverted_index.well | 138 |
| abstract_inverted_index.with | 68, 133 |
| abstract_inverted_index.(SFT) | 18 |
| abstract_inverted_index.Large | 3 |
| abstract_inverted_index.based | 103 |
| abstract_inverted_index.model | 106 |
| abstract_inverted_index.plays | 93 |
| abstract_inverted_index.three | 58 |
| abstract_inverted_index.which | 92 |
| abstract_inverted_index.work, | 37 |
| abstract_inverted_index.(LLMs) | 6 |
| abstract_inverted_index.Models | 5 |
| abstract_inverted_index.Recent | 0 |
| abstract_inverted_index.Reward | 74 |
| abstract_inverted_index.beyond | 52 |
| abstract_inverted_index.models | 112, 131 |
| abstract_inverted_index.strong | 9, 124 |
| abstract_inverted_index.suffer | 20 |
| abstract_inverted_index.tasks, | 142 |
| abstract_inverted_index.writer | 111 |
| abstract_inverted_index.Dynamic | 88 |
| abstract_inverted_index.absence | 83 |
| abstract_inverted_index.advance | 48 |
| abstract_inverted_index.bounded | 31 |
| abstract_inverted_index.enabled | 8 |
| abstract_inverted_index.largely | 118 |
| abstract_inverted_index.observe | 129 |
| abstract_inverted_index.present | 39 |
| abstract_inverted_index.samples | 67 |
| abstract_inverted_index.signals | 80 |
| abstract_inverted_index.teacher | 33 |
| abstract_inverted_index.trained | 132 |
| abstract_inverted_index.writing | 50, 121 |
| abstract_inverted_index.7B-scale | 110 |
| abstract_inverted_index.Adaptive | 42 |
| abstract_inverted_index.Language | 4 |
| abstract_inverted_index.Learning | 45 |
| abstract_inverted_index.Pairwise | 72 |
| abstract_inverted_index.advances | 1 |
| abstract_inverted_index.capacity | 30 |
| abstract_inverted_index.consists | 56 |
| abstract_inverted_index.critical | 96 |
| abstract_inverted_index.evolving | 105 |
| abstract_inverted_index.existing | 15 |
| abstract_inverted_index.improves | 119 |
| abstract_inverted_index.learning | 29, 70, 79 |
| abstract_inverted_index.offering | 144 |
| abstract_inverted_index.provides | 77 |
| abstract_inverted_index.rewards, | 86 |
| abstract_inverted_index.signals. | 34 |
| abstract_inverted_index.strategy | 64 |
| abstract_inverted_index.writing, | 13 |
| abstract_inverted_index.Reference | 89 |
| abstract_inverted_index.Selection | 63 |
| abstract_inverted_index.adjusting | 100 |
| abstract_inverted_index.approach, | 91 |
| abstract_inverted_index.framework | 46, 55, 117 |
| abstract_inverted_index.long-form | 12, 49, 120 |
| abstract_inverted_index.mechanism | 75 |
| abstract_inverted_index.promising | 146 |
| abstract_inverted_index.reasoning | 141 |
| abstract_inverted_index.training. | 151 |
| abstract_inverted_index.Comparison | 73 |
| abstract_inverted_index.Curriculum | 43 |
| abstract_inverted_index.Scheduling | 90 |
| abstract_inverted_index.adaptively | 99 |
| abstract_inverted_index.approaches | 19 |
| abstract_inverted_index.baselines. | 126 |
| abstract_inverted_index.difficulty | 102 |
| abstract_inverted_index.generalize | 136 |
| abstract_inverted_index.long-input | 140 |
| abstract_inverted_index.potential, | 71 |
| abstract_inverted_index.restricted | 28 |
| abstract_inverted_index.rethinking | 149 |
| abstract_inverted_index.saturation | 26 |
| abstract_inverted_index.supervised | 16 |
| abstract_inverted_index.verifiable | 85 |
| abstract_inverted_index.Experiments | 108 |
| abstract_inverted_index.Writing-RL: | 40 |
| abstract_inverted_index.components: | 60 |
| abstract_inverted_index.fine-tuning | 17 |
| abstract_inverted_index.limitations | 22 |
| abstract_inverted_index.long-output | 134 |
| abstract_inverted_index.performance | 10, 122 |
| abstract_inverted_index.perspective | 147 |
| abstract_inverted_index.potentially | 143 |
| abstract_inverted_index.prioritizes | 66 |
| abstract_inverted_index.Furthermore, | 127 |
| abstract_inverted_index.Margin-aware | 61 |
| abstract_inverted_index.capabilities | 51 |
| abstract_inverted_index.long-context | 150 |
| abstract_inverted_index.particularly | 95 |
| abstract_inverted_index.performance. | 107 |
| abstract_inverted_index.surprisingly | 137 |
| abstract_inverted_index.Reinforcement | 44 |
| abstract_inverted_index.discriminative | 78 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
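The dotted keys in the payload table (for example `locations[0].source.id`) correspond to paths in OpenAlex's nested JSON work record, and `abstract_inverted_index` stores the abstract as a map from each word to its positions in the text. The Python sketch below flattens a nested record into keys of that shape and rebuilds the abstract from the inverted index. The flattening convention (indexing lists of objects, joining lists of scalars with ", ") is an assumption inferred from the table layout, not a format defined by OpenAlex.

```python
# Sketch for reading an OpenAlex work record; the flattening convention is an
# assumption inferred from the payload table, not an OpenAlex-defined format.
from typing import Any, Dict, List


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted keys such as 'locations[0].source.id'."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
        # Lists of objects get an index per element, e.g. authorships[0].
        for i, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{i}]"))
    elif isinstance(value, list):
        # Lists of scalars are joined, e.g. indexed_in -> "arxiv, datacite".
        flat[prefix] = ", ".join(str(v) for v in value)
    else:
        flat[prefix] = value
    return flat


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild abstract text from an inverted index mapping each word to its positions."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


if __name__ == "__main__":
    # The first seven words of the abstract, using positions from the table
    # (only positions below 7 are kept, so "in" maps to [2] here).
    index = {"Recent": [0], "advances": [1], "in": [2], "Large": [3],
             "Language": [4], "Models": [5], "(LLMs)": [6]}
    print(reconstruct_abstract(index))  # Recent advances in Large Language Models (LLMs)
```

To apply this to the live record, one could load the JSON from the OpenAlex API (for example `https://api.openalex.org/works/W4417097313`) with `json.load(urllib.request.urlopen(url))` and pass `record["abstract_inverted_index"]` to `reconstruct_abstract`.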