AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2511.09478
Reinforcement learning (RL) has demonstrated considerable potential for enhancing reasoning in large language models (LLMs). However, existing methods suffer from Gradient Starvation and Policy Degradation when training directly on samples with mixed difficulty. To mitigate this, prior approaches leverage Chain-of-Thought (CoT) data, but the construction of high-quality CoT annotations remains labor-intensive. Alternatively, curriculum learning strategies have been explored but frequently encounter challenges, such as difficulty mismatch, reliance on manual curriculum design, and catastrophic forgetting. To address these issues, we propose AdaCuRL, an Adaptive Curriculum Reinforcement Learning framework that integrates coarse-to-fine difficulty estimation with adaptive curriculum scheduling. This approach dynamically aligns data difficulty with model capability and incorporates a data revisitation mechanism to mitigate catastrophic forgetting. Furthermore, AdaCuRL employs adaptive reference and sparse KL strategies to prevent Policy Degradation. Extensive experiments across diverse reasoning benchmarks demonstrate that AdaCuRL consistently achieves significant performance improvements on both LLMs and multimodal LLMs (MLLMs).
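The abstract attributes Gradient Starvation to training on samples of mixed difficulty. A minimal sketch, assuming a GRPO-style group-normalized advantage (the abstract does not name the underlying RL algorithm), shows the mechanism: when every rollout for a prompt earns the same reward, all advantages are zero and the prompt contributes no gradient. For a prompt with pass rate p and G rollouts this happens with probability p^G + (1 - p)^G. The scheduler `schedule_batch`, its thresholds `max_invalid` and `revisit_frac`, and the use of estimated pass rate as the difficulty signal are hypothetical illustrations of difficulty-matched scheduling with historical revisiting, not AdaCuRL's actual algorithm.

```python
import random
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage (assumed): each reward normalized within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def invalid_group_prob(pass_rate, group_size):
    """Probability that all rollouts are correct or all are wrong,
    i.e. the group yields zero advantage and no gradient."""
    return pass_rate ** group_size + (1 - pass_rate) ** group_size


def schedule_batch(pass_rates, history, batch_size, group_size=8,
                   max_invalid=0.3, revisit_frac=0.25, seed=0):
    """Hypothetical scheduler: keep prompts whose estimated pass rate makes an
    all-same-reward group unlikely, and mix in previously trained prompts
    (historical revisiting) to counter forgetting."""
    rng = random.Random(seed)
    # Prompts that are neither too easy nor too hard for the current model.
    informative = [pid for pid, p in pass_rates.items()
                   if invalid_group_prob(p, group_size) <= max_invalid]
    # Reserve a fraction of the batch for prompts seen earlier in training.
    n_revisit = min(len(history), int(batch_size * revisit_frac))
    revisit = rng.sample(sorted(history), n_revisit)
    fresh = [pid for pid in informative if pid not in revisit]
    rng.shuffle(fresh)
    return revisit + fresh[: batch_size - n_revisit]


if __name__ == "__main__":
    print(group_advantages([1, 1, 1, 1]))  # identical rewards: no learning signal
    print(group_advantages([1, 0, 0, 0]))  # mixed rewards: non-zero advantages
    rates = {"a": 0.5, "b": 0.95, "c": 0.4, "d": 0.0}
    print(schedule_batch(rates, {"old1", "old2"}, batch_size=4))
```

For example, `invalid_group_prob(0.9, 8)` is about 0.43 while `invalid_group_prob(0.5, 8)` is about 0.008, so a batch dominated by prompts the model already solves 90% of the time wastes roughly 43% of its rollout groups. Filtering on this quantity is one simple way to make "aligning data difficulty with model capability" concrete; the paper's coarse-to-fine estimator may differ.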
- Type: preprint
- Landing Page: http://arxiv.org/abs/2511.09478 (PDF: https://arxiv.org/pdf/2511.09478)
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416223658
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416223658 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2511.09478 (Digital Object Identifier)
- Title: AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting (work title)
- Type: preprint (OpenAlex work type)
- Publication year: 2025
- Publication date: 2025-11-12 (full publication date, if available)
- Authors: Rui Li, Fei Wei, Yong Wang, Xiangxiang Chu (in order)
- Landing page: https://arxiv.org/abs/2511.09478 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2511.09478 (direct link to the full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2511.09478 (direct OA link, when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416223658 |
| doi | https://doi.org/10.48550/arxiv.2511.09478 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.09478 |
| ids.openalex | https://openalex.org/W4416223658 |
| fwci | |
| type | preprint |
| title | AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2511.09478 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2511.09478 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2511.09478 |
| locations[1].id | doi:10.48550/arxiv.2511.09478 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2511.09478 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5108669748 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9764-9954 |
| authorships[0].author.display_name | Rui Li |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Renda |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5111728714 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Fei Wei |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wei, Fei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5049022174 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3336-5578 |
| authorships[2].author.display_name | Yong Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Yong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101512474 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2548-0605 |
| authorships[3].author.display_name | Xiangxiang Chu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chu, Xiangxiang |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2511.09478 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-14T00:00:00 |
| display_name | AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T08:11:02.681770 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2511.09478 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2511.09478 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2511.09478 |
| primary_location.id | pmh:oai:arXiv.org:2511.09478 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2511.09478 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2511.09478 |
| publication_date | 2025-11-12 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 81, 107 |
| abstract_inverted_index.KL | 122 |
| abstract_inverted_index.To | 33, 74 |
| abstract_inverted_index.as | 63 |
| abstract_inverted_index.in | 10 |
| abstract_inverted_index.of | 45 |
| abstract_inverted_index.on | 28, 67, 142 |
| abstract_inverted_index.to | 111, 124 |
| abstract_inverted_index.we | 78 |
| abstract_inverted_index.CoT | 47 |
| abstract_inverted_index.and | 22, 71, 105, 120, 145 |
| abstract_inverted_index.but | 42, 58 |
| abstract_inverted_index.for | 7 |
| abstract_inverted_index.has | 3 |
| abstract_inverted_index.the | 43 |
| abstract_inverted_index.(RL) | 2 |
| abstract_inverted_index.LLMs | 144 |
| abstract_inverted_index.This | 96 |
| abstract_inverted_index.been | 56 |
| abstract_inverted_index.both | 143 |
| abstract_inverted_index.data | 100, 108 |
| abstract_inverted_index.from | 19 |
| abstract_inverted_index.have | 55 |
| abstract_inverted_index.such | 62 |
| abstract_inverted_index.that | 87, 135 |
| abstract_inverted_index.when | 25 |
| abstract_inverted_index.with | 30, 92, 102 |
| abstract_inverted_index.(CoT) | 40 |
| abstract_inverted_index.data, | 41 |
| abstract_inverted_index.large | 11 |
| abstract_inverted_index.mixed | 31 |
| abstract_inverted_index.model | 103 |
| abstract_inverted_index.prior | 36 |
| abstract_inverted_index.these | 76 |
| abstract_inverted_index.this, | 35 |
| abstract_inverted_index.MLLMs. | 146 |
| abstract_inverted_index.Policy | 23, 126 |
| abstract_inverted_index.across | 130 |
| abstract_inverted_index.aligns | 99 |
| abstract_inverted_index.manual | 68 |
| abstract_inverted_index.models | 13 |
| abstract_inverted_index.sparse | 121 |
| abstract_inverted_index.suffer | 18 |
| abstract_inverted_index.(LLMs). | 14 |
| abstract_inverted_index.AdaCuRL | 116, 136 |
| abstract_inverted_index.address | 75 |
| abstract_inverted_index.design, | 70 |
| abstract_inverted_index.diverse | 131 |
| abstract_inverted_index.employs | 117 |
| abstract_inverted_index.issues, | 77 |
| abstract_inverted_index.methods | 17 |
| abstract_inverted_index.prevent | 125 |
| abstract_inverted_index.propose | 79 |
| abstract_inverted_index.remains | 49 |
| abstract_inverted_index.samples | 29 |
| abstract_inverted_index.AdaCuRL, | 80 |
| abstract_inverted_index.Adaptive | 82 |
| abstract_inverted_index.Gradient | 20 |
| abstract_inverted_index.However, | 15 |
| abstract_inverted_index.Learning | 85 |
| abstract_inverted_index.achieves | 138 |
| abstract_inverted_index.adaptive | 93, 118 |
| abstract_inverted_index.approach | 97 |
| abstract_inverted_index.directly | 27 |
| abstract_inverted_index.existing | 16 |
| abstract_inverted_index.explored | 57 |
| abstract_inverted_index.language | 12 |
| abstract_inverted_index.learning | 1, 53 |
| abstract_inverted_index.leverage | 38 |
| abstract_inverted_index.mitigate | 34, 112 |
| abstract_inverted_index.reliance | 66 |
| abstract_inverted_index.training | 26 |
| abstract_inverted_index.Extensive | 128 |
| abstract_inverted_index.encounter | 60 |
| abstract_inverted_index.enhancing | 8 |
| abstract_inverted_index.framework | 86 |
| abstract_inverted_index.mechanism | 110 |
| abstract_inverted_index.mismatch, | 65 |
| abstract_inverted_index.potential | 6 |
| abstract_inverted_index.reasoning | 9, 132 |
| abstract_inverted_index.reference | 119 |
| abstract_inverted_index.Curriculum | 83 |
| abstract_inverted_index.Starvation | 21 |
| abstract_inverted_index.approaches | 37 |
| abstract_inverted_index.benchmarks | 133 |
| abstract_inverted_index.capability | 104 |
| abstract_inverted_index.curriculum | 52, 69, 94 |
| abstract_inverted_index.difficulty | 64, 90, 101 |
| abstract_inverted_index.estimation | 91 |
| abstract_inverted_index.frequently | 59 |
| abstract_inverted_index.integrates | 88 |
| abstract_inverted_index.strategies | 54, 123 |
| abstract_inverted_index.Degradation | 24 |
| abstract_inverted_index.annotations | 48 |
| abstract_inverted_index.challenges, | 61 |
| abstract_inverted_index.demonstrate | 134 |
| abstract_inverted_index.difficulty. | 32 |
| abstract_inverted_index.dynamically | 98 |
| abstract_inverted_index.experiments | 129 |
| abstract_inverted_index.forgetting. | 73, 114 |
| abstract_inverted_index.performance | 140 |
| abstract_inverted_index.scheduling. | 95 |
| abstract_inverted_index.significant | 139 |
| abstract_inverted_index.Degradation. | 127 |
| abstract_inverted_index.Furthermore, | 115 |
| abstract_inverted_index.catastrophic | 72, 113 |
| abstract_inverted_index.considerable | 5 |
| abstract_inverted_index.consistently | 137 |
| abstract_inverted_index.construction | 44 |
| abstract_inverted_index.demonstrated | 4 |
| abstract_inverted_index.high-quality | 46 |
| abstract_inverted_index.improvements | 141 |
| abstract_inverted_index.incorporates | 106 |
| abstract_inverted_index.revisitation | 109 |
| abstract_inverted_index.Reinforcement | 0, 84 |
| abstract_inverted_index.Alternatively, | 51 |
| abstract_inverted_index.coarse-to-fine | 89 |
| abstract_inverted_index.Chain-of-Thought | 39 |
| abstract_inverted_index.labor-intensive. | 50 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
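The `abstract_inverted_index.*` rows in the payload table encode the abstract as a mapping from each word to its list of zero-based positions, which is how OpenAlex stores abstracts. A minimal sketch of turning that mapping back into plain text; the function name `reconstruct_abstract` is illustrative, and the example input is a truncated excerpt (positions 0 to 6 only) taken from the rows above.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild abstract text from an OpenAlex-style inverted index
    (word -> list of zero-based positions)."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word  # each position holds exactly one word
    # Emit words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


if __name__ == "__main__":
    excerpt = {
        "Reinforcement": [0], "learning": [1], "(RL)": [2], "has": [3],
        "demonstrated": [4], "considerable": [5], "potential": [6],
    }
    print(reconstruct_abstract(excerpt))
```

Applied to the full index (for instance `"Reinforcement": [0, 84]`), the same function recovers the complete abstract, since repeated words simply list several positions.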