Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2503.10484
Existing quadrupedal locomotion learning paradigms usually rely on extensive domain randomization to alleviate the sim-to-real gap and enhance robustness: policies are trained across a wide range of environment parameters and sensor noise so that they perform reliably under uncertainty. However, because optimal performance under ideal conditions often conflicts with the need to handle worst-case scenarios, there is a trade-off between optimality and robustness. This trade-off forces the learned policy to prioritize stability in diverse and challenging conditions over efficiency and accuracy in ideal ones, leading to overly conservative behaviors that sacrifice peak performance. In this paper, we propose a two-stage framework that mitigates this trade-off by integrating policy learning with imagined transitions. The framework augments conventional reinforcement learning (RL) by incorporating imagined transitions as demonstrative inputs; these transitions are generated by an optimal policy and a dynamics model operating in an idealized setting. Our findings indicate that this approach significantly mitigates the negative impact of domain randomization on existing RL algorithms, leading to faster training, lower in-distribution tracking errors, and improved out-of-distribution robustness.
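The two-stage idea in the abstract can be sketched in toy form. This is a minimal, hypothetical illustration, not the paper's actual method or API: the names `ideal_policy`, `ideal_dynamics`, and the imitation weight `beta` are illustrative assumptions. Stage 1 produces an optimal policy and dynamics model in an idealized setting; stage 2 trains the deployed policy under domain randomization with an auxiliary imitation loss on imagined transitions.

```python
import numpy as np

# Stage 1 (assumed): an "optimal" policy and a dynamics model fitted in an
# idealized simulator. Both are toy linear maps here, purely for illustration.
def ideal_policy(state):            # hypothetical stand-in for the stage-1 policy
    return -0.5 * state

def ideal_dynamics(state, action):  # hypothetical idealized dynamics model
    return state + action

def imagine_transition(state):
    """Roll the ideal policy through the ideal dynamics to produce a
    demonstrative (state, action, next_state) tuple."""
    action = ideal_policy(state)
    return state, action, ideal_dynamics(state, action)

# Stage 2 (assumed): the deployed policy trains under domain randomization,
# but its loss mixes the usual RL objective with an imitation term that pulls
# its actions toward the imagined demonstrations. `beta` is an illustrative weight.
def combined_loss(policy_action, rl_loss, state, beta=0.5):
    _, demo_action, _ = imagine_transition(state)
    imitation = float(np.mean((policy_action - demo_action) ** 2))
    return rl_loss + beta * imitation

state = np.array([1.0, -2.0])
loss = combined_loss(policy_action=np.array([0.0, 0.0]), rl_loss=0.1, state=state)
```

The key design point this sketch captures is that the imagined transitions act as demonstrations only in the loss; the deployed policy still collects its own randomized experience for the RL term.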
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2503.10484, https://arxiv.org/pdf/2503.10484
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415249754
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415249754 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2503.10484 (Digital Object Identifier)
- Title: Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-03-13 (full publication date if available)
- Authors: Xiao Wei, Shangke Lyu, Zheng Gong, Renjie Wang, Donglin Wang (list of authors in order)
- Landing page: https://arxiv.org/abs/2503.10484 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2503.10484 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2503.10484 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Key | Value |
|---|---|
| id | https://openalex.org/W4415249754 |
| doi | https://doi.org/10.48550/arxiv.2503.10484 |
| ids.doi | https://doi.org/10.48550/arxiv.2503.10484 |
| ids.openalex | https://openalex.org/W4415249754 |
| fwci | 0.0 |
| type | preprint |
| title | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11182 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.7831000089645386 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Auction Theory and Applications |
| topics[1].id | https://openalex.org/T10462 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.7321000099182129 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Reinforcement Learning in Robotics |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.10484 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.10484 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.10484 |
| locations[1].id | doi:10.48550/arxiv.2503.10484 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2503.10484 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100446396 |
| authorships[0].author.orcid | https://orcid.org/0009-0003-2623-2555 |
| authorships[0].author.display_name | Xiao Wei |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xiao, Wei |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5075248381 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8302-6630 |
| authorships[1].author.display_name | Shangke Lyu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lyu, Shangke |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5057884582 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6651-5582 |
| authorships[2].author.display_name | Zheng Gong |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Gong, Zhefei |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5062676322 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6274-3921 |
| authorships[3].author.display_name | Renjie Wang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wang, Renjie |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100665181 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8188-3735 |
| authorships[4].author.display_name | Donglin Wang |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Wang, Donglin |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.10484 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-16T00:00:00 |
| display_name | Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11182 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.7831000089645386 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Auction Theory and Applications |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.10484 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.10484 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.10484 |
| primary_location.id | pmh:oai:arXiv.org:2503.10484 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.10484 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.10484 |
| publication_date | 2025-03-13 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 23, 55, 96, 136 |
| abstract_inverted_index.In | 91 |
| abstract_inverted_index.It | 19, 161 |
| abstract_inverted_index.RL | 159 |
| abstract_inverted_index.an | 132, 141 |
| abstract_inverted_index.as | 123 |
| abstract_inverted_index.by | 103, 119 |
| abstract_inverted_index.in | 70, 79 |
| abstract_inverted_index.is | 54 |
| abstract_inverted_index.of | 26, 157 |
| abstract_inverted_index.on | 7 |
| abstract_inverted_index.to | 11, 32, 49, 67, 83, 163 |
| abstract_inverted_index.we | 94 |
| abstract_inverted_index.Our | 144 |
| abstract_inverted_index.and | 16, 29, 59, 72, 77, 135, 172 |
| abstract_inverted_index.are | 129 |
| abstract_inverted_index.gap | 15 |
| abstract_inverted_index.the | 13, 47, 64, 113, 152, 170, 176 |
| abstract_inverted_index.(RL) | 117 |
| abstract_inverted_index.This | 61, 110 |
| abstract_inverted_index.from | 131 |
| abstract_inverted_index.need | 48 |
| abstract_inverted_index.over | 75 |
| abstract_inverted_index.peak | 89 |
| abstract_inverted_index.rely | 6 |
| abstract_inverted_index.that | 87, 99, 147 |
| abstract_inverted_index.this | 92, 101, 148 |
| abstract_inverted_index.wide | 24 |
| abstract_inverted_index.with | 22, 46, 107 |
| abstract_inverted_index.These | 126 |
| abstract_inverted_index.ideal | 42, 80 |
| abstract_inverted_index.leads | 162 |
| abstract_inverted_index.model | 138 |
| abstract_inverted_index.often | 44 |
| abstract_inverted_index.ones, | 81 |
| abstract_inverted_index.range | 25 |
| abstract_inverted_index.since | 38 |
| abstract_inverted_index.there | 53 |
| abstract_inverted_index.under | 35, 41 |
| abstract_inverted_index.domain | 9, 153 |
| abstract_inverted_index.errors | 168 |
| abstract_inverted_index.forces | 63 |
| abstract_inverted_index.handle | 50 |
| abstract_inverted_index.impact | 156 |
| abstract_inverted_index.noises | 31 |
| abstract_inverted_index.overly | 84 |
| abstract_inverted_index.paper, | 93 |
| abstract_inverted_index.policy | 66, 105, 134 |
| abstract_inverted_index.sensor | 30 |
| abstract_inverted_index.trains | 20 |
| abstract_inverted_index.within | 140, 169 |
| abstract_inverted_index.between | 57 |
| abstract_inverted_index.derived | 130 |
| abstract_inverted_index.diverse | 71 |
| abstract_inverted_index.enhance | 17 |
| abstract_inverted_index.inputs. | 125 |
| abstract_inverted_index.leading | 82 |
| abstract_inverted_index.learned | 65 |
| abstract_inverted_index.optimal | 39, 133 |
| abstract_inverted_index.outside | 175 |
| abstract_inverted_index.perform | 33 |
| abstract_inverted_index.propose | 95 |
| abstract_inverted_index.reduced | 166 |
| abstract_inverted_index.usually | 5 |
| abstract_inverted_index.Existing | 0 |
| abstract_inverted_index.However, | 37 |
| abstract_inverted_index.accuracy | 78 |
| abstract_inverted_index.approach | 118, 149 |
| abstract_inverted_index.dynamics | 137 |
| abstract_inverted_index.enhanced | 173 |
| abstract_inverted_index.enhances | 112 |
| abstract_inverted_index.existing | 158 |
| abstract_inverted_index.findings | 145 |
| abstract_inverted_index.imagined | 108, 121, 127 |
| abstract_inverted_index.indicate | 146 |
| abstract_inverted_index.learning | 3, 106, 116 |
| abstract_inverted_index.negative | 155 |
| abstract_inverted_index.policies | 21 |
| abstract_inverted_index.reliably | 34 |
| abstract_inverted_index.setting. | 143 |
| abstract_inverted_index.sim2real | 14 |
| abstract_inverted_index.tracking | 167 |
| abstract_inverted_index.alleviate | 12 |
| abstract_inverted_index.behaviors | 86 |
| abstract_inverted_index.conflicts | 45 |
| abstract_inverted_index.extensive | 8 |
| abstract_inverted_index.framework | 98, 111 |
| abstract_inverted_index.idealized | 142 |
| abstract_inverted_index.mitigates | 100, 151 |
| abstract_inverted_index.operating | 139 |
| abstract_inverted_index.paradigms | 4 |
| abstract_inverted_index.sacrifice | 88 |
| abstract_inverted_index.stability | 69 |
| abstract_inverted_index.trade-off | 56, 62, 102 |
| abstract_inverted_index.training, | 165 |
| abstract_inverted_index.two-stage | 97 |
| abstract_inverted_index.conditions | 43, 74 |
| abstract_inverted_index.efficiency | 76 |
| abstract_inverted_index.locomotion | 2 |
| abstract_inverted_index.optimality | 58 |
| abstract_inverted_index.parameters | 28 |
| abstract_inverted_index.prioritize | 68 |
| abstract_inverted_index.robustness | 174 |
| abstract_inverted_index.scenarios, | 52 |
| abstract_inverted_index.worst-case | 51 |
| abstract_inverted_index.accelerated | 164 |
| abstract_inverted_index.algorithms. | 160 |
| abstract_inverted_index.challenging | 73 |
| abstract_inverted_index.environment | 27 |
| abstract_inverted_index.integrating | 104 |
| abstract_inverted_index.performance | 40 |
| abstract_inverted_index.quadrupedal | 1 |
| abstract_inverted_index.robustness. | 18, 60 |
| abstract_inverted_index.transitions | 122, 128 |
| abstract_inverted_index.conservative | 85 |
| abstract_inverted_index.conventional | 114 |
| abstract_inverted_index.performance. | 90 |
| abstract_inverted_index.transitions. | 109 |
| abstract_inverted_index.uncertainty. | 36 |
| abstract_inverted_index.demonstrative | 124 |
| abstract_inverted_index.distribution, | 171 |
| abstract_inverted_index.distribution. | 177 |
| abstract_inverted_index.incorporating | 120 |
| abstract_inverted_index.randomization | 10 |
| abstract_inverted_index.reinforcement | 115 |
| abstract_inverted_index.significantly | 150 |
| abstract_inverted_index.randomization-induced | 154 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
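The `abstract_inverted_index.*` rows above store the abstract in OpenAlex's inverted-abstract format: a map from each word to the list of positions it occupies. Turning such a map back into plain text is a simple sort-and-join; the sketch below uses a tiny illustrative fragment of the index, not the full map above.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index,
    which maps each word to the list of positions it occupies."""
    positions = []
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions.append((i, word))
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positions))

# Illustrative fragment only; the real index covers the whole abstract.
inv = {"Existing": [0], "quadrupedal": [1], "locomotion": [2],
       "learning": [3], "paradigms": [4]}
text = reconstruct_abstract(inv)
# → "Existing quadrupedal locomotion learning paradigms"
```

Words that occur more than once (e.g. `the` at positions 13, 47, 64, ...) simply contribute multiple entries to the sorted list, so the same function handles repeats.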