Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
2024 · Open Access · DOI: https://doi.org/10.1609/aaai.v38i15.29607
Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy optimization, usually via constructing a Pessimistic Markov Decision Process (P-MDP). However, the P-MDP discourages the policies from learning in out-of-distribution (OOD) regions beyond the support of offline datasets, which can under-utilize the generalization ability of dynamics models. In contrast, we propose constructing an Optimistic MDP (O-MDP). We initially observed the potential benefits of optimism brought by encouraging more OOD rollouts. Motivated by this observation, we present ORPO, a simple yet effective model-based offline RL framework. ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization. Specifically, we train an optimistic rollout policy in the O-MDP to sample more OOD model rollouts. Then we relabel the sampled state-action pairs with penalized rewards, and optimize the output policy in the P-MDP. Theoretically, we demonstrate that the performance of policies trained with ORPO can be lower-bounded in linear MDPs. Experimental results show that our framework significantly outperforms P-MDP baselines by a margin of 30%, achieving state-of-the-art performance on the widely-used benchmark. Moreover, ORPO exhibits notable advantages in problems that require generalization.
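The two reward signals described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is estimated or how rewards are shaped, so everything below is an assumption made for illustration: uncertainty is taken to be the disagreement of a dynamics-model ensemble, the O-MDP adds an uncertainty bonus to the predicted reward, and the P-MDP subtracts an uncertainty penalty. The names `ensemble_uncertainty`, `optimistic_reward`, `pessimistic_reward`, `relabel_rollouts`, `bonus_coef`, and `penalty_coef` are hypothetical and do not come from the paper.

```python
import math
from statistics import fmean


def ensemble_uncertainty(member_preds):
    """Disagreement of an ensemble for one (state, action) pair.

    member_preds holds one predicted next state (a list of floats) per
    ensemble member. Assumed estimator: the largest Euclidean distance
    between any member's prediction and the ensemble mean.
    """
    dim = len(member_preds[0])
    mean = [fmean(pred[i] for pred in member_preds) for i in range(dim)]
    return max(math.dist(pred, mean) for pred in member_preds)


def optimistic_reward(reward_hat, uncertainty, bonus_coef):
    """Assumed O-MDP reward: a bonus draws the rollout policy toward OOD regions."""
    return reward_hat + bonus_coef * uncertainty


def pessimistic_reward(reward_hat, uncertainty, penalty_coef):
    """Assumed P-MDP reward: a penalty discourages the output policy from OOD regions."""
    return reward_hat - penalty_coef * uncertainty


def relabel_rollouts(rollouts, penalty_coef):
    """Relabel transitions sampled by the optimistic rollout policy.

    Each transition is a dict with 'reward_hat' and 'uncertainty' keys; the
    returned copies carry the penalized reward used to optimize the output policy.
    """
    return [
        {**t, "reward": pessimistic_reward(t["reward_hat"], t["uncertainty"], penalty_coef)}
        for t in rollouts
    ]
```

Under these assumptions, the same state-action pair receives a higher reward while rollouts are collected and a lower one when the output policy is optimized, and the gap widens as the ensemble disagrees more.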
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.1609/aaai.v38i15.29607
- https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026
- OA Status
- diamond
- Cited By
- 1
- References
- 51
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4393156710
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4393156710 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.1609/aaai.v38i15.29607 (Digital Object Identifier)
- Title
- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization (work title)
- Type
- article (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2024 (year of publication)
- Publication date
- 2024-03-24 (full publication date if available)
- Authors
- Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang (list of authors in order)
- Landing page
- https://doi.org/10.1609/aaai.v38i15.29607 (publisher landing page)
- PDF URL
- https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 (direct link to full text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- diamond (open access status per OpenAlex)
- OA URL
- https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 (direct OA link when available)
- Concepts
- Pessimism, Computer science, Epistemology, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by
- 1 (total citation count in OpenAlex)
- Citations by year (recent)
- 2024: 1 (per-year citation counts, last 5 years)
- References (count)
- 51 (number of works referenced by this work)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4393156710 |
| doi | https://doi.org/10.1609/aaai.v38i15.29607 |
| ids.doi | https://doi.org/10.1609/aaai.v38i15.29607 |
| ids.openalex | https://openalex.org/W4393156710 |
| fwci | 1.07475083 |
| type | article |
| title | Optimistic Model Rollouts for Pessimistic Offline Policy Optimization |
| biblio.issue | 15 |
| biblio.volume | 38 |
| biblio.last_page | 16686 |
| biblio.first_page | 16678 |
| topics[0].id | https://openalex.org/T11801 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.8931000232696533 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2212 |
| topics[0].subfield.display_name | Ocean Engineering |
| topics[0].display_name | Reservoir Engineering and Simulation Methods |
| topics[1].id | https://openalex.org/T12535 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8676999807357788 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Machine Learning and Data Classification |
| topics[2].id | https://openalex.org/T12072 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.7883999943733215 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C9992130 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6689664125442505 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q484954 |
| concepts[0].display_name | Pessimism |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.4836353063583374 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C111472728 |
| concepts[2].level | 1 |
| concepts[2].score | 0.0 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[2].display_name | Epistemology |
| concepts[3].id | https://openalex.org/C138885662 |
| concepts[3].level | 0 |
| concepts[3].score | 0.0 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[3].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/pessimism |
| keywords[0].score | 0.6689664125442505 |
| keywords[0].display_name | Pessimism |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.4836353063583374 |
| keywords[1].display_name | Computer science |
| language | en |
| locations[0].id | doi:10.1609/aaai.v38i15.29607 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v38i15.29607 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5073132517 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1385-0074 |
| authorships[0].author.display_name | Yuanzhao Zhai |
| authorships[0].countries | CN |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[0].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[0].institutions[0].id | https://openalex.org/I170215575 |
| authorships[0].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[0].institutions[0].country_code | CN |
| authorships[0].institutions[0].display_name | National University of Defense Technology |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yuanzhao Zhai |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[1].author.id | https://openalex.org/A5102770021 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2632-5175 |
| authorships[1].author.display_name | Yiying Li |
| authorships[1].countries | CN |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210100255 |
| authorships[1].affiliations[0].raw_affiliation_string | Artificial Intelligence Research Center, DII, Beijing, China |
| authorships[1].institutions[0].id | https://openalex.org/I4210100255 |
| authorships[1].institutions[0].ror | https://ror.org/016a74861 |
| authorships[1].institutions[0].type | other |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210100255 |
| authorships[1].institutions[0].country_code | CN |
| authorships[1].institutions[0].display_name | Beijing Academy of Artificial Intelligence |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yiying Li |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Artificial Intelligence Research Center, DII, Beijing, China |
| authorships[2].author.id | https://openalex.org/A5016262505 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5151-3381 |
| authorships[2].author.display_name | Zijian Gao |
| authorships[2].countries | CN |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[2].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[2].institutions[0].id | https://openalex.org/I170215575 |
| authorships[2].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[2].institutions[0].country_code | CN |
| authorships[2].institutions[0].display_name | National University of Defense Technology |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zijian Gao |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[3].author.id | https://openalex.org/A5066103977 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-2253-2927 |
| authorships[3].author.display_name | Xudong Gong |
| authorships[3].countries | CN |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[3].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[3].institutions[0].id | https://openalex.org/I170215575 |
| authorships[3].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[3].institutions[0].country_code | CN |
| authorships[3].institutions[0].display_name | National University of Defense Technology |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Xudong Gong |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[4].author.id | https://openalex.org/A5013340793 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5997-5169 |
| authorships[4].author.display_name | Kele Xu |
| authorships[4].countries | CN |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[4].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[4].institutions[0].id | https://openalex.org/I170215575 |
| authorships[4].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[4].institutions[0].type | education |
| authorships[4].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[4].institutions[0].country_code | CN |
| authorships[4].institutions[0].display_name | National University of Defense Technology |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Kele Xu |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[5].author.id | https://openalex.org/A5039795290 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-7587-8905 |
| authorships[5].author.display_name | Dawei Feng |
| authorships[5].countries | CN |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[5].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[5].institutions[0].id | https://openalex.org/I170215575 |
| authorships[5].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[5].institutions[0].type | education |
| authorships[5].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[5].institutions[0].country_code | CN |
| authorships[5].institutions[0].display_name | National University of Defense Technology |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Dawei Feng |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[6].author.id | https://openalex.org/A5088885490 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-1236-8318 |
| authorships[6].author.display_name | Bo Ding |
| authorships[6].countries | CN |
| authorships[6].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[6].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[6].institutions[0].id | https://openalex.org/I170215575 |
| authorships[6].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[6].institutions[0].type | education |
| authorships[6].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[6].institutions[0].country_code | CN |
| authorships[6].institutions[0].display_name | National University of Defense Technology |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Ding Bo |
| authorships[6].is_corresponding | False |
| authorships[6].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[7].author.id | https://openalex.org/A5101522100 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-3245-1901 |
| authorships[7].author.display_name | Huaimin Wang |
| authorships[7].countries | CN |
| authorships[7].affiliations[0].institution_ids | https://openalex.org/I170215575 |
| authorships[7].affiliations[0].raw_affiliation_string | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| authorships[7].institutions[0].id | https://openalex.org/I170215575 |
| authorships[7].institutions[0].ror | https://ror.org/05d2yfz11 |
| authorships[7].institutions[0].type | education |
| authorships[7].institutions[0].lineage | https://openalex.org/I170215575 |
| authorships[7].institutions[0].country_code | CN |
| authorships[7].institutions[0].display_name | National University of Defense Technology |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Huaimin Wang |
| authorships[7].is_corresponding | False |
| authorships[7].raw_affiliation_strings | National University of Defense Technology, Changsha, China State Key Laboratory of Complex & Critical Software Environment, Changsha, China |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Optimistic Model Rollouts for Pessimistic Offline Policy Optimization |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11801 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.8931000232696533 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2212 |
| primary_topic.subfield.display_name | Ocean Engineering |
| primary_topic.display_name | Reservoir Engineering and Simulation Methods |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W4380987628, https://openalex.org/W2418537576, https://openalex.org/W2557514562, https://openalex.org/W214945085, https://openalex.org/W1987935396, https://openalex.org/W3126025002, https://openalex.org/W2354456418, https://openalex.org/W3197683035, https://openalex.org/W2006807542 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aaai.v38i15.29607 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v38i15.29607 |
| primary_location.id | doi:10.1609/aaai.v38i15.29607 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/29607/31026 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v38i15.29607 |
| publication_date | 2024-03-24 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W2119738618, https://openalex.org/W3203827806, https://openalex.org/W2142971854, https://openalex.org/W4221158443, https://openalex.org/W2072931156, https://openalex.org/W2996449210, https://openalex.org/W3172360140, https://openalex.org/W2787938642, https://openalex.org/W2904453761, https://openalex.org/W2904730732, https://openalex.org/W4306295204, https://openalex.org/W2781726626, https://openalex.org/W2950624398, https://openalex.org/W2959895084, https://openalex.org/W3025606523, https://openalex.org/W3205794883, https://openalex.org/W2947150733, https://openalex.org/W3033324992, https://openalex.org/W3174680430, https://openalex.org/W4292719637, https://openalex.org/W4286908116, https://openalex.org/W2859967432, https://openalex.org/W4283076713, https://openalex.org/W2986616324, https://openalex.org/W6679407231, https://openalex.org/W3092490845, https://openalex.org/W3115367651, https://openalex.org/W4225110331, https://openalex.org/W1491843047, https://openalex.org/W6683153233, https://openalex.org/W3202125656, https://openalex.org/W3162450516, https://openalex.org/W3130177876, https://openalex.org/W6777656069, https://openalex.org/W3016525976, https://openalex.org/W3090369311, https://openalex.org/W3177145475, https://openalex.org/W4306679387, https://openalex.org/W2158782408, https://openalex.org/W3046395471, https://openalex.org/W3201700917, https://openalex.org/W3208796191, https://openalex.org/W3170059879, https://openalex.org/W2963049774, https://openalex.org/W2799352588, https://openalex.org/W2130304665, https://openalex.org/W3038822267, https://openalex.org/W3028766998, https://openalex.org/W3166645952, https://openalex.org/W4288319859, https://openalex.org/W3022566517 |
| referenced_works_count | 51 |
| abstract_inverted_index.a | 10, 33, 96, 176 |
| abstract_inverted_index.In | 66 |
| abstract_inverted_index.RL | 102 |
| abstract_inverted_index.We | 75 |
| abstract_inverted_index.an | 71, 117 |
| abstract_inverted_index.be | 160 |
| abstract_inverted_index.by | 84, 90, 175 |
| abstract_inverted_index.in | 47, 121, 145, 162, 192 |
| abstract_inverted_index.of | 54, 63, 81, 154, 178 |
| abstract_inverted_index.on | 24, 183 |
| abstract_inverted_index.to | 124 |
| abstract_inverted_index.we | 68, 93, 115, 131, 149 |
| abstract_inverted_index.MDP | 73 |
| abstract_inverted_index.OOD | 87, 127 |
| abstract_inverted_index.and | 140 |
| abstract_inverted_index.can | 58, 159 |
| abstract_inverted_index.for | 13, 27, 109 |
| abstract_inverted_index.has | 5 |
| abstract_inverted_index.our | 169 |
| abstract_inverted_index.the | 40, 43, 52, 60, 78, 122, 133, 142, 146, 152, 184 |
| abstract_inverted_index.via | 31 |
| abstract_inverted_index.yet | 98 |
| abstract_inverted_index.(RL) | 4 |
| abstract_inverted_index.30%, | 179 |
| abstract_inverted_index.ORPO | 104, 158, 188 |
| abstract_inverted_index.Then | 130 |
| abstract_inverted_index.from | 45 |
| abstract_inverted_index.made | 6 |
| abstract_inverted_index.more | 86, 126 |
| abstract_inverted_index.show | 167 |
| abstract_inverted_index.that | 151, 168, 194 |
| abstract_inverted_index.this | 91 |
| abstract_inverted_index.with | 16, 137, 157 |
| abstract_inverted_index.(OOD) | 49 |
| abstract_inverted_index.MDPs. | 164 |
| abstract_inverted_index.O-MDP | 123 |
| abstract_inverted_index.ORPO, | 95 |
| abstract_inverted_index.P-MDP | 41, 173 |
| abstract_inverted_index.focus | 23 |
| abstract_inverted_index.model | 18, 107, 128 |
| abstract_inverted_index.pairs | 136 |
| abstract_inverted_index.train | 116 |
| abstract_inverted_index.which | 57 |
| abstract_inverted_index.works | 21 |
| abstract_inverted_index.Markov | 35 |
| abstract_inverted_index.P-MDP. | 147 |
| abstract_inverted_index.avenue | 12 |
| abstract_inverted_index.beyond | 51 |
| abstract_inverted_index.linear | 163 |
| abstract_inverted_index.margin | 177 |
| abstract_inverted_index.output | 143 |
| abstract_inverted_index.policy | 28, 112, 120, 144 |
| abstract_inverted_index.sample | 125 |
| abstract_inverted_index.simple | 97 |
| abstract_inverted_index.Process | 37 |
| abstract_inverted_index.ability | 62 |
| abstract_inverted_index.brought | 83 |
| abstract_inverted_index.models. | 65 |
| abstract_inverted_index.notable | 190 |
| abstract_inverted_index.offline | 1, 55, 101, 111 |
| abstract_inverted_index.present | 94 |
| abstract_inverted_index.propose | 69 |
| abstract_inverted_index.regions | 50 |
| abstract_inverted_index.relabel | 132 |
| abstract_inverted_index.require | 195 |
| abstract_inverted_index.results | 166 |
| abstract_inverted_index.rollout | 119 |
| abstract_inverted_index.sampled | 134 |
| abstract_inverted_index.support | 53 |
| abstract_inverted_index.trained | 156 |
| abstract_inverted_index.usually | 30 |
| abstract_inverted_index.(O-MDP). | 74 |
| abstract_inverted_index.(P-MDP). | 38 |
| abstract_inverted_index.Decision | 36 |
| abstract_inverted_index.Existing | 20 |
| abstract_inverted_index.However, | 39 |
| abstract_inverted_index.Rollouts | 108 |
| abstract_inverted_index.benefits | 80 |
| abstract_inverted_index.dynamics | 64 |
| abstract_inverted_index.exhibits | 189 |
| abstract_inverted_index.learning | 3, 46 |
| abstract_inverted_index.observed | 77 |
| abstract_inverted_index.offering | 9 |
| abstract_inverted_index.optimism | 82 |
| abstract_inverted_index.optimize | 141 |
| abstract_inverted_index.policies | 44, 155 |
| abstract_inverted_index.problems | 193 |
| abstract_inverted_index.rewards, | 139 |
| abstract_inverted_index.Moreover, | 187 |
| abstract_inverted_index.Motivated | 89 |
| abstract_inverted_index.achieving | 180 |
| abstract_inverted_index.baselines | 174 |
| abstract_inverted_index.contrast, | 67 |
| abstract_inverted_index.datasets, | 56 |
| abstract_inverted_index.effective | 99 |
| abstract_inverted_index.framework | 170 |
| abstract_inverted_index.generates | 105 |
| abstract_inverted_index.improving | 14 |
| abstract_inverted_index.initially | 76 |
| abstract_inverted_index.penalized | 138 |
| abstract_inverted_index.pessimism | 26 |
| abstract_inverted_index.potential | 79 |
| abstract_inverted_index.primarily | 22 |
| abstract_inverted_index.progress, | 8 |
| abstract_inverted_index.promising | 11 |
| abstract_inverted_index.rollouts. | 19, 88, 129 |
| abstract_inverted_index.synthetic | 17 |
| abstract_inverted_index.Optimistic | 72, 106 |
| abstract_inverted_index.advantages | 191 |
| abstract_inverted_index.benchmark. | 186 |
| abstract_inverted_index.framework. | 103 |
| abstract_inverted_index.optimistic | 118 |
| abstract_inverted_index.remarkable | 7 |
| abstract_inverted_index.Model-based | 0 |
| abstract_inverted_index.Pessimistic | 34, 110 |
| abstract_inverted_index.demonstrate | 150 |
| abstract_inverted_index.discourages | 42 |
| abstract_inverted_index.encouraging | 85 |
| abstract_inverted_index.model-based | 100 |
| abstract_inverted_index.outperforms | 172 |
| abstract_inverted_index.performance | 153, 182 |
| abstract_inverted_index.widely-used | 185 |
| abstract_inverted_index.Experimental | 165 |
| abstract_inverted_index.constructing | 32, 70 |
| abstract_inverted_index.observation, | 92 |
| abstract_inverted_index.state-action | 135 |
| abstract_inverted_index.Optimization. | 113 |
| abstract_inverted_index.Specifically, | 114 |
| abstract_inverted_index.incorporating | 25 |
| abstract_inverted_index.lower-bounded | 161 |
| abstract_inverted_index.optimization, | 29 |
| abstract_inverted_index.reinforcement | 2 |
| abstract_inverted_index.significantly | 171 |
| abstract_inverted_index.under-utilize | 59 |
| abstract_inverted_index.Theoretically, | 148 |
| abstract_inverted_index.generalization | 15, 61 |
| abstract_inverted_index.generalization. | 196 |
| abstract_inverted_index.state-of-the-art | 181 |
| abstract_inverted_index.out-of-distribution | 48 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 90 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 8 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.75 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile.value | 0.59662577 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
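
The `abstract_inverted_index.*` rows in the payload table store the abstract as a map from each token to the zero-based positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the input is assumed to be a plain dictionary of token to list of positions, as the rows above suggest.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> positions map.

    Every position is assigned its token, then tokens are joined in
    position order with single spaces.
    """
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    return " ".join(token_at[position] for position in sorted(token_at))


# Positions 0 to 8, copied from the rows above (other positions omitted).
sample = {
    "Model-based": [0],
    "offline": [1],
    "reinforcement": [2],
    "learning": [3],
    "(RL)": [4],
    "has": [5],
    "made": [6],
    "remarkable": [7],
    "progress,": [8],
}
opening = rebuild_abstract(sample)
```

Here `opening` reproduces the first nine tokens of the abstract. Because punctuation stays attached to its token, the rebuilt text matches the original wording without further cleanup.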