Ranking Policy Gradient
2019 · Open Access
· DOI: https://doi.org/10.48550/arxiv.1906.09674
Sample inefficiency is a long-lasting problem in reinforcement learning (RL). State-of-the-art methods estimate the optimal action values, which usually involves an extensive search over the state-action space and unstable optimization. Toward sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal rank of a set of discrete actions. To accelerate the learning of policy gradient methods, we establish the equivalence between maximizing the lower bound of return and imitating a near-optimal policy without accessing any oracles. These results lead to a general off-policy learning framework that preserves optimality, reduces variance, and improves sample efficiency. Furthermore, the sample complexity of RPG does not depend on the dimension of the state space, which makes RPG applicable to large-scale problems. We conduct extensive experiments showing that, when combined with the off-policy learning framework, RPG substantially reduces sample complexity compared to the state of the art.
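To make the idea of learning a rank over discrete actions concrete, here is a minimal sketch. The abstract does not give the formulation, so everything below is an illustrative assumption: the function names are hypothetical, and the logistic function of score differences is simply one common way to model pairwise ranking preferences, not necessarily the exact construction used in the paper.

```python
import math


def pairwise_prob(score_i, score_j):
    """Assumed model: probability that action i is ranked above action j,
    given by a logistic function of the score gap."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def rank_actions(scores):
    """Return action indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_action_preference(scores, i):
    """Product of pairwise probabilities that action i outranks every other action."""
    pref = 1.0
    for j, s in enumerate(scores):
        if j != i:
            pref *= pairwise_prob(scores[i], s)
    return pref


# Hypothetical scores for three discrete actions in a single state.
scores = [0.2, 1.5, -0.3]
ranking = rank_actions(scores)
prefs = [top_action_preference(scores, i) for i in range(len(scores))]
```

In this sketch only the relative order of the scores matters for choosing an action, which is the sense in which a ranking-based policy differs from estimating exact action values.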
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/1906.09674
- https://arxiv.org/pdf/1906.09674
- OA Status
- green
- References
- 58
- Related Works
- 20
- OpenAlex ID
- https://openalex.org/W2951962403
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2951962403
- DOI: https://doi.org/10.48550/arxiv.1906.09674
- Title: Ranking Policy Gradient
- Type: preprint
- Language: en
- Publication year: 2019
- Publication date: 2019-06-24
- Authors: Kaixiang Lin, Jiayu Zhou
- Landing page: https://arxiv.org/abs/1906.09674
- PDF URL: https://arxiv.org/pdf/1906.09674
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/1906.09674
- Concepts: Reinforcement learning, Ranking (information retrieval), Computer science, Rank (graph theory), Sample (material), Inefficiency, Variance (accounting), Dimension (graph theory), Equivalence (formal languages), Set (abstract data type), Learning to rank, Mathematical optimization, Artificial intelligence, Mathematics, Machine learning, Economics, Programming language, Accounting, Chromatography, Microeconomics, Combinatorics, Chemistry, Discrete mathematics, Pure mathematics
- Cited by: 0
- References (count): 58
- Related works (count): 20
Full payload
| id | https://openalex.org/W2951962403 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.1906.09674 |
| ids.doi | https://doi.org/10.48550/arxiv.1906.09674 |
| ids.mag | 2951962403 |
| ids.openalex | https://openalex.org/W2951962403 |
| fwci | |
| type | preprint |
| title | Ranking Policy Gradient |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11689 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9977999925613403 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Adversarial Robustness in Machine Learning |
| topics[2].id | https://openalex.org/T12072 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9973999857902527 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7731338739395142 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C189430467 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6784939765930176 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7293293 |
| concepts[1].display_name | Ranking (information retrieval) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.5930432081222534 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C164226766 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5414173603057861 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7293202 |
| concepts[3].display_name | Rank (graph theory) |
| concepts[4].id | https://openalex.org/C198531522 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5253980755805969 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q485146 |
| concepts[4].display_name | Sample (material) |
| concepts[5].id | https://openalex.org/C2778869765 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5156247019767761 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q6028363 |
| concepts[5].display_name | Inefficiency |
| concepts[6].id | https://openalex.org/C196083921 |
| concepts[6].level | 2 |
| concepts[6].score | 0.45183053612709045 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7915758 |
| concepts[6].display_name | Variance (accounting) |
| concepts[7].id | https://openalex.org/C33676613 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4400838315486908 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q13415176 |
| concepts[7].display_name | Dimension (graph theory) |
| concepts[8].id | https://openalex.org/C2780069185 |
| concepts[8].level | 2 |
| concepts[8].score | 0.43866074085235596 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7977945 |
| concepts[8].display_name | Equivalence (formal languages) |
| concepts[9].id | https://openalex.org/C177264268 |
| concepts[9].level | 2 |
| concepts[9].score | 0.43698030710220337 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[9].display_name | Set (abstract data type) |
| concepts[10].id | https://openalex.org/C86037889 |
| concepts[10].level | 3 |
| concepts[10].score | 0.4201257824897766 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q4330127 |
| concepts[10].display_name | Learning to rank |
| concepts[11].id | https://openalex.org/C126255220 |
| concepts[11].level | 1 |
| concepts[11].score | 0.4144032597541809 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[11].display_name | Mathematical optimization |
| concepts[12].id | https://openalex.org/C154945302 |
| concepts[12].level | 1 |
| concepts[12].score | 0.39114588499069214 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[12].display_name | Artificial intelligence |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.360595703125 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C119857082 |
| concepts[14].level | 1 |
| concepts[14].score | 0.3423147201538086 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[14].display_name | Machine learning |
| concepts[15].id | https://openalex.org/C162324750 |
| concepts[15].level | 0 |
| concepts[15].score | 0.08634519577026367 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[15].display_name | Economics |
| concepts[16].id | https://openalex.org/C199360897 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[16].display_name | Programming language |
| concepts[17].id | https://openalex.org/C121955636 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q4116214 |
| concepts[17].display_name | Accounting |
| concepts[18].id | https://openalex.org/C43617362 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q170050 |
| concepts[18].display_name | Chromatography |
| concepts[19].id | https://openalex.org/C175444787 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q39072 |
| concepts[19].display_name | Microeconomics |
| concepts[20].id | https://openalex.org/C114614502 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[20].display_name | Combinatorics |
| concepts[21].id | https://openalex.org/C185592680 |
| concepts[21].level | 0 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[21].display_name | Chemistry |
| concepts[22].id | https://openalex.org/C118615104 |
| concepts[22].level | 1 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q121416 |
| concepts[22].display_name | Discrete mathematics |
| concepts[23].id | https://openalex.org/C202444582 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q837863 |
| concepts[23].display_name | Pure mathematics |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7731338739395142 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/ranking |
| keywords[1].score | 0.6784939765930176 |
| keywords[1].display_name | Ranking (information retrieval) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.5930432081222534 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/rank |
| keywords[3].score | 0.5414173603057861 |
| keywords[3].display_name | Rank (graph theory) |
| keywords[4].id | https://openalex.org/keywords/sample |
| keywords[4].score | 0.5253980755805969 |
| keywords[4].display_name | Sample (material) |
| keywords[5].id | https://openalex.org/keywords/inefficiency |
| keywords[5].score | 0.5156247019767761 |
| keywords[5].display_name | Inefficiency |
| keywords[6].id | https://openalex.org/keywords/variance |
| keywords[6].score | 0.45183053612709045 |
| keywords[6].display_name | Variance (accounting) |
| keywords[7].id | https://openalex.org/keywords/dimension |
| keywords[7].score | 0.4400838315486908 |
| keywords[7].display_name | Dimension (graph theory) |
| keywords[8].id | https://openalex.org/keywords/equivalence |
| keywords[8].score | 0.43866074085235596 |
| keywords[8].display_name | Equivalence (formal languages) |
| keywords[9].id | https://openalex.org/keywords/set |
| keywords[9].score | 0.43698030710220337 |
| keywords[9].display_name | Set (abstract data type) |
| keywords[10].id | https://openalex.org/keywords/learning-to-rank |
| keywords[10].score | 0.4201257824897766 |
| keywords[10].display_name | Learning to rank |
| keywords[11].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[11].score | 0.4144032597541809 |
| keywords[11].display_name | Mathematical optimization |
| keywords[12].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[12].score | 0.39114588499069214 |
| keywords[12].display_name | Artificial intelligence |
| keywords[13].id | https://openalex.org/keywords/mathematics |
| keywords[13].score | 0.360595703125 |
| keywords[13].display_name | Mathematics |
| keywords[14].id | https://openalex.org/keywords/machine-learning |
| keywords[14].score | 0.3423147201538086 |
| keywords[14].display_name | Machine learning |
| keywords[15].id | https://openalex.org/keywords/economics |
| keywords[15].score | 0.08634519577026367 |
| keywords[15].display_name | Economics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1906.09674 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1906.09674 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1906.09674 |
| locations[1].id | mag:2951962403 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | arXiv (Cornell University) |
| locations[1].landing_page_url | http://export.arxiv.org/pdf/1906.09674 |
| locations[2].id | doi:10.48550/arxiv.1906.09674 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | |
| locations[2].raw_source_name | |
| locations[2].landing_page_url | https://doi.org/10.48550/arxiv.1906.09674 |
| locations[3].id | mag:2996343999 |
| locations[3].is_oa | True |
| locations[3].source.id | https://openalex.org/S4306400194 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | True |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | arXiv (Cornell University) |
| locations[3].source.host_organization | https://openalex.org/I205783295 |
| locations[3].source.host_organization_name | Cornell University |
| locations[3].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | |
| locations[3].raw_type | |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | |
| locations[3].raw_source_name | arXiv (Cornell University) |
| locations[3].landing_page_url | https://arxiv.org/pdf/1906.09674.pdf |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100443077 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8626-8934 |
| authorships[0].author.display_name | Kaixiang Lin |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I87216513 |
| authorships[0].affiliations[0].raw_affiliation_string | Michigan State University |
| authorships[0].institutions[0].id | https://openalex.org/I87216513 |
| authorships[0].institutions[0].ror | https://ror.org/05hs6h993 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I87216513 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Michigan State University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kaixiang Lin |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Michigan State University |
| authorships[1].author.id | https://openalex.org/A5047215778 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4336-6777 |
| authorships[1].author.display_name | Jiayu Zhou |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I87216513 |
| authorships[1].affiliations[0].raw_affiliation_string | Michigan State University |
| authorships[1].institutions[0].id | https://openalex.org/I87216513 |
| authorships[1].institutions[0].ror | https://ror.org/05hs6h993 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I87216513 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Michigan State University |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Jiayu Zhou |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Michigan State University |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1906.09674 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Ranking Policy Gradient |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W3049166411, https://openalex.org/W2950492145, https://openalex.org/W3034567339, https://openalex.org/W2993185773, https://openalex.org/W2106530845, https://openalex.org/W3036086835, https://openalex.org/W3127035336, https://openalex.org/W2914702425, https://openalex.org/W3080213971, https://openalex.org/W2951066886, https://openalex.org/W2591506851, https://openalex.org/W2944187456, https://openalex.org/W3034675169, https://openalex.org/W2765274790, https://openalex.org/W3133860714, https://openalex.org/W1518798593, https://openalex.org/W3036846812, https://openalex.org/W3016538023, https://openalex.org/W2963215512, https://openalex.org/W3114629890 |
| cited_by_count | 0 |
| locations_count | 4 |
| best_oa_location.id | pmh:oai:arXiv.org:1906.09674 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1906.09674 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1906.09674 |
| primary_location.id | pmh:oai:arXiv.org:1906.09674 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1906.09674 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1906.09674 |
| publication_date | 2019-06-24 |
| publication_year | 2019 |
| referenced_works | https://openalex.org/W1944672, https://openalex.org/W2761873684, https://openalex.org/W2129670787, https://openalex.org/W2019363670, https://openalex.org/W2119567691, https://openalex.org/W2785738552, https://openalex.org/W2579923771, https://openalex.org/W2963971282, https://openalex.org/W2953334758, https://openalex.org/W2593044849, https://openalex.org/W2125612430, https://openalex.org/W2766447205, https://openalex.org/W2978644431, https://openalex.org/W107583932, https://openalex.org/W2950492145, https://openalex.org/W2143331230, https://openalex.org/W2907502549, https://openalex.org/W2145339207, https://openalex.org/W2161521419, https://openalex.org/W2963376229, https://openalex.org/W2899806036, https://openalex.org/W2545659366, https://openalex.org/W1777239053, https://openalex.org/W2884559200, https://openalex.org/W2142641780, https://openalex.org/W2167117957, https://openalex.org/W2103235543, https://openalex.org/W2123447947, https://openalex.org/W1530699444, https://openalex.org/W2962957031, https://openalex.org/W2556958149, https://openalex.org/W2963423916, https://openalex.org/W2108862644, https://openalex.org/W2173564293, https://openalex.org/W2121863487, https://openalex.org/W2964043796, https://openalex.org/W2119717200, https://openalex.org/W2080039641, https://openalex.org/W2949996623, https://openalex.org/W2158349948, https://openalex.org/W2736601468, https://openalex.org/W2594640072, https://openalex.org/W3146803896, https://openalex.org/W2877093712, https://openalex.org/W2109169869, https://openalex.org/W2201581102, https://openalex.org/W112666333, https://openalex.org/W3137695714, https://openalex.org/W3103780890, https://openalex.org/W2963674921, https://openalex.org/W1663973292, https://openalex.org/W2963250930, https://openalex.org/W2155968351, https://openalex.org/W2554120691, https://openalex.org/W2962902376, https://openalex.org/W2886712433, https://openalex.org/W2905342215, https://openalex.org/W2803308811 |
| referenced_works_count | 58 |
| abstract_inverted_index.a | 3, 41, 51, 77, 88 |
| abstract_inverted_index.To | 56 |
| abstract_inverted_index.We | 124 |
| abstract_inverted_index.an | 21 |
| abstract_inverted_index.in | 6 |
| abstract_inverted_index.is | 2 |
| abstract_inverted_index.it | 18 |
| abstract_inverted_index.of | 50, 53, 60, 73, 107, 115 |
| abstract_inverted_index.on | 112 |
| abstract_inverted_index.to | 87, 144 |
| abstract_inverted_index.we | 35, 64 |
| abstract_inverted_index.RL, | 34 |
| abstract_inverted_index.RPG | 108, 120, 137 |
| abstract_inverted_index.The | 10 |
| abstract_inverted_index.and | 28, 75, 99 |
| abstract_inverted_index.any | 82 |
| abstract_inverted_index.for | 121 |
| abstract_inverted_index.not | 110 |
| abstract_inverted_index.set | 52 |
| abstract_inverted_index.the | 13, 25, 32, 47, 58, 66, 70, 95, 101, 104, 113, 133, 140, 145 |
| abstract_inverted_index.does | 109 |
| abstract_inverted_index.lead | 86 |
| abstract_inverted_index.over | 24 |
| abstract_inverted_index.rank | 49 |
| abstract_inverted_index.that | 45, 129 |
| abstract_inverted_index.when | 130 |
| abstract_inverted_index.with | 132 |
| abstract_inverted_index.(RL). | 9 |
| abstract_inverted_index.These | 84 |
| abstract_inverted_index.bound | 72 |
| abstract_inverted_index.lower | 71 |
| abstract_inverted_index.space | 27 |
| abstract_inverted_index.state | 116 |
| abstract_inverted_index.which | 93, 118 |
| abstract_inverted_index.while | 17 |
| abstract_inverted_index.(RPG), | 40 |
| abstract_inverted_index.Sample | 0 |
| abstract_inverted_index.action | 15 |
| abstract_inverted_index.depend | 111 |
| abstract_inverted_index.learns | 46 |
| abstract_inverted_index.method | 44 |
| abstract_inverted_index.policy | 38, 42, 61, 79 |
| abstract_inverted_index.return | 74 |
| abstract_inverted_index.sample | 105, 141 |
| abstract_inverted_index.search | 23 |
| abstract_inverted_index.space, | 117 |
| abstract_inverted_index.values | 16 |
| abstract_inverted_index.Towards | 31 |
| abstract_inverted_index.between | 68 |
| abstract_inverted_index.conduct | 125 |
| abstract_inverted_index.enables | 119 |
| abstract_inverted_index.general | 89 |
| abstract_inverted_index.optimal | 14, 48 |
| abstract_inverted_index.problem | 5 |
| abstract_inverted_index.propose | 36 |
| abstract_inverted_index.ranking | 37 |
| abstract_inverted_index.reduces | 97, 139 |
| abstract_inverted_index.results | 85 |
| abstract_inverted_index.showing | 128 |
| abstract_inverted_index.usually | 19 |
| abstract_inverted_index.without | 80 |
| abstract_inverted_index.actions. | 55 |
| abstract_inverted_index.discrete | 54 |
| abstract_inverted_index.gradient | 39, 43, 62 |
| abstract_inverted_index.improves | 100 |
| abstract_inverted_index.involves | 20 |
| abstract_inverted_index.learning | 8, 59, 91, 135 |
| abstract_inverted_index.methods, | 63 |
| abstract_inverted_index.oracles. | 83 |
| abstract_inverted_index.unstable | 29 |
| abstract_inverted_index.accessing | 81 |
| abstract_inverted_index.comparing | 143 |
| abstract_inverted_index.dimension | 114 |
| abstract_inverted_index.establish | 65 |
| abstract_inverted_index.estimates | 12 |
| abstract_inverted_index.extensive | 22, 126 |
| abstract_inverted_index.imitating | 76 |
| abstract_inverted_index.preserves | 94 |
| abstract_inverted_index.problems. | 123 |
| abstract_inverted_index.variance, | 98 |
| abstract_inverted_index.accelerate | 57 |
| abstract_inverted_index.complexity | 106 |
| abstract_inverted_index.framework, | 92, 136 |
| abstract_inverted_index.maximizing | 69 |
| abstract_inverted_index.off-policy | 90, 134 |
| abstract_inverted_index.complexity, | 142 |
| abstract_inverted_index.equivalence | 67 |
| abstract_inverted_index.experiments | 127 |
| abstract_inverted_index.large-scale | 122 |
| abstract_inverted_index.optimality, | 96 |
| abstract_inverted_index.Furthermore, | 103 |
| abstract_inverted_index.inefficiency | 1 |
| abstract_inverted_index.long-lasting | 4 |
| abstract_inverted_index.near-optimal | 78 |
| abstract_inverted_index.state-action | 26 |
| abstract_inverted_index.consolidating | 131 |
| abstract_inverted_index.optimization. | 30 |
| abstract_inverted_index.reinforcement | 7 |
| abstract_inverted_index.substantially | 138 |
| abstract_inverted_index.sample-efficient | 33 |
| abstract_inverted_index.state-of-the-art | 11 |
| abstract_inverted_index.state-of-the-art. | 146 |
| abstract_inverted_index.sample-efficiency. | 102 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.7200000286102295 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile |
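
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary copies only the entries for positions 0 to 9 from the table (words such as `a` and `learning` also occur at later positions, which are omitted here).

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a word -> list-of-positions mapping."""
    position_to_word = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            position_to_word[pos] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(position_to_word[pos] for pos in sorted(position_to_word))


# Entries for positions 0-9, copied from the payload table.
sample_index = {
    "Sample": [0],
    "inefficiency": [1],
    "is": [2],
    "a": [3],
    "long-lasting": [4],
    "problem": [5],
    "in": [6],
    "reinforcement": [7],
    "learning": [8],
    "(RL).": [9],
}

first_sentence = rebuild_abstract(sample_index)
```

Feeding the full set of rows through the same function would yield the complete abstract, since every position from 0 to 146 appears exactly once in the table.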