Fast Policy Learning through Imitation and Reinforcement
2018 · Open Access · DOI: https://doi.org/10.48550/arxiv.1805.10413
Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as a variation on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.
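The switching idea in the abstract can be illustrated with a minimal sketch: run a random number of imitation updates, then continue with reinforcement-style gradient updates. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `loki_sketch`, the uniform draw of the switching time, the one-dimensional quadratic losses, and the step size. The update is plain gradient descent, which is what mirror descent reduces to under a squared Euclidean distance.

```python
import random


def loki_sketch(theta0, il_grad, rl_grad, n_min, n_max, total_iters,
                step=0.1, seed=0):
    """Run K imitation steps, then policy-gradient-style steps.

    K is drawn at random from [n_min, n_max]; the uniform draw is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    rng = random.Random(seed)
    switch_at = rng.randint(n_min, n_max)  # random switching time K
    theta = theta0
    for t in range(total_iters):
        # Imitation phase first, reinforcement phase after the switch.
        grad = il_grad(theta) if t < switch_at else rl_grad(theta)
        # Gradient step (mirror descent with a squared Euclidean distance).
        theta = theta - step * grad
    return theta, switch_at


# Hypothetical 1-D problem: the expert sits at 0.8, the true optimum at 1.0.
EXPERT, OPTIMUM = 0.8, 1.0


def il_grad(th):
    return 2.0 * (th - EXPERT)    # gradient of the imitation loss (th - EXPERT)^2


def rl_grad(th):
    return 2.0 * (th - OPTIMUM)   # gradient of the task cost (th - OPTIMUM)^2


theta, switch_at = loki_sketch(0.0, il_grad, rl_grad,
                               n_min=5, n_max=15, total_iters=100)
print(switch_at, theta)
```

In this toy setting the imitation phase pulls the parameter toward the suboptimal expert, and the later phase moves it on to the optimum, so the final value ends up past the expert's. The sketch says nothing about the convergence-rate claims in the abstract, which depend on how the switching time is randomized.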
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1805.10413, https://arxiv.org/pdf/1805.10413
- OA Status: green
- Cited By: 44
- References: 31
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2804930149
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2804930149 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1805.10413 (Digital Object Identifier)
- Title: Fast Policy Learning through Imitation and Reinforcement (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2018 (year of publication)
- Publication date: 2018-05-26 (full publication date if available)
- Authors: Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots (list of authors in order)
- Landing page: https://arxiv.org/abs/1805.10413 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/1805.10413 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1805.10413 (direct OA link when available)
- Concepts: Reinforcement learning, Leverage (statistics), Computer science, Policy learning, Artificial intelligence, Imitation, Set (abstract data type), Scratch, Machine learning, Psychology, Programming language, Social psychology, Operating system (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 44 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 2, 2024: 1, 2023: 2, 2022: 6, 2021: 10 (per-year citation counts, last 5 years)
- References (count): 31 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2804930149 |
| doi | https://doi.org/10.48550/arxiv.1805.10413 |
| ids.doi | https://doi.org/10.48550/arxiv.1805.10413 |
| ids.mag | 2804930149 |
| ids.openalex | https://openalex.org/W2804930149 |
| fwci | |
| type | preprint |
| title | Fast Policy Learning through Imitation and Reinforcement |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T10653 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9972000122070312 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2207 |
| topics[1].subfield.display_name | Control and Systems Engineering |
| topics[1].display_name | Robot Manipulation and Learning |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9843000173568726 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7929812073707581 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C153083717 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7876105308532715 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q6535263 |
| concepts[1].display_name | Leverage (statistics) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6944265365600586 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C2779436431 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6304916143417358 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30672407 |
| concepts[3].display_name | Policy learning |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5742306113243103 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C126388530 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5708860158920288 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1131737 |
| concepts[5].display_name | Imitation |
| concepts[6].id | https://openalex.org/C177264268 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5605413913726807 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[6].display_name | Set (abstract data type) |
| concepts[7].id | https://openalex.org/C2781235140 |
| concepts[7].level | 2 |
| concepts[7].score | 0.5166714191436768 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q275131 |
| concepts[7].display_name | Scratch |
| concepts[8].id | https://openalex.org/C119857082 |
| concepts[8].level | 1 |
| concepts[8].score | 0.4208712875843048 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[8].display_name | Machine learning |
| concepts[9].id | https://openalex.org/C15744967 |
| concepts[9].level | 0 |
| concepts[9].score | 0.068671315908432 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[9].display_name | Psychology |
| concepts[10].id | https://openalex.org/C199360897 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[10].display_name | Programming language |
| concepts[11].id | https://openalex.org/C77805123 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[11].display_name | Social psychology |
| concepts[12].id | https://openalex.org/C111919701 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[12].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7929812073707581 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/leverage |
| keywords[1].score | 0.7876105308532715 |
| keywords[1].display_name | Leverage (statistics) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6944265365600586 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/policy-learning |
| keywords[3].score | 0.6304916143417358 |
| keywords[3].display_name | Policy learning |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5742306113243103 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/imitation |
| keywords[5].score | 0.5708860158920288 |
| keywords[5].display_name | Imitation |
| keywords[6].id | https://openalex.org/keywords/set |
| keywords[6].score | 0.5605413913726807 |
| keywords[6].display_name | Set (abstract data type) |
| keywords[7].id | https://openalex.org/keywords/scratch |
| keywords[7].score | 0.5166714191436768 |
| keywords[7].display_name | Scratch |
| keywords[8].id | https://openalex.org/keywords/machine-learning |
| keywords[8].score | 0.4208712875843048 |
| keywords[8].display_name | Machine learning |
| keywords[9].id | https://openalex.org/keywords/psychology |
| keywords[9].score | 0.068671315908432 |
| keywords[9].display_name | Psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1805.10413 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1805.10413 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1805.10413 |
| locations[1].id | doi:10.48550/arxiv.1805.10413 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1805.10413 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101825910 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0610-2070 |
| authorships[0].author.display_name | Ching-An Cheng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ching-An Cheng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5078753151 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4082-084X |
| authorships[1].author.display_name | Xinyan Yan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xinyan Yan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5010914575 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Nolan Wagener |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Nolan Wagener |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5110797782 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Byron Boots |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Byron Boots |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1805.10413 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Fast Policy Learning through Imitation and Reinforcement |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W4387497383, https://openalex.org/W3183948672, https://openalex.org/W3173606202, https://openalex.org/W3110381201, https://openalex.org/W2948807893, https://openalex.org/W2935909890, https://openalex.org/W2778153218, https://openalex.org/W2758277628, https://openalex.org/W1531601525, https://openalex.org/W2768698792 |
| cited_by_count | 44 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 6 |
| counts_by_year[4].year | 2021 |
| counts_by_year[4].cited_by_count | 10 |
| counts_by_year[5].year | 2020 |
| counts_by_year[5].cited_by_count | 8 |
| counts_by_year[6].year | 2019 |
| counts_by_year[6].cited_by_count | 10 |
| counts_by_year[7].year | 2018 |
| counts_by_year[7].cited_by_count | 4 |
| counts_by_year[8].year | 2017 |
| counts_by_year[8].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1805.10413 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1805.10413 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1805.10413 |
| primary_location.id | pmh:oai:arXiv.org:1805.10413 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1805.10413 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1805.10413 |
| publication_date | 2018-05-26 |
| publication_year | 2018 |
| referenced_works | https://openalex.org/W1575592356, https://openalex.org/W1992208280, https://openalex.org/W2156737235, https://openalex.org/W2155027007, https://openalex.org/W2167224731, https://openalex.org/W2963411833, https://openalex.org/W2166302491, https://openalex.org/W2963099939, https://openalex.org/W112666333, https://openalex.org/W2949608212, https://openalex.org/W2165150801, https://openalex.org/W2130801532, https://openalex.org/W2784825028, https://openalex.org/W2602076750, https://openalex.org/W2296360731, https://openalex.org/W2032916024, https://openalex.org/W1499669280, https://openalex.org/W2950735232, https://openalex.org/W2952840881, https://openalex.org/W1757796397, https://openalex.org/W3037207827, https://openalex.org/W2767002724, https://openalex.org/W2108682071, https://openalex.org/W2964043796, https://openalex.org/W2793955514, https://openalex.org/W2772709170, https://openalex.org/W2962957031, https://openalex.org/W2033468335, https://openalex.org/W1191599655, https://openalex.org/W2757631751, https://openalex.org/W3096194929 |
| referenced_works_count | 31 |
| abstract_inverted_index.a | 5, 65, 78, 81, 88, 96, 107, 127 |
| abstract_inverted_index.IL | 23, 62, 102 |
| abstract_inverted_index.In | 35 |
| abstract_inverted_index.RL | 50, 60, 110 |
| abstract_inverted_index.We | 53, 84, 112 |
| abstract_inverted_index.an | 42 |
| abstract_inverted_index.as | 77 |
| abstract_inverted_index.be | 75 |
| abstract_inverted_index.by | 56 |
| abstract_inverted_index.if | 18, 115 |
| abstract_inverted_index.in | 64, 147 |
| abstract_inverted_index.is | 21, 119 |
| abstract_inverted_index.of | 4, 7, 49, 101, 144 |
| abstract_inverted_index.on | 80 |
| abstract_inverted_index.to | 13, 31, 40, 106, 125 |
| abstract_inverted_index.we | 38, 140 |
| abstract_inverted_index.IL. | 52 |
| abstract_inverted_index.aim | 39 |
| abstract_inverted_index.and | 51, 61, 130 |
| abstract_inverted_index.but | 98 |
| abstract_inverted_index.can | 24, 74, 123 |
| abstract_inverted_index.for | 90 |
| abstract_inverted_index.set | 6 |
| abstract_inverted_index.the | 19, 46, 116, 142 |
| abstract_inverted_index.(IL) | 2 |
| abstract_inverted_index.LOKI | 122, 145 |
| abstract_inverted_index.best | 47 |
| abstract_inverted_index.from | 137 |
| abstract_inverted_index.show | 113 |
| abstract_inverted_index.than | 133 |
| abstract_inverted_index.that | 9, 44, 71, 93, 114 |
| abstract_inverted_index.then | 85 |
| abstract_inverted_index.this | 36, 55 |
| abstract_inverted_index.time | 118 |
| abstract_inverted_index.with | 27 |
| abstract_inverted_index.(RL). | 34 |
| abstract_inverted_index.LOKI, | 87 |
| abstract_inverted_index.first | 94 |
| abstract_inverted_index.learn | 15, 124 |
| abstract_inverted_index.small | 97 |
| abstract_inverted_index.these | 72 |
| abstract_inverted_index.tools | 8 |
| abstract_inverted_index.yield | 25 |
| abstract_inverted_index.before | 104 |
| abstract_inverted_index.common | 66 |
| abstract_inverted_index.expert | 11, 20, 129 |
| abstract_inverted_index.faster | 132 |
| abstract_inverted_index.mirror | 67 |
| abstract_inverted_index.number | 100 |
| abstract_inverted_index.paper, | 37 |
| abstract_inverted_index.policy | 91, 108, 135 |
| abstract_inverted_index.random | 99 |
| abstract_inverted_index.single | 82 |
| abstract_inverted_index.viewed | 76 |
| abstract_inverted_index.aspects | 48 |
| abstract_inverted_index.descent | 68 |
| abstract_inverted_index.method. | 111 |
| abstract_inverted_index.popular | 59 |
| abstract_inverted_index.propose | 86 |
| abstract_inverted_index.provide | 41 |
| abstract_inverted_index.quickly | 14 |
| abstract_inverted_index.running | 134 |
| abstract_inverted_index.several | 58, 148 |
| abstract_inverted_index.showing | 70 |
| abstract_inverted_index.Finally, | 139 |
| abstract_inverted_index.However, | 17 |
| abstract_inverted_index.combines | 45 |
| abstract_inverted_index.compared | 30 |
| abstract_inverted_index.consists | 3 |
| abstract_inverted_index.converge | 131 |
| abstract_inverted_index.evaluate | 141 |
| abstract_inverted_index.gradient | 109, 136 |
| abstract_inverted_index.inferior | 28 |
| abstract_inverted_index.learning | 1, 33, 92 |
| abstract_inverted_index.leverage | 10 |
| abstract_inverted_index.performs | 95 |
| abstract_inverted_index.policies | 26 |
| abstract_inverted_index.properly | 120 |
| abstract_inverted_index.scratch. | 138 |
| abstract_inverted_index.strategy | 89 |
| abstract_inverted_index.Imitation | 0 |
| abstract_inverted_index.algorithm | 43 |
| abstract_inverted_index.approach. | 83 |
| abstract_inverted_index.policies. | 16 |
| abstract_inverted_index.simulated | 149 |
| abstract_inverted_index.switching | 105, 117 |
| abstract_inverted_index.variation | 79 |
| abstract_inverted_index.accomplish | 54 |
| abstract_inverted_index.algorithms | 63, 73 |
| abstract_inverted_index.framework, | 69 |
| abstract_inverted_index.iterations | 103 |
| abstract_inverted_index.outperform | 126 |
| abstract_inverted_index.suboptimal | 128 |
| abstract_inverted_index.formulating | 57 |
| abstract_inverted_index.performance | 29, 143 |
| abstract_inverted_index.randomized, | 121 |
| abstract_inverted_index.suboptimal, | 22 |
| abstract_inverted_index.environments. | 150 |
| abstract_inverted_index.reinforcement | 32 |
| abstract_inverted_index.demonstrations | 12 |
| abstract_inverted_index.experimentally | 146 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.46000000834465027 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
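The `abstract_inverted_index.*` rows in the payload store each abstract token together with the zero-based positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is a hand-picked subset of the rows above, restricted to positions 0 through 8.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload rows covering positions 0..8 only.
sample_index = {
    "Imitation": [0],
    "learning": [1],
    "(IL)": [2],
    "consists": [3],
    "of": [4, 7],
    "a": [5],
    "set": [6],
    "tools": [8],
}

text = rebuild_abstract(sample_index)
print(text)  # Imitation learning (IL) consists of a set of tools
```

Sorting the positions rather than assuming a contiguous range keeps the function usable on a partial index like the one here. Applied to the full set of rows, the same approach should reproduce the abstract shown near the top of the page.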