Deep Q-learning From Demonstrations
2018 · Open Access · DOI: https://doi.org/10.1609/aaai.v32i1.11757
Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance, and their performance during learning can be extremely poor. This may be acceptable in a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process. Thanks to a prioritized replay mechanism, it automatically assesses the necessary ratio of demonstration data while learning. DQfD works by combining temporal difference updates with supervised classification of the demonstrator’s actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN): it starts with better scores on the first million steps on 41 of 42 games, and on average PDD DQN takes 83 million steps to catch up to DQfD’s performance. DQfD learns to outperform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
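The combination of temporal difference updates with supervised classification of the demonstrator's actions can be illustrated with a small per-transition loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the weight `lam`, and the default `margin` value are hypothetical, the supervised term is written as a large-margin classification loss, and the one-step target uses a plain max over next-state Q-values. Other parts of a full agent (the network, the replay buffer, prioritization) are left out.

```python
from typing import Sequence


def td_loss(q_sa: float, reward: float, next_q: Sequence[float],
            gamma: float, done: bool) -> float:
    """Squared one-step temporal difference error for one transition."""
    # Bootstrapped target: reward plus discounted best next-state value.
    target = reward if done else reward + gamma * max(next_q)
    return (target - q_sa) ** 2


def margin_loss(q_values: Sequence[float], demo_action: int,
                margin: float = 0.8) -> float:
    """Large-margin classification loss on the demonstrator's action.

    Every non-demonstrated action gets `margin` added to its Q-value, so the
    loss is zero only when the demonstrated action beats all others by at
    least `margin`. The default margin is an illustrative assumption.
    """
    augmented = [q + (0.0 if a == demo_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[demo_action]


def combined_loss(q_values: Sequence[float], action: int, reward: float,
                  next_q: Sequence[float], done: bool, is_demo: bool,
                  gamma: float = 0.99, lam: float = 1.0,
                  margin: float = 0.8) -> float:
    """TD loss for every transition, plus the supervised term for demo data."""
    loss = td_loss(q_values[action], reward, next_q, gamma, done)
    if is_demo:
        # The classification term applies only to demonstration transitions.
        loss += lam * margin_loss(q_values, action, margin)
    return loss


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    print(combined_loss(q, action=0, reward=1.0, next_q=[0.0, 2.0],
                        done=False, is_demo=True, gamma=0.5, margin=0.5))
```

In this sketch the supervised term is applied only when `is_demo` is true, so self-generated transitions are trained with the temporal difference term alone.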
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1609/aaai.v32i1.11757
- PDF: https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616
- OA Status: diamond
- Cited By: 766
- References: 66
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2788862220
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2788862220 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1609/aaai.v32i1.11757 (Digital Object Identifier)
- Title: Deep Q-learning From Demonstrations (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2018 (year of publication)
- Publication date: 2018-04-29 (full publication date if available)
- Authors: Todd Hester, Matej Vecerík, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Ian Osband, Gabriel Dulac-Arnold, John Agapiou, Joel Z. Leibo, Audrūnas Gruslys (list of authors in order)
- Landing page: https://doi.org/10.1609/aaai.v32i1.11757 (publisher landing page)
- PDF URL: https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: diamond (open access status per OpenAlex)
- OA URL: https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 (direct OA link when available)
- Concepts: Computer science, Reinforcement learning, Process (computing), Artificial intelligence, Deep learning, Machine learning, Control (management), Operating system (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 766 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 76, 2024: 107, 2023: 150, 2022: 126, 2021: 110 (per-year citation counts, last 5 years)
- References (count): 66 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2788862220 |
| doi | https://doi.org/10.1609/aaai.v32i1.11757 |
| ids.doi | https://doi.org/10.1609/aaai.v32i1.11757 |
| ids.mag | 2788862220 |
| ids.openalex | https://openalex.org/W2788862220 |
| fwci | 31.74285055 |
| type | article |
| title | Deep Q-learning From Demonstrations |
| biblio.issue | 1 |
| biblio.volume | 32 |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11975 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9901999831199646 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Evolutionary Algorithms and Applications |
| topics[2].id | https://openalex.org/T11674 |
| topics[2].field.id | https://openalex.org/fields/20 |
| topics[2].field.display_name | Economics, Econometrics and Finance |
| topics[2].score | 0.9868999719619751 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2002 |
| topics[2].subfield.display_name | Economics and Econometrics |
| topics[2].display_name | Sports Analytics and Performance |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8663511276245117 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C97541855 |
| concepts[1].level | 2 |
| concepts[1].score | 0.8146408796310425 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[1].display_name | Reinforcement learning |
| concepts[2].id | https://openalex.org/C98045186 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6455647349357605 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[2].display_name | Process (computing) |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6373509168624878 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C108583219 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5887935161590576 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q197536 |
| concepts[4].display_name | Deep learning |
| concepts[5].id | https://openalex.org/C119857082 |
| concepts[5].level | 1 |
| concepts[5].score | 0.521086573600769 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[5].display_name | Machine learning |
| concepts[6].id | https://openalex.org/C2775924081 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4358724355697632 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q55608371 |
| concepts[6].display_name | Control (management) |
| concepts[7].id | https://openalex.org/C111919701 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[7].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8663511276245117 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[1].score | 0.8146408796310425 |
| keywords[1].display_name | Reinforcement learning |
| keywords[2].id | https://openalex.org/keywords/process |
| keywords[2].score | 0.6455647349357605 |
| keywords[2].display_name | Process (computing) |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.6373509168624878 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/deep-learning |
| keywords[4].score | 0.5887935161590576 |
| keywords[4].display_name | Deep learning |
| keywords[5].id | https://openalex.org/keywords/machine-learning |
| keywords[5].score | 0.521086573600769 |
| keywords[5].display_name | Machine learning |
| keywords[6].id | https://openalex.org/keywords/control |
| keywords[6].score | 0.4358724355697632 |
| keywords[6].display_name | Control (management) |
| language | en |
| locations[0].id | doi:10.1609/aaai.v32i1.11757 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v32i1.11757 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5048229171 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Todd Hester |
| authorships[0].countries | GB, US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[0].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[0].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[0].institutions[0].ror | https://ror.org/00971b260 |
| authorships[0].institutions[0].type | company |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[0].institutions[0].country_code | GB |
| authorships[0].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[0].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[0].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[0].institutions[1].type | company |
| authorships[0].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[0].institutions[1].country_code | US |
| authorships[0].institutions[1].display_name | Google (United States) |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Todd Hester |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Google DeepMind |
| authorships[1].author.id | https://openalex.org/A5039155450 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Matej Vecerík |
| authorships[1].countries | GB, US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[1].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[1].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[1].institutions[0].ror | https://ror.org/00971b260 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | GB |
| authorships[1].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[1].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[1].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[1].institutions[1].type | company |
| authorships[1].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[1].institutions[1].country_code | US |
| authorships[1].institutions[1].display_name | Google (United States) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Matej Vecerik |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Google DeepMind |
| authorships[2].author.id | https://openalex.org/A5065100569 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5386-465X |
| authorships[2].author.display_name | Olivier Pietquin |
| authorships[2].countries | GB, US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[2].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[2].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[2].institutions[0].ror | https://ror.org/00971b260 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[2].institutions[0].country_code | GB |
| authorships[2].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[2].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[2].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[2].institutions[1].type | company |
| authorships[2].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[2].institutions[1].country_code | US |
| authorships[2].institutions[1].display_name | Google (United States) |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Olivier Pietquin |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Google DeepMind |
| authorships[3].author.id | https://openalex.org/A5049659586 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Marc Lanctot |
| authorships[3].countries | GB, US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[3].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[3].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[3].institutions[0].ror | https://ror.org/00971b260 |
| authorships[3].institutions[0].type | company |
| authorships[3].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[3].institutions[0].country_code | GB |
| authorships[3].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[3].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[3].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[3].institutions[1].type | company |
| authorships[3].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[3].institutions[1].country_code | US |
| authorships[3].institutions[1].display_name | Google (United States) |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Marc Lanctot |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Google DeepMind |
| authorships[4].author.id | https://openalex.org/A5081322018 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-2961-8782 |
| authorships[4].author.display_name | Tom Schaul |
| authorships[4].countries | GB, US |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[4].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[4].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[4].institutions[0].ror | https://ror.org/00971b260 |
| authorships[4].institutions[0].type | company |
| authorships[4].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[4].institutions[0].country_code | GB |
| authorships[4].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[4].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[4].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[4].institutions[1].type | company |
| authorships[4].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[4].institutions[1].country_code | US |
| authorships[4].institutions[1].display_name | Google (United States) |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Tom Schaul |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Google DeepMind |
| authorships[5].author.id | https://openalex.org/A5103033215 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3906-950X |
| authorships[5].author.display_name | Bilal Piot |
| authorships[5].countries | GB, US |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[5].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[5].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[5].institutions[0].ror | https://ror.org/00971b260 |
| authorships[5].institutions[0].type | company |
| authorships[5].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[5].institutions[0].country_code | GB |
| authorships[5].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[5].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[5].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[5].institutions[1].type | company |
| authorships[5].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[5].institutions[1].country_code | US |
| authorships[5].institutions[1].display_name | Google (United States) |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Bilal Piot |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | Google DeepMind |
| authorships[6].author.id | https://openalex.org/A5030338894 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Dan Horgan |
| authorships[6].countries | GB, US |
| authorships[6].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[6].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[6].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[6].institutions[0].ror | https://ror.org/00971b260 |
| authorships[6].institutions[0].type | company |
| authorships[6].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[6].institutions[0].country_code | GB |
| authorships[6].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[6].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[6].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[6].institutions[1].type | company |
| authorships[6].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[6].institutions[1].country_code | US |
| authorships[6].institutions[1].display_name | Google (United States) |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Dan Horgan |
| authorships[6].is_corresponding | False |
| authorships[6].raw_affiliation_strings | Google DeepMind |
| authorships[7].author.id | https://openalex.org/A5018191427 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | John Quan |
| authorships[7].countries | GB, US |
| authorships[7].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[7].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[7].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[7].institutions[0].ror | https://ror.org/00971b260 |
| authorships[7].institutions[0].type | company |
| authorships[7].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[7].institutions[0].country_code | GB |
| authorships[7].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[7].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[7].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[7].institutions[1].type | company |
| authorships[7].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[7].institutions[1].country_code | US |
| authorships[7].institutions[1].display_name | Google (United States) |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | John Quan |
| authorships[7].is_corresponding | False |
| authorships[7].raw_affiliation_strings | Google DeepMind |
| authorships[8].author.id | https://openalex.org/A5028929445 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Andrew Sendonaris |
| authorships[8].countries | GB, US |
| authorships[8].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[8].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[8].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[8].institutions[0].ror | https://ror.org/00971b260 |
| authorships[8].institutions[0].type | company |
| authorships[8].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[8].institutions[0].country_code | GB |
| authorships[8].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[8].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[8].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[8].institutions[1].type | company |
| authorships[8].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[8].institutions[1].country_code | US |
| authorships[8].institutions[1].display_name | Google (United States) |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Andrew Sendonaris |
| authorships[8].is_corresponding | False |
| authorships[8].raw_affiliation_strings | Google DeepMind |
| authorships[9].author.id | https://openalex.org/A5015899120 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Ian Osband |
| authorships[9].countries | GB, US |
| authorships[9].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[9].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[9].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[9].institutions[0].ror | https://ror.org/00971b260 |
| authorships[9].institutions[0].type | company |
| authorships[9].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[9].institutions[0].country_code | GB |
| authorships[9].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[9].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[9].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[9].institutions[1].type | company |
| authorships[9].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[9].institutions[1].country_code | US |
| authorships[9].institutions[1].display_name | Google (United States) |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Ian Osband |
| authorships[9].is_corresponding | False |
| authorships[9].raw_affiliation_strings | Google DeepMind |
| authorships[10].author.id | https://openalex.org/A5008880429 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Gabriel Dulac-Arnold |
| authorships[10].countries | GB, US |
| authorships[10].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[10].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[10].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[10].institutions[0].ror | https://ror.org/00971b260 |
| authorships[10].institutions[0].type | company |
| authorships[10].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[10].institutions[0].country_code | GB |
| authorships[10].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[10].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[10].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[10].institutions[1].type | company |
| authorships[10].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[10].institutions[1].country_code | US |
| authorships[10].institutions[1].display_name | Google (United States) |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Gabriel Dulac-Arnold |
| authorships[10].is_corresponding | False |
| authorships[10].raw_affiliation_strings | Google DeepMind |
| authorships[11].author.id | https://openalex.org/A5017056095 |
| authorships[11].author.orcid | https://orcid.org/0000-0003-2642-2845 |
| authorships[11].author.display_name | John Agapiou |
| authorships[11].countries | GB, US |
| authorships[11].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[11].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[11].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[11].institutions[0].ror | https://ror.org/00971b260 |
| authorships[11].institutions[0].type | company |
| authorships[11].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[11].institutions[0].country_code | GB |
| authorships[11].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[11].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[11].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[11].institutions[1].type | company |
| authorships[11].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[11].institutions[1].country_code | US |
| authorships[11].institutions[1].display_name | Google (United States) |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | John Agapiou |
| authorships[11].is_corresponding | False |
| authorships[11].raw_affiliation_strings | Google DeepMind |
| authorships[12].author.id | https://openalex.org/A5054808675 |
| authorships[12].author.orcid | https://orcid.org/0000-0002-3153-916X |
| authorships[12].author.display_name | Joel Z. Leibo |
| authorships[12].countries | GB, US |
| authorships[12].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[12].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[12].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[12].institutions[0].ror | https://ror.org/00971b260 |
| authorships[12].institutions[0].type | company |
| authorships[12].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[12].institutions[0].country_code | GB |
| authorships[12].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[12].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[12].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[12].institutions[1].type | company |
| authorships[12].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[12].institutions[1].country_code | US |
| authorships[12].institutions[1].display_name | Google (United States) |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Joel Leibo |
| authorships[12].is_corresponding | False |
| authorships[12].raw_affiliation_strings | Google DeepMind |
| authorships[13].author.id | https://openalex.org/A5040179074 |
| authorships[13].author.orcid | |
| authorships[13].author.display_name | Audrūnas Gruslys |
| authorships[13].countries | GB, US |
| authorships[13].affiliations[0].institution_ids | https://openalex.org/I1291425158, https://openalex.org/I4210090411 |
| authorships[13].affiliations[0].raw_affiliation_string | Google DeepMind |
| authorships[13].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[13].institutions[0].ror | https://ror.org/00971b260 |
| authorships[13].institutions[0].type | company |
| authorships[13].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[13].institutions[0].country_code | GB |
| authorships[13].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[13].institutions[1].id | https://openalex.org/I1291425158 |
| authorships[13].institutions[1].ror | https://ror.org/00njsd438 |
| authorships[13].institutions[1].type | company |
| authorships[13].institutions[1].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[13].institutions[1].country_code | US |
| authorships[13].institutions[1].display_name | Google (United States) |
| authorships[13].author_position | last |
| authorships[13].raw_author_name | Audrunas Gruslys |
| authorships[13].is_corresponding | False |
| authorships[13].raw_affiliation_strings | Google DeepMind |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Deep Q-learning From Demonstrations |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W4362501864, https://openalex.org/W4306904969, https://openalex.org/W4380318855, https://openalex.org/W2138720691, https://openalex.org/W2031695474, https://openalex.org/W2586732548, https://openalex.org/W3049728571, https://openalex.org/W20361778, https://openalex.org/W2024136090, https://openalex.org/W4380075502 |
| cited_by_count | 766 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 76 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 107 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 150 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 126 |
| counts_by_year[4].year | 2021 |
| counts_by_year[4].cited_by_count | 110 |
| counts_by_year[5].year | 2020 |
| counts_by_year[5].cited_by_count | 96 |
| counts_by_year[6].year | 2019 |
| counts_by_year[6].cited_by_count | 58 |
| counts_by_year[7].year | 2018 |
| counts_by_year[7].cited_by_count | 35 |
| counts_by_year[8].year | 2017 |
| counts_by_year[8].cited_by_count | 8 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aaai.v32i1.11757 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v32i1.11757 |
| primary_location.id | doi:10.1609/aaai.v32i1.11757 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/11757/11616 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v32i1.11757 |
| publication_date | 2018-04-29 |
| publication_year | 2018 |
| referenced_works | https://openalex.org/W2169209873, https://openalex.org/W2952509347, https://openalex.org/W2397581010, https://openalex.org/W6696265566, https://openalex.org/W6691230391, https://openalex.org/W6735944222, https://openalex.org/W2290104316, https://openalex.org/W2612610049, https://openalex.org/W6718092244, https://openalex.org/W2481567506, https://openalex.org/W2551887912, https://openalex.org/W6679700999, https://openalex.org/W6760385162, https://openalex.org/W2145339207, https://openalex.org/W2260756217, https://openalex.org/W2596982695, https://openalex.org/W106792269, https://openalex.org/W2396161314, https://openalex.org/W1931877416, https://openalex.org/W6687681856, https://openalex.org/W6630221451, https://openalex.org/W2257979135, https://openalex.org/W2491675558, https://openalex.org/W2594640072, https://openalex.org/W6631026904, https://openalex.org/W6676728370, https://openalex.org/W2102847492, https://openalex.org/W2137375617, https://openalex.org/W2509374375, https://openalex.org/W2155968351, https://openalex.org/W2173564293, https://openalex.org/W2963430173, https://openalex.org/W2963477884, https://openalex.org/W2148112459, https://openalex.org/W2963094133, https://openalex.org/W2507592741, https://openalex.org/W2061562262, https://openalex.org/W2138108551, https://openalex.org/W4312558117, https://openalex.org/W4299563772, https://openalex.org/W2148051740, https://openalex.org/W2964043796, https://openalex.org/W2962957031, https://openalex.org/W2963211300, https://openalex.org/W2950872548, https://openalex.org/W2919115771, https://openalex.org/W2963277051, https://openalex.org/W2415726935, https://openalex.org/W834081922, https://openalex.org/W2951799221, https://openalex.org/W3103780890, https://openalex.org/W2964161785, https://openalex.org/W1515851193, https://openalex.org/W2950735232, https://openalex.org/W2113023245, https://openalex.org/W2434014514, https://openalex.org/W1999874108, https://openalex.org/W2963160877, https://openalex.org/W2607198029, https://openalex.org/W2201581102, https://openalex.org/W2181849516, https://openalex.org/W2290053245, https://openalex.org/W2601322194, https://openalex.org/W2253157232, https://openalex.org/W2133552775, https://openalex.org/W2746553466 |
| referenced_works_count | 66 |
| abstract_inverted_index.a | 19, 44, 73, 133 |
| abstract_inverted_index.11 | 223 |
| abstract_inverted_index.14 | 208 |
| abstract_inverted_index.41 | 179 |
| abstract_inverted_index.42 | 181, 210 |
| abstract_inverted_index.83 | 190 |
| abstract_inverted_index.In | 29, 68, 212 |
| abstract_inverted_index.RL | 54 |
| abstract_inverted_index.We | 87, 151 |
| abstract_inverted_index.an | 89 |
| abstract_inverted_index.as | 167 |
| abstract_inverted_index.be | 36, 41 |
| abstract_inverted_index.by | 139 |
| abstract_inverted_index.in | 10, 64, 207 |
| abstract_inverted_index.is | 118 |
| abstract_inverted_index.it | 47, 168, 186 |
| abstract_inverted_index.of | 22, 52, 84, 100, 114, 126, 147, 180, 209 |
| abstract_inverted_index.on | 173, 178, 184 |
| abstract_inverted_index.to | 55, 103, 120, 132, 193, 196, 201, 218 |
| abstract_inverted_index.up | 195 |
| abstract_inverted_index.we | 71, 226 |
| abstract_inverted_index.DQN | 189 |
| abstract_inverted_index.PDD | 188 |
| abstract_inverted_index.and | 117, 183 |
| abstract_inverted_index.but | 46 |
| abstract_inverted_index.can | 35 |
| abstract_inverted_index.for | 43, 222, 236 |
| abstract_inverted_index.has | 4, 155 |
| abstract_inverted_index.may | 40, 78 |
| abstract_inverted_index.the | 50, 60, 65, 76, 85, 106, 123, 148, 174, 203 |
| abstract_inverted_index.(PDD | 165 |
| abstract_inverted_index.(RL) | 3 |
| abstract_inverted_index.DQN) | 166 |
| abstract_inverted_index.DQN. | 241 |
| abstract_inverted_index.DQfD | 137, 154, 199, 214, 229 |
| abstract_inverted_index.Deep | 0, 91, 163 |
| abstract_inverted_index.This | 39 |
| abstract_inverted_index.able | 119 |
| abstract_inverted_index.best | 204 |
| abstract_inverted_index.data | 23, 80, 102, 116, 128, 239 |
| abstract_inverted_index.deep | 53 |
| abstract_inverted_index.even | 109 |
| abstract_inverted_index.from | 81, 93, 110 |
| abstract_inverted_index.high | 7 |
| abstract_inverted_index.huge | 20 |
| abstract_inverted_index.into | 240 |
| abstract_inverted_index.many | 56 |
| abstract_inverted_index.must | 62 |
| abstract_inverted_index.real | 66 |
| abstract_inverted_index.sets | 99 |
| abstract_inverted_index.show | 152, 227 |
| abstract_inverted_index.than | 159, 232 |
| abstract_inverted_index.that | 96, 153, 228 |
| abstract_inverted_index.they | 25 |
| abstract_inverted_index.this | 69 |
| abstract_inverted_index.with | 144, 170 |
| abstract_inverted_index.agent | 61, 77 |
| abstract_inverted_index.catch | 194 |
| abstract_inverted_index.fact, | 30 |
| abstract_inverted_index.first | 175 |
| abstract_inverted_index.games | 182 |
| abstract_inverted_index.given | 206 |
| abstract_inverted_index.human | 216 |
| abstract_inverted_index.learn | 63 |
| abstract_inverted_index.paper | 70 |
| abstract_inverted_index.poor. | 38 |
| abstract_inverted_index.ratio | 125 |
| abstract_inverted_index.reach | 26 |
| abstract_inverted_index.small | 98, 112 |
| abstract_inverted_index.steps | 177, 192 |
| abstract_inverted_index.study | 72 |
| abstract_inverted_index.takes | 187 |
| abstract_inverted_index.their | 31 |
| abstract_inverted_index.these | 15 |
| abstract_inverted_index.three | 233 |
| abstract_inverted_index.where | 59, 75 |
| abstract_inverted_index.while | 129 |
| abstract_inverted_index.works | 138 |
| abstract_inverted_index.Double | 162 |
| abstract_inverted_index.access | 79 |
| abstract_inverted_index.amount | 21 |
| abstract_inverted_index.assess | 122 |
| abstract_inverted_index.before | 24 |
| abstract_inverted_index.better | 156, 171, 231 |
| abstract_inverted_index.during | 33 |
| abstract_inverted_index.games. | 211, 224 |
| abstract_inverted_index.learns | 200 |
| abstract_inverted_index.limits | 49 |
| abstract_inverted_index.replay | 135 |
| abstract_inverted_index.scores | 172 |
| abstract_inverted_index.starts | 169 |
| abstract_inverted_index.tasks, | 58 |
| abstract_inverted_index.thanks | 131 |
| abstract_inverted_index.(DQfD), | 95 |
| abstract_inverted_index.Dueling | 161 |
| abstract_inverted_index.achieve | 219 |
| abstract_inverted_index.amounts | 113 |
| abstract_inverted_index.average | 185 |
| abstract_inverted_index.control | 83 |
| abstract_inverted_index.initial | 157 |
| abstract_inverted_index.million | 176, 191 |
| abstract_inverted_index.present | 88 |
| abstract_inverted_index.process | 108 |
| abstract_inverted_index.profile | 8 |
| abstract_inverted_index.related | 234 |
| abstract_inverted_index.require | 18 |
| abstract_inverted_index.results | 221 |
| abstract_inverted_index.setting | 74 |
| abstract_inverted_index.several | 6 |
| abstract_inverted_index.system. | 86 |
| abstract_inverted_index.updates | 143 |
| abstract_inverted_index.DQfD’s | 197 |
| abstract_inverted_index.Finally, | 225 |
| abstract_inverted_index.However, | 14 |
| abstract_inverted_index.achieved | 5 |
| abstract_inverted_index.actions. | 150 |
| abstract_inverted_index.learning | 2, 34, 107, 130 |
| abstract_inverted_index.performs | 230 |
| abstract_inverted_index.previous | 82 |
| abstract_inverted_index.severely | 48 |
| abstract_inverted_index.temporal | 141 |
| abstract_inverted_index.addition, | 213 |
| abstract_inverted_index.combining | 140 |
| abstract_inverted_index.difficult | 11 |
| abstract_inverted_index.extremely | 37 |
| abstract_inverted_index.leverages | 97, 215 |
| abstract_inverted_index.massively | 104 |
| abstract_inverted_index.necessary | 124 |
| abstract_inverted_index.problems. | 13 |
| abstract_inverted_index.successes | 9 |
| abstract_inverted_index.typically | 17 |
| abstract_inverted_index.Q-Networks | 164 |
| abstract_inverted_index.Q-learning | 92 |
| abstract_inverted_index.accelerate | 105 |
| abstract_inverted_index.acceptable | 42 |
| abstract_inverted_index.algorithm, | 90 |
| abstract_inverted_index.algorithms | 16, 235 |
| abstract_inverted_index.difference | 142 |
| abstract_inverted_index.mechanism. | 136 |
| abstract_inverted_index.real-world | 57 |
| abstract_inverted_index.reasonable | 27 |
| abstract_inverted_index.relatively | 111 |
| abstract_inverted_index.simulator, | 45 |
| abstract_inverted_index.supervised | 145 |
| abstract_inverted_index.Prioritized | 160 |
| abstract_inverted_index.out-perform | 202 |
| abstract_inverted_index.performance | 32, 158 |
| abstract_inverted_index.prioritized | 134 |
| abstract_inverted_index.environment. | 67 |
| abstract_inverted_index.performance. | 28, 198 |
| abstract_inverted_index.applicability | 51 |
| abstract_inverted_index.automatically | 121 |
| abstract_inverted_index.demonstration | 101, 115, 127, 205, 238 |
| abstract_inverted_index.incorporating | 237 |
| abstract_inverted_index.reinforcement | 1 |
| abstract_inverted_index.Demonstrations | 94 |
| abstract_inverted_index.classification | 146 |
| abstract_inverted_index.demonstrations | 217 |
| abstract_inverted_index.decision-making | 12 |
| abstract_inverted_index.demonstrator’s | 149 |
| abstract_inverted_index.state-of-the-art | 220 |
| cited_by_percentile_year.max | 100 |
| cited_by_percentile_year.min | 99 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 14 |
| citation_normalized_percentile.value | 0.99336067 |
| citation_normalized_percentile.is_in_top_1_percent | True |
| citation_normalized_percentile.is_in_top_10_percent | True |
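
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the zero-based positions where it occurs. As a minimal sketch of how such a mapping could be turned back into running text, the Python below sorts the (position, token) pairs and joins them with spaces. The function name `rebuild_abstract`, its `max_position` parameter, and the `sample_index` variable are hypothetical names chosen for illustration; the sample copies only ten rows from the table, so it is truncated at position 9 to avoid gaps.

```python
def rebuild_abstract(inverted_index, max_position=None):
    """Rebuild text from a token -> list-of-positions mapping.

    Hypothetical helper for illustration; max_position optionally
    truncates the output (inclusive) when only part of the index is given.
    """
    pairs = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
        if max_position is None or pos <= max_position
    ]
    pairs.sort()  # order tokens by their position in the abstract
    return " ".join(token for _, token in pairs)


# Ten rows copied from the table above (token -> positions).
sample_index = {
    "Deep": [0, 91, 163],
    "reinforcement": [1],
    "learning": [2, 34, 107, 130],
    "(RL)": [3],
    "has": [4, 155],
    "achieved": [5],
    "several": [6],
    "high": [7],
    "profile": [8],
    "successes": [9],
}

opening = rebuild_abstract(sample_index, max_position=9)
print(opening)
```

With these ten rows the sketch prints the first ten tokens of the abstract; passing the complete index without `max_position` would yield the full text, assuming every position from 0 upward is present.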