Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework.
Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
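The following toy sketch shows the control flow that the abstract describes. A top-level interaction policy decides at each step whether to act or to ask, and if it asks, what to ask for. A lower-level navigation policy acts on whatever the agent currently believes. Everything in it is a hypothetical stand-in chosen for illustration: the 1-D `Corridor` environment, the `Assistant` that answers truthfully, and the two request types `ask_goal` and `ask_position`. The paper *learns* its interaction policy with hierarchical reinforcement learning, whereas `interaction_policy` below is a hand-written rule that only demonstrates the "when and what to ask" decomposition and how a request fraction like the abstract's "a quarter of all actions" would be counted.

```python
"""Toy sketch (hypothetical setup): an agent that decides when and what to ask."""
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Corridor:
    """Hypothetical 1-D corridor; the agent must stop on the goal cell."""
    length: int
    position: int
    goal: int

    def step(self, move: int) -> None:
        # Clamp the agent to the corridor bounds.
        self.position = min(max(self.position + move, 0), self.length - 1)


class Assistant:
    """Hypothetical assistant that answers two request types truthfully."""

    def __init__(self, env: Corridor):
        self.env = env

    def answer(self, request: str) -> int:
        if request == "ask_position":
            return self.env.position
        if request == "ask_goal":
            return self.env.goal
        raise ValueError(f"unknown request: {request}")


@dataclass
class Belief:
    position: Optional[int] = None  # agent's estimate of its own cell
    goal: Optional[int] = None      # agent's estimate of the goal cell


def interaction_policy(belief: Belief) -> str:
    """Hand-written stand-in for a LEARNED interaction policy:
    request whatever is still unknown, otherwise hand control to navigation."""
    if belief.goal is None:
        return "ask_goal"
    if belief.position is None:
        return "ask_position"
    return "do"


def navigation_policy(belief: Belief) -> int:
    """Move one cell toward the believed goal; 0 means 'stop'."""
    assert belief.goal is not None and belief.position is not None
    if belief.goal > belief.position:
        return 1
    if belief.goal < belief.position:
        return -1
    return 0


def run_episode(env: Corridor, max_steps: int = 20) -> Tuple[bool, float]:
    """Return (task success, fraction of actions that were requests)."""
    belief, assistant = Belief(), Assistant(env)
    n_actions = n_requests = 0
    for _ in range(max_steps):
        choice = interaction_policy(belief)
        n_actions += 1  # every decision, including 'stop', counts as an action
        if choice == "do":
            move = navigation_policy(belief)
            if move == 0:
                break
            env.step(move)
            # Dead reckoning: exact here because moving toward an in-bounds
            # goal never hits the corridor walls.
            belief.position += move
        else:
            n_requests += 1
            value = assistant.answer(choice)
            if choice == "ask_goal":
                belief.goal = value
            else:
                belief.position = value
    return env.position == env.goal, n_requests / n_actions


if __name__ == "__main__":
    success, request_fraction = run_episode(Corridor(length=10, position=2, goal=6))
    print(success, round(request_fraction, 3))  # prints: True 0.286
```

In this run the agent spends 2 of its 7 actions on requests. A learned interaction policy would instead have to trade off the cost of asking against the risk of failing, which is the problem the paper's hierarchical RL formulation addresses.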
- Type: preprint
- Language: en
- Landing Page: http://export.arxiv.org/pdf/2110.08258
- OA Status: green
- References: 31
- Related Works: 20
- OpenAlex ID: https://openalex.org/W3207011877
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3207011877 (Canonical identifier for this work in OpenAlex)
- Title: Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework. (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2021 (Year of publication)
- Publication date: 2021-10-14 (Full publication date if available)
- Authors: Khanh Nguyen, Yonatan Bisk, Hal Daumé (List of authors in order)
- Landing page: https://export.arxiv.org/pdf/2110.08258 (Publisher landing page)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://export.arxiv.org/pdf/2110.08258 (Direct OA link when available)
- Concepts: Reinforcement learning, Leverage (statistics), Computer science, Markov decision process, Task (project management), Ask price, Artificial intelligence, Machine learning, Human–computer interaction, Markov process, Management, Economy, Statistics, Economics, Mathematics (Top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- References (count): 31 (Number of works referenced by this work)
- Related works (count): 20 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W3207011877 |
| doi | |
| ids.mag | 3207011877 |
| ids.openalex | https://openalex.org/W3207011877 |
| fwci | |
| type | preprint |
| title | Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework. |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11704 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9988999962806702 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1706 |
| topics[1].subfield.display_name | Computer Science Applications |
| topics[1].display_name | Mobile Crowdsensing and Crowdsourcing |
| topics[2].id | https://openalex.org/T11714 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9987000226974487 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Multimodal Machine Learning Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8567795753479004 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C153083717 |
| concepts[1].level | 2 |
| concepts[1].score | 0.792235255241394 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q6535263 |
| concepts[1].display_name | Leverage (statistics) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.7453216314315796 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C106189395 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6558366417884827 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q176789 |
| concepts[3].display_name | Markov decision process |
| concepts[4].id | https://openalex.org/C2780451532 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6461279392242432 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[4].display_name | Task (project management) |
| concepts[5].id | https://openalex.org/C90329073 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5077248811721802 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q914232 |
| concepts[5].display_name | Ask price |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.45667171478271484 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3872433304786682 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C107457646 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3869580030441284 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[8].display_name | Human–computer interaction |
| concepts[9].id | https://openalex.org/C159886148 |
| concepts[9].level | 2 |
| concepts[9].score | 0.3014349639415741 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q176645 |
| concepts[9].display_name | Markov process |
| concepts[10].id | https://openalex.org/C187736073 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[10].display_name | Management |
| concepts[11].id | https://openalex.org/C136264566 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q159810 |
| concepts[11].display_name | Economy |
| concepts[12].id | https://openalex.org/C105795698 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[12].display_name | Statistics |
| concepts[13].id | https://openalex.org/C162324750 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[13].display_name | Economics |
| concepts[14].id | https://openalex.org/C33923547 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[14].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8567795753479004 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/leverage |
| keywords[1].score | 0.792235255241394 |
| keywords[1].display_name | Leverage (statistics) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.7453216314315796 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/markov-decision-process |
| keywords[3].score | 0.6558366417884827 |
| keywords[3].display_name | Markov decision process |
| keywords[4].id | https://openalex.org/keywords/task |
| keywords[4].score | 0.6461279392242432 |
| keywords[4].display_name | Task (project management) |
| keywords[5].id | https://openalex.org/keywords/ask-price |
| keywords[5].score | 0.5077248811721802 |
| keywords[5].display_name | Ask price |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.45667171478271484 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.3872433304786682 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[8].score | 0.3869580030441284 |
| keywords[8].display_name | Human–computer interaction |
| keywords[9].id | https://openalex.org/keywords/markov-process |
| keywords[9].score | 0.3014349639415741 |
| keywords[9].display_name | Markov process |
| language | en |
| locations[0].id | mag:3207011877 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | arXiv (Cornell University) |
| locations[0].landing_page_url | http://export.arxiv.org/pdf/2110.08258 |
| authorships[0].author.id | https://openalex.org/A5101639120 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0684-9406 |
| authorships[0].author.display_name | Khanh Nguyen |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I66946132 |
| authorships[0].affiliations[0].raw_affiliation_string | University of Maryland - College Park |
| authorships[0].institutions[0].id | https://openalex.org/I66946132 |
| authorships[0].institutions[0].ror | https://ror.org/047s2c258 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I66946132 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | University of Maryland, College Park |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Khanh Nguyen |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | University of Maryland - College Park |
| authorships[1].author.id | https://openalex.org/A5041302228 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yonatan Bisk |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I74973139 |
| authorships[1].affiliations[0].raw_affiliation_string | Carnegie Mellon University |
| authorships[1].institutions[0].id | https://openalex.org/I74973139 |
| authorships[1].institutions[0].ror | https://ror.org/05x2bcf33 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I74973139 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Carnegie Mellon University |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yonatan Bisk |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Carnegie Mellon University |
| authorships[2].author.id | https://openalex.org/A5019928111 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3760-345X |
| authorships[2].author.display_name | Hal Daumé |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Hal Daumé |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | http://export.arxiv.org/pdf/2110.08258 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework. |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-10-10T17:16:08.811792 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2780057514, https://openalex.org/W3106528330, https://openalex.org/W3207649250, https://openalex.org/W1247499718, https://openalex.org/W3038582969, https://openalex.org/W2952157924, https://openalex.org/W2916112073, https://openalex.org/W2251007294, https://openalex.org/W3136163870, https://openalex.org/W2808007596, https://openalex.org/W2996784529, https://openalex.org/W2974316376, https://openalex.org/W2767057222, https://openalex.org/W9516548, https://openalex.org/W2903849022, https://openalex.org/W2963026102, https://openalex.org/W3015216216, https://openalex.org/W2071453245, https://openalex.org/W3035847478, https://openalex.org/W2909005663 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | mag:3207011877 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | arXiv (Cornell University) |
| best_oa_location.landing_page_url | http://export.arxiv.org/pdf/2110.08258 |
| primary_location.id | mag:3207011877 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | arXiv (Cornell University) |
| primary_location.landing_page_url | http://export.arxiv.org/pdf/2110.08258 |
| publication_date | 2021-10-14 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W2885138528, https://openalex.org/W2768661419, https://openalex.org/W2168359464, https://openalex.org/W2117598836, https://openalex.org/W2967186499, https://openalex.org/W3103179916, https://openalex.org/W2996486595, https://openalex.org/W3035628711, https://openalex.org/W2998008599, https://openalex.org/W2970340522, https://openalex.org/W2963033005, https://openalex.org/W3023690688, https://openalex.org/W2152748974, https://openalex.org/W1777239053, https://openalex.org/W2964043796, https://openalex.org/W3034758614, https://openalex.org/W2891012317, https://openalex.org/W2963800628, https://openalex.org/W2109910161, https://openalex.org/W2964250417, https://openalex.org/W3121002673, https://openalex.org/W2962938178, https://openalex.org/W2962957031, https://openalex.org/W2963266575, https://openalex.org/W2979532866, https://openalex.org/W2949316785, https://openalex.org/W3189216228, https://openalex.org/W2963403868, https://openalex.org/W3108144224, https://openalex.org/W2963262099, https://openalex.org/W2083195487 |
| referenced_works_count | 31 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.75 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
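OpenAlex does not store abstracts as plain text. It stores them as an `abstract_inverted_index`, a mapping from each word to the zero-based positions where that word occurs. The sketch below rebuilds readable text from such a mapping. The function name `reconstruct_abstract` is hypothetical, and the sample uses the first twelve words of this work's abstract, where "of" occurs at positions 6 and 9. Positions missing from the index are skipped rather than filled in.

```python
from typing import Dict, List


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index
    (word -> list of zero-based positions). Hypothetical helper."""
    position_to_word: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            position_to_word[pos] = word
    # Emit words in positional order; gaps in the positions are simply skipped.
    return " ".join(position_to_word[p] for p in sorted(position_to_word))


# First twelve words of this work's abstract, in inverted-index form.
sample_index = {
    "Reliable": [0], "AI": [1], "agents": [2], "should": [3], "be": [4],
    "mindful": [5], "of": [6, 9], "the": [7], "limits": [8],
    "their": [10], "knowledge": [11],
}

if __name__ == "__main__":
    print(reconstruct_abstract(sample_index))
    # prints: Reliable AI agents should be mindful of the limits of their knowledge
```

Sorting the positions once at the end keeps the rebuild at O(n log n) in the number of tokens. A plain dictionary also tolerates indexes whose positions are not contiguous.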