How RL Agents Behave When Their Actions Are Modified
2021 · Open Access
· DOI: https://doi.org/10.1609/aaai.v35i13.17378
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.
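The setting in the abstract, where a supervisor may replace the action chosen by the policy before it is executed, can be illustrated with a small sketch. This is a toy illustration only, not the paper's formal definition of the Modified-Action Markov Decision Process: the states, the action names `"risky"` and `"safe"`, the reward values, and the functions `supervisor`, `step`, and `rollout` are all hypothetical.

```python
import random


def supervisor(state, action):
    """Hypothetical action-modification rule: block 'risky' in state 0."""
    if state == 0 and action == "risky":
        return "safe"
    return action


def step(state, executed_action):
    """Toy transition and reward function (made up for illustration)."""
    if executed_action == "risky":
        return 1, 1.0
    return 0, 0.1


def rollout(policy, steps, seed=0):
    """Run the policy, logging both the proposed and the executed action."""
    rng = random.Random(seed)
    state = 0
    log = []
    for _ in range(steps):
        proposed = policy(state, rng)           # action specified by the policy
        executed = supervisor(state, proposed)  # action actually executed
        next_state, reward = step(state, executed)
        log.append((state, proposed, executed, reward))
        state = next_state
    return log


def always_risky(state, rng):
    """A fixed policy that always proposes the blocked action."""
    return "risky"


log = rollout(always_risky, steps=4)
# Steps on which the supervisor changed the action.
modified = [(s, p, e) for (s, p, e, r) in log if p != e]
```

A learning algorithm that reads such a log has to credit each reward either to the proposed action or to the executed one. The sketch keeps both in the log so that either choice can be explored.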
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.1609/aaai.v35i13.17378, https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
- OA Status: diamond
- Cited By: 5
- References: 38
- Related Works: 20
- OpenAlex ID: https://openalex.org/W3131546278
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3131546278 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1609/aaai.v35i13.17378 (Digital Object Identifier)
- Title: How RL Agents Behave When Their Actions Are Modified (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021
- Publication date: 2021-05-18
- Authors: Eric Langlois, Tom Everitt (in order)
- Landing page: https://doi.org/10.1609/aaai.v35i13.17378 (publisher landing page)
- PDF URL: https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: diamond (open access status per OpenAlex)
- OA URL: https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 (direct OA link)
- Concepts: Reinforcement learning, Action (physics), Supervisor, Markov decision process, Computer science, Process (computing), Q-learning, Reinforcement, Control (management), Risk analysis (engineering), Intervention (counseling), Artificial intelligence, Markov process, Psychology, Business, Social psychology, Mathematics, Political science, Law, Psychiatry, Quantum mechanics, Physics, Statistics, Operating system (top concepts attached by OpenAlex)
- Cited by: 5 (total citation count in OpenAlex)
- Citations by year (recent): 2022: 1, 2021: 4 (per-year citation counts, last 5 years)
- References (count): 38 (works referenced by this work)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W3131546278 |
|---|---|
| doi | https://doi.org/10.1609/aaai.v35i13.17378 |
| ids.doi | https://doi.org/10.1609/aaai.v35i13.17378 |
| ids.mag | 3131546278 |
| ids.openalex | https://openalex.org/W3131546278 |
| fwci | 0.61309853 |
| type | preprint |
| title | How RL Agents Behave When Their Actions Are Modified |
| biblio.issue | 13 |
| biblio.volume | 35 |
| biblio.last_page | 11594 |
| biblio.first_page | 11586 |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9984999895095825 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T12761 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9767000079154968 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Data Stream Mining Techniques |
| topics[2].id | https://openalex.org/T10260 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9562000036239624 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Software Engineering Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8260191082954407 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C2780791683 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7434389591217041 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q846785 |
| concepts[1].display_name | Action (physics) |
| concepts[2].id | https://openalex.org/C2779110517 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7362669110298157 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1240788 |
| concepts[2].display_name | Supervisor |
| concepts[3].id | https://openalex.org/C106189395 |
| concepts[3].level | 3 |
| concepts[3].score | 0.7092247009277344 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q176789 |
| concepts[3].display_name | Markov decision process |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.6502410769462585 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C98045186 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5426817536354065 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[5].display_name | Process (computing) |
| concepts[6].id | https://openalex.org/C188116033 |
| concepts[6].level | 3 |
| concepts[6].score | 0.4653877317905426 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2664563 |
| concepts[6].display_name | Q-learning |
| concepts[7].id | https://openalex.org/C67203356 |
| concepts[7].level | 2 |
| concepts[7].score | 0.46065402030944824 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[7].display_name | Reinforcement |
| concepts[8].id | https://openalex.org/C2775924081 |
| concepts[8].level | 2 |
| concepts[8].score | 0.45117324590682983 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q55608371 |
| concepts[8].display_name | Control (management) |
| concepts[9].id | https://openalex.org/C112930515 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4325242340564728 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q4389547 |
| concepts[9].display_name | Risk analysis (engineering) |
| concepts[10].id | https://openalex.org/C2780665704 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4162677526473999 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q959298 |
| concepts[10].display_name | Intervention (counseling) |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.4034082591533661 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C159886148 |
| concepts[12].level | 2 |
| concepts[12].score | 0.35152846574783325 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q176645 |
| concepts[12].display_name | Markov process |
| concepts[13].id | https://openalex.org/C15744967 |
| concepts[13].level | 0 |
| concepts[13].score | 0.1952427625656128 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[13].display_name | Psychology |
| concepts[14].id | https://openalex.org/C144133560 |
| concepts[14].level | 0 |
| concepts[14].score | 0.15594175457954407 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[14].display_name | Business |
| concepts[15].id | https://openalex.org/C77805123 |
| concepts[15].level | 1 |
| concepts[15].score | 0.14220023155212402 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[15].display_name | Social psychology |
| concepts[16].id | https://openalex.org/C33923547 |
| concepts[16].level | 0 |
| concepts[16].score | 0.10868614912033081 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[16].display_name | Mathematics |
| concepts[17].id | https://openalex.org/C17744445 |
| concepts[17].level | 0 |
| concepts[17].score | 0.06879189610481262 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[17].display_name | Political science |
| concepts[18].id | https://openalex.org/C199539241 |
| concepts[18].level | 1 |
| concepts[18].score | 0.060529232025146484 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[18].display_name | Law |
| concepts[19].id | https://openalex.org/C118552586 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q7867 |
| concepts[19].display_name | Psychiatry |
| concepts[20].id | https://openalex.org/C62520636 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[20].display_name | Quantum mechanics |
| concepts[21].id | https://openalex.org/C121332964 |
| concepts[21].level | 0 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[21].display_name | Physics |
| concepts[22].id | https://openalex.org/C105795698 |
| concepts[22].level | 1 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[22].display_name | Statistics |
| concepts[23].id | https://openalex.org/C111919701 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[23].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8260191082954407 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/action |
| keywords[1].score | 0.7434389591217041 |
| keywords[1].display_name | Action (physics) |
| keywords[2].id | https://openalex.org/keywords/supervisor |
| keywords[2].score | 0.7362669110298157 |
| keywords[2].display_name | Supervisor |
| keywords[3].id | https://openalex.org/keywords/markov-decision-process |
| keywords[3].score | 0.7092247009277344 |
| keywords[3].display_name | Markov decision process |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.6502410769462585 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/process |
| keywords[5].score | 0.5426817536354065 |
| keywords[5].display_name | Process (computing) |
| keywords[6].id | https://openalex.org/keywords/q-learning |
| keywords[6].score | 0.4653877317905426 |
| keywords[6].display_name | Q-learning |
| keywords[7].id | https://openalex.org/keywords/reinforcement |
| keywords[7].score | 0.46065402030944824 |
| keywords[7].display_name | Reinforcement |
| keywords[8].id | https://openalex.org/keywords/control |
| keywords[8].score | 0.45117324590682983 |
| keywords[8].display_name | Control (management) |
| keywords[9].id | https://openalex.org/keywords/risk-analysis |
| keywords[9].score | 0.4325242340564728 |
| keywords[9].display_name | Risk analysis (engineering) |
| keywords[10].id | https://openalex.org/keywords/intervention |
| keywords[10].score | 0.4162677526473999 |
| keywords[10].display_name | Intervention (counseling) |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.4034082591533661 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/markov-process |
| keywords[12].score | 0.35152846574783325 |
| keywords[12].display_name | Markov process |
| keywords[13].id | https://openalex.org/keywords/psychology |
| keywords[13].score | 0.1952427625656128 |
| keywords[13].display_name | Psychology |
| keywords[14].id | https://openalex.org/keywords/business |
| keywords[14].score | 0.15594175457954407 |
| keywords[14].display_name | Business |
| keywords[15].id | https://openalex.org/keywords/social-psychology |
| keywords[15].score | 0.14220023155212402 |
| keywords[15].display_name | Social psychology |
| keywords[16].id | https://openalex.org/keywords/mathematics |
| keywords[16].score | 0.10868614912033081 |
| keywords[16].display_name | Mathematics |
| keywords[17].id | https://openalex.org/keywords/political-science |
| keywords[17].score | 0.06879189610481262 |
| keywords[17].display_name | Political science |
| keywords[18].id | https://openalex.org/keywords/law |
| keywords[18].score | 0.060529232025146484 |
| keywords[18].display_name | Law |
| language | en |
| locations[0].id | doi:10.1609/aaai.v35i13.17378 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v35i13.17378 |
| locations[1].id | pmh:oai:arXiv.org:2102.07716 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | https://arxiv.org/pdf/2102.07716 |
| locations[1].version | submittedVersion |
| locations[1].raw_type | text |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | http://arxiv.org/abs/2102.07716 |
| locations[2].id | mag:3131546278 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | arXiv (Cornell University) |
| locations[2].landing_page_url | http://export.arxiv.org/pdf/2102.07716 |
| locations[3].id | doi:10.48550/arxiv.2102.07716 |
| locations[3].is_oa | True |
| locations[3].source.id | https://openalex.org/S4306400194 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | True |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | arXiv (Cornell University) |
| locations[3].source.host_organization | https://openalex.org/I205783295 |
| locations[3].source.host_organization_name | Cornell University |
| locations[3].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | |
| locations[3].raw_type | article-journal |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | |
| locations[3].raw_source_name | |
| locations[3].landing_page_url | https://doi.org/10.48550/arxiv.2102.07716 |
| indexed_in | arxiv, crossref, datacite |
| authorships[0].author.id | https://openalex.org/A5065996162 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Eric Langlois |
| authorships[0].affiliations[0].raw_affiliation_string | 1,2, and 3 |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Eric D. Langlois |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | 1,2, and 3 |
| authorships[1].author.id | https://openalex.org/A5020224050 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1210-9866 |
| authorships[1].author.display_name | Tom Everitt |
| authorships[1].countries | GB |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[1].affiliations[0].raw_affiliation_string | DeepMind |
| authorships[1].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[1].institutions[0].ror | https://ror.org/00971b260 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | GB |
| authorships[1].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Tom Everitt |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | DeepMind |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | How RL Agents Behave When Their Actions Are Modified |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9984999895095825 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2979363950, https://openalex.org/W3123819124, https://openalex.org/W2989943922, https://openalex.org/W161119602, https://openalex.org/W1770190901, https://openalex.org/W2728003485, https://openalex.org/W2471081794, https://openalex.org/W3098428275, https://openalex.org/W2402236924, https://openalex.org/W2394924947, https://openalex.org/W2964251366, https://openalex.org/W2940957092, https://openalex.org/W1562694074, https://openalex.org/W2182124052, https://openalex.org/W3201003870, https://openalex.org/W2973029245, https://openalex.org/W2466720186, https://openalex.org/W2973186106, https://openalex.org/W3037998275, https://openalex.org/W3100097138 |
| cited_by_count | 5 |
| counts_by_year[0].year | 2022 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2021 |
| counts_by_year[1].cited_by_count | 4 |
| locations_count | 4 |
| best_oa_location.id | doi:10.1609/aaai.v35i13.17378 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v35i13.17378 |
| primary_location.id | doi:10.1609/aaai.v35i13.17378 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v35i13.17378 |
| publication_date | 2021-05-18 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W6732033900, https://openalex.org/W6704298589, https://openalex.org/W2618318883, https://openalex.org/W6621199667, https://openalex.org/W6747790125, https://openalex.org/W2575705757, https://openalex.org/W1964488926, https://openalex.org/W2028145673, https://openalex.org/W6758622208, https://openalex.org/W2165131254, https://openalex.org/W6768463214, https://openalex.org/W6746721349, https://openalex.org/W2286365479, https://openalex.org/W1914583973, https://openalex.org/W6634711830, https://openalex.org/W6732559233, https://openalex.org/W7075292137, https://openalex.org/W2020609518, https://openalex.org/W6792155000, https://openalex.org/W2596367596, https://openalex.org/W2736629007, https://openalex.org/W2124175081, https://openalex.org/W6677916085, https://openalex.org/W6695925467, https://openalex.org/W2964273112, https://openalex.org/W2917742641, https://openalex.org/W2977925801, https://openalex.org/W1581742186, https://openalex.org/W3139377883, https://openalex.org/W648152870, https://openalex.org/W2143891888, https://openalex.org/W2121863487, https://openalex.org/W2784465508, https://openalex.org/W2574075983, https://openalex.org/W1557517019, https://openalex.org/W2913758949, https://openalex.org/W2150339816, https://openalex.org/W2768908787 |
| referenced_works_count | 38 |
| abstract_inverted_index.a | 17 |
| abstract_inverted_index.As | 16 |
| abstract_inverted_index.By | 100 |
| abstract_inverted_index.We | 39, 60 |
| abstract_inverted_index.an | 46 |
| abstract_inverted_index.by | 31 |
| abstract_inverted_index.go | 87 |
| abstract_inverted_index.in | 2, 70, 78, 91 |
| abstract_inverted_index.of | 19, 48, 65, 125 |
| abstract_inverted_index.or | 115 |
| abstract_inverted_index.to | 8, 55, 88, 93, 112, 122 |
| abstract_inverted_index.How | 34 |
| abstract_inverted_index.MDP | 50 |
| abstract_inverted_index.and | 73, 117 |
| abstract_inverted_index.can | 106 |
| abstract_inverted_index.may | 5, 25 |
| abstract_inverted_index.the | 10, 22, 28, 32, 41, 49, 58, 62, 102 |
| abstract_inverted_index.does | 35 |
| abstract_inverted_index.from | 12, 27, 57, 110 |
| abstract_inverted_index.like | 128 |
| abstract_inverted_index.show | 74 |
| abstract_inverted_index.some | 81 |
| abstract_inverted_index.that | 52, 75, 97 |
| abstract_inverted_index.they | 76 |
| abstract_inverted_index.this | 36, 71 |
| abstract_inverted_index.adapt | 77 |
| abstract_inverted_index.agent | 11, 120 |
| abstract_inverted_index.avoid | 94 |
| abstract_inverted_index.kinds | 124 |
| abstract_inverted_index.model | 51 |
| abstract_inverted_index.other | 123 |
| abstract_inverted_index.right | 103 |
| abstract_inverted_index.their | 108 |
| abstract_inverted_index.ways: | 80 |
| abstract_inverted_index.while | 85 |
| abstract_inverted_index.Markov | 43 |
| abstract_inverted_index.action | 24, 29, 95, 126 |
| abstract_inverted_index.affect | 37 |
| abstract_inverted_index.agents | 109 |
| abstract_inverted_index.allows | 53 |
| abstract_inverted_index.better | 118 |
| abstract_inverted_index.common | 66 |
| abstract_inverted_index.differ | 26, 56 |
| abstract_inverted_index.ignore | 83 |
| abstract_inverted_index.others | 86 |
| abstract_inverted_index.result | 18 |
| abstract_inverted_index.trying | 92 |
| abstract_inverted_index.actions | 54 |
| abstract_inverted_index.analyze | 61 |
| abstract_inverted_index.complex | 3 |
| abstract_inverted_index.control | 119 |
| abstract_inverted_index.lengths | 90 |
| abstract_inverted_index.policy. | 33, 59 |
| abstract_inverted_index.present | 40 |
| abstract_inverted_index.prevent | 9, 107 |
| abstract_inverted_index.require | 6 |
| abstract_inverted_index.reward. | 99 |
| abstract_inverted_index.setting | 72 |
| abstract_inverted_index.various | 89 |
| abstract_inverted_index.Decision | 44 |
| abstract_inverted_index.Process, | 45 |
| abstract_inverted_index.actions. | 15 |
| abstract_inverted_index.choosing | 101 |
| abstract_inverted_index.decrease | 98 |
| abstract_inverted_index.executed | 23 |
| abstract_inverted_index.learning | 1, 68, 111 |
| abstract_inverted_index.dangerous | 14 |
| abstract_inverted_index.different | 79 |
| abstract_inverted_index.extension | 47 |
| abstract_inverted_index.learning? | 38 |
| abstract_inverted_index.responses | 121 |
| abstract_inverted_index.specified | 30 |
| abstract_inverted_index.algorithm, | 104 |
| abstract_inverted_index.algorithms | 69 |
| abstract_inverted_index.asymptotic | 63 |
| abstract_inverted_index.attempting | 13 |
| abstract_inverted_index.behaviours | 64 |
| abstract_inverted_index.circumvent | 113 |
| abstract_inverted_index.completely | 82 |
| abstract_inverted_index.developers | 105 |
| abstract_inverted_index.supervisor | 20 |
| abstract_inverted_index.supervision | 7 |
| abstract_inverted_index.constraints, | 116 |
| abstract_inverted_index.environments | 4 |
| abstract_inverted_index.self-damage. | 129 |
| abstract_inverted_index.Reinforcement | 0 |
| abstract_inverted_index.interruptions | 114 |
| abstract_inverted_index.intervention, | 21 |
| abstract_inverted_index.modification, | 127 |
| abstract_inverted_index.modifications | 84, 96 |
| abstract_inverted_index.reinforcement | 67 |
| abstract_inverted_index.Modified-Action | 42 |
| cited_by_percentile_year.max | 97 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.8399999737739563 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile.value | 0.69086498 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
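
The `abstract_inverted_index.*` rows in the table above map each word of the abstract to its zero-based positions. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` and the variable `sample` are hypothetical, and `sample` holds only the first eight positions from the table rather than the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    positions = {}
    for word, indexes in inverted_index.items():
        for i in indexes:
            positions[i] = word
    # Join the words in position order.
    return " ".join(positions[i] for i in sorted(positions))


# First eight positions only, copied from the rows above.
sample = {
    "Reinforcement": [0],
    "learning": [1],
    "in": [2],
    "complex": [3],
    "environments": [4],
    "may": [5],
    "require": [6],
    "supervision": [7],
}

text = rebuild_abstract(sample)
```

Given the full index, the same function should reproduce the abstract shown at the top of the record.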