How RL Agents Behave When Their Actions Are Modified

2021 · Open Access · DOI: https://doi.org/10.1609/aaai.v35i13.17378

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.
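
The setting described in the abstract, where the executed action can differ from the action the policy proposes, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-state toy dynamics, the `supervisor` rule, the reward values, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

SAFE, RISKY = 0, 1  # hypothetical action labels


def supervisor(state: int, proposed: int) -> int:
    """Hypothetical supervisor: overrides RISKY with SAFE in state 1 only."""
    if state == 1 and proposed == RISKY:
        return SAFE
    return proposed


def transition(state: int, action: int) -> Tuple[int, float]:
    """Toy dynamics (assumed): cycle through states 0, 1, 2; RISKY pays more."""
    reward = 1.0 if action == RISKY else 0.5
    return (state + 1) % 3, reward


def step(state: int, policy: Callable[[int], int]) -> Tuple[int, int, int, float]:
    """One step in which the executed action may differ from the proposed one."""
    proposed = policy(state)
    executed = supervisor(state, proposed)  # action modification happens here
    next_state, reward = transition(state, executed)
    return proposed, executed, next_state, reward


def always_risky(state: int) -> int:
    """A fixed policy that proposes RISKY in every state."""
    return RISKY


# Roll out three steps and record (state, proposed, executed, reward).
trace: List[Tuple[int, int, int, float]] = []
state = 0
for _ in range(3):
    proposed, executed, next_state, reward = step(state, always_risky)
    trace.append((state, proposed, executed, reward))
    state = next_state

for row in trace:
    print(row)
```

In this toy rollout the proposed and executed actions disagree only in state 1. Whether a learning rule updates on the proposed action or on the executed one is the kind of distinction the abstract says leads algorithms to adapt in different ways.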

Related Topics

Reinforcement Learning

Artificial Intelligence

Concepts

Reinforcement learning, Action (physics), Supervisor, Markov decision process, Computer science, Process (computing), Q-learning, Reinforcement, Control (management), Risk analysis (engineering), Intervention (counseling), Artificial intelligence, Markov process, Psychology, Business, Social psychology, Mathematics, Political science, Law, Psychiatry, Quantum mechanics, Physics, Statistics, Operating system

Metadata

Type: preprint
Language: en
Landing Page: https://doi.org/10.1609/aaai.v35i13.17378
PDF: https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
OA Status: diamond
Cited By: 5
References: 38
Related Works: 20
OpenAlex ID: https://openalex.org/W3131546278

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W3131546278
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.1609/aaai.v35i13.17378
Digital Object Identifier

Title: How RL Agents Behave When Their Actions Are Modified
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2021
Year of publication

Publication date: 2021-05-18
Full publication date if available

Authors: Eric Langlois, Tom Everitt
List of authors in order

Landing page: https://doi.org/10.1609/aaai.v35i13.17378
Publisher landing page

PDF URL: https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: diamond
Open access status per OpenAlex

OA URL: https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
Direct OA link when available

Concepts: Reinforcement learning, Action (physics), Supervisor, Markov decision process, Computer science, Process (computing), Q-learning, Reinforcement, Control (management), Risk analysis (engineering), Intervention (counseling), Artificial intelligence, Markov process, Psychology, Business, Social psychology, Mathematics, Political science, Law, Psychiatry, Quantum mechanics, Physics, Statistics, Operating system
Top concepts (fields/topics) attached by OpenAlex

Cited by: 5
Total citation count in OpenAlex

Citations by year (recent): 2022: 1, 2021: 4
Per-year citation counts (last 5 years)

References (count): 38
Number of works referenced by this work

Related works (count): 20
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W3131546278
doi	https://doi.org/10.1609/aaai.v35i13.17378
ids.doi	https://doi.org/10.1609/aaai.v35i13.17378
ids.mag	3131546278
ids.openalex	https://openalex.org/W3131546278
fwci	0.61309853
type	preprint
title	How RL Agents Behave When Their Actions Are Modified
biblio.issue	13
biblio.volume	35
biblio.last_page	11594
biblio.first_page	11586
topics[0].id	https://openalex.org/T10462
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9984999895095825
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Reinforcement Learning in Robotics
topics[1].id	https://openalex.org/T12761
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9767000079154968
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Data Stream Mining Techniques
topics[2].id	https://openalex.org/T10260
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9562000036239624
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1710
topics[2].subfield.display_name	Information Systems
topics[2].display_name	Software Engineering Research
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.8260191082954407
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C2780791683
concepts[1].level	2
concepts[1].score	0.7434389591217041
concepts[1].wikidata	https://www.wikidata.org/wiki/Q846785
concepts[1].display_name	Action (physics)
concepts[2].id	https://openalex.org/C2779110517
concepts[2].level	2
concepts[2].score	0.7362669110298157
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1240788
concepts[2].display_name	Supervisor
concepts[3].id	https://openalex.org/C106189395
concepts[3].level	3
concepts[3].score	0.7092247009277344
concepts[3].wikidata	https://www.wikidata.org/wiki/Q176789
concepts[3].display_name	Markov decision process
concepts[4].id	https://openalex.org/C41008148
concepts[4].level	0
concepts[4].score	0.6502410769462585
concepts[4].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[4].display_name	Computer science
concepts[5].id	https://openalex.org/C98045186
concepts[5].level	2
concepts[5].score	0.5426817536354065
concepts[5].wikidata	https://www.wikidata.org/wiki/Q205663
concepts[5].display_name	Process (computing)
concepts[6].id	https://openalex.org/C188116033
concepts[6].level	3
concepts[6].score	0.4653877317905426
concepts[6].wikidata	https://www.wikidata.org/wiki/Q2664563
concepts[6].display_name	Q-learning
concepts[7].id	https://openalex.org/C67203356
concepts[7].level	2
concepts[7].score	0.46065402030944824
concepts[7].wikidata	https://www.wikidata.org/wiki/Q1321905
concepts[7].display_name	Reinforcement
concepts[8].id	https://openalex.org/C2775924081
concepts[8].level	2
concepts[8].score	0.45117324590682983
concepts[8].wikidata	https://www.wikidata.org/wiki/Q55608371
concepts[8].display_name	Control (management)
concepts[9].id	https://openalex.org/C112930515
concepts[9].level	1
concepts[9].score	0.4325242340564728
concepts[9].wikidata	https://www.wikidata.org/wiki/Q4389547
concepts[9].display_name	Risk analysis (engineering)
concepts[10].id	https://openalex.org/C2780665704
concepts[10].level	2
concepts[10].score	0.4162677526473999
concepts[10].wikidata	https://www.wikidata.org/wiki/Q959298
concepts[10].display_name	Intervention (counseling)
concepts[11].id	https://openalex.org/C154945302
concepts[11].level	1
concepts[11].score	0.4034082591533661
concepts[11].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[11].display_name	Artificial intelligence
concepts[12].id	https://openalex.org/C159886148
concepts[12].level	2
concepts[12].score	0.35152846574783325
concepts[12].wikidata	https://www.wikidata.org/wiki/Q176645
concepts[12].display_name	Markov process
concepts[13].id	https://openalex.org/C15744967
concepts[13].level	0
concepts[13].score	0.1952427625656128
concepts[13].wikidata	https://www.wikidata.org/wiki/Q9418
concepts[13].display_name	Psychology
concepts[14].id	https://openalex.org/C144133560
concepts[14].level	0
concepts[14].score	0.15594175457954407
concepts[14].wikidata	https://www.wikidata.org/wiki/Q4830453
concepts[14].display_name	Business
concepts[15].id	https://openalex.org/C77805123
concepts[15].level	1
concepts[15].score	0.14220023155212402
concepts[15].wikidata	https://www.wikidata.org/wiki/Q161272
concepts[15].display_name	Social psychology
concepts[16].id	https://openalex.org/C33923547
concepts[16].level	0
concepts[16].score	0.10868614912033081
concepts[16].wikidata	https://www.wikidata.org/wiki/Q395
concepts[16].display_name	Mathematics
concepts[17].id	https://openalex.org/C17744445
concepts[17].level	0
concepts[17].score	0.06879189610481262
concepts[17].wikidata	https://www.wikidata.org/wiki/Q36442
concepts[17].display_name	Political science
concepts[18].id	https://openalex.org/C199539241
concepts[18].level	1
concepts[18].score	0.060529232025146484
concepts[18].wikidata	https://www.wikidata.org/wiki/Q7748
concepts[18].display_name	Law
concepts[19].id	https://openalex.org/C118552586
concepts[19].level	1
concepts[19].score	0.0
concepts[19].wikidata	https://www.wikidata.org/wiki/Q7867
concepts[19].display_name	Psychiatry
concepts[20].id	https://openalex.org/C62520636
concepts[20].level	1
concepts[20].score	0.0
concepts[20].wikidata	https://www.wikidata.org/wiki/Q944
concepts[20].display_name	Quantum mechanics
concepts[21].id	https://openalex.org/C121332964
concepts[21].level	0
concepts[21].score	0.0
concepts[21].wikidata	https://www.wikidata.org/wiki/Q413
concepts[21].display_name	Physics
concepts[22].id	https://openalex.org/C105795698
concepts[22].level	1
concepts[22].score	0.0
concepts[22].wikidata	https://www.wikidata.org/wiki/Q12483
concepts[22].display_name	Statistics
concepts[23].id	https://openalex.org/C111919701
concepts[23].level	1
concepts[23].score	0.0
concepts[23].wikidata	https://www.wikidata.org/wiki/Q9135
concepts[23].display_name	Operating system
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.8260191082954407
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/action
keywords[1].score	0.7434389591217041
keywords[1].display_name	Action (physics)
keywords[2].id	https://openalex.org/keywords/supervisor
keywords[2].score	0.7362669110298157
keywords[2].display_name	Supervisor
keywords[3].id	https://openalex.org/keywords/markov-decision-process
keywords[3].score	0.7092247009277344
keywords[3].display_name	Markov decision process
keywords[4].id	https://openalex.org/keywords/computer-science
keywords[4].score	0.6502410769462585
keywords[4].display_name	Computer science
keywords[5].id	https://openalex.org/keywords/process
keywords[5].score	0.5426817536354065
keywords[5].display_name	Process (computing)
keywords[6].id	https://openalex.org/keywords/q-learning
keywords[6].score	0.4653877317905426
keywords[6].display_name	Q-learning
keywords[7].id	https://openalex.org/keywords/reinforcement
keywords[7].score	0.46065402030944824
keywords[7].display_name	Reinforcement
keywords[8].id	https://openalex.org/keywords/control
keywords[8].score	0.45117324590682983
keywords[8].display_name	Control (management)
keywords[9].id	https://openalex.org/keywords/risk-analysis
keywords[9].score	0.4325242340564728
keywords[9].display_name	Risk analysis (engineering)
keywords[10].id	https://openalex.org/keywords/intervention
keywords[10].score	0.4162677526473999
keywords[10].display_name	Intervention (counseling)
keywords[11].id	https://openalex.org/keywords/artificial-intelligence
keywords[11].score	0.4034082591533661
keywords[11].display_name	Artificial intelligence
keywords[12].id	https://openalex.org/keywords/markov-process
keywords[12].score	0.35152846574783325
keywords[12].display_name	Markov process
keywords[13].id	https://openalex.org/keywords/psychology
keywords[13].score	0.1952427625656128
keywords[13].display_name	Psychology
keywords[14].id	https://openalex.org/keywords/business
keywords[14].score	0.15594175457954407
keywords[14].display_name	Business
keywords[15].id	https://openalex.org/keywords/social-psychology
keywords[15].score	0.14220023155212402
keywords[15].display_name	Social psychology
keywords[16].id	https://openalex.org/keywords/mathematics
keywords[16].score	0.10868614912033081
keywords[16].display_name	Mathematics
keywords[17].id	https://openalex.org/keywords/political-science
keywords[17].score	0.06879189610481262
keywords[17].display_name	Political science
keywords[18].id	https://openalex.org/keywords/law
keywords[18].score	0.060529232025146484
keywords[18].display_name	Law
language	en
locations[0].id	doi:10.1609/aaai.v35i13.17378
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4210191458
locations[0].source.issn	2159-5399, 2374-3468
locations[0].source.type	conference
locations[0].source.is_oa	True
locations[0].source.issn_l	2159-5399
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	Proceedings of the AAAI Conference on Artificial Intelligence
locations[0].source.host_organization	https://openalex.org/P4310320058
locations[0].source.host_organization_name	Association for the Advancement of Artificial Intelligence
locations[0].source.host_organization_lineage	https://openalex.org/P4310320058
locations[0].source.host_organization_lineage_names	Association for the Advancement of Artificial Intelligence
locations[0].license
locations[0].pdf_url	https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
locations[0].version	publishedVersion
locations[0].raw_type	journal-article
locations[0].license_id
locations[0].is_accepted	True
locations[0].is_published	True
locations[0].raw_source_name	Proceedings of the AAAI Conference on Artificial Intelligence
locations[0].landing_page_url	https://doi.org/10.1609/aaai.v35i13.17378
locations[1].id	pmh:oai:arXiv.org:2102.07716
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url	https://arxiv.org/pdf/2102.07716
locations[1].version	submittedVersion
locations[1].raw_type	text
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published	False
locations[1].raw_source_name
locations[1].landing_page_url	http://arxiv.org/abs/2102.07716
locations[2].id	mag:3131546278
locations[2].is_oa	True
locations[2].source.id	https://openalex.org/S4306400194
locations[2].source.issn
locations[2].source.type	repository
locations[2].source.is_oa	True
locations[2].source.issn_l
locations[2].source.is_core	False
locations[2].source.is_in_doaj	False
locations[2].source.display_name	arXiv (Cornell University)
locations[2].source.host_organization	https://openalex.org/I205783295
locations[2].source.host_organization_name	Cornell University
locations[2].source.host_organization_lineage	https://openalex.org/I205783295
locations[2].license
locations[2].pdf_url
locations[2].version	submittedVersion
locations[2].raw_type
locations[2].license_id
locations[2].is_accepted	False
locations[2].is_published	False
locations[2].raw_source_name	arXiv (Cornell University)
locations[2].landing_page_url	http://export.arxiv.org/pdf/2102.07716
locations[3].id	doi:10.48550/arxiv.2102.07716
locations[3].is_oa	True
locations[3].source.id	https://openalex.org/S4306400194
locations[3].source.issn
locations[3].source.type	repository
locations[3].source.is_oa	True
locations[3].source.issn_l
locations[3].source.is_core	False
locations[3].source.is_in_doaj	False
locations[3].source.display_name	arXiv (Cornell University)
locations[3].source.host_organization	https://openalex.org/I205783295
locations[3].source.host_organization_name	Cornell University
locations[3].source.host_organization_lineage	https://openalex.org/I205783295
locations[3].license
locations[3].pdf_url
locations[3].version
locations[3].raw_type	article-journal
locations[3].license_id
locations[3].is_accepted	False
locations[3].is_published
locations[3].raw_source_name
locations[3].landing_page_url	https://doi.org/10.48550/arxiv.2102.07716
indexed_in	arxiv, crossref, datacite
authorships[0].author.id	https://openalex.org/A5065996162
authorships[0].author.orcid
authorships[0].author.display_name	Eric Langlois
authorships[0].affiliations[0].raw_affiliation_string	1,2, and 3
authorships[0].author_position	first
authorships[0].raw_author_name	Eric D. Langlois
authorships[0].is_corresponding	False
authorships[0].raw_affiliation_strings	1,2, and 3
authorships[1].author.id	https://openalex.org/A5020224050
authorships[1].author.orcid	https://orcid.org/0000-0003-1210-9866
authorships[1].author.display_name	Tom Everitt
authorships[1].countries	GB
authorships[1].affiliations[0].institution_ids	https://openalex.org/I4210090411
authorships[1].affiliations[0].raw_affiliation_string	DeepMind
authorships[1].institutions[0].id	https://openalex.org/I4210090411
authorships[1].institutions[0].ror	https://ror.org/00971b260
authorships[1].institutions[0].type	company
authorships[1].institutions[0].lineage	https://openalex.org/I4210090411, https://openalex.org/I4210128969
authorships[1].institutions[0].country_code	GB
authorships[1].institutions[0].display_name	DeepMind (United Kingdom)
authorships[1].author_position	last
authorships[1].raw_author_name	Tom Everitt
authorships[1].is_corresponding	False
authorships[1].raw_affiliation_strings	DeepMind
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
open_access.oa_status	diamond
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	How RL Agents Behave When Their Actions Are Modified
has_fulltext	True
is_retracted	False
updated_date	2025-11-06T03:46:38.306776
primary_topic.id	https://openalex.org/T10462
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9984999895095825
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Reinforcement Learning in Robotics
related_works	https://openalex.org/W2979363950, https://openalex.org/W3123819124, https://openalex.org/W2989943922, https://openalex.org/W161119602, https://openalex.org/W1770190901, https://openalex.org/W2728003485, https://openalex.org/W2471081794, https://openalex.org/W3098428275, https://openalex.org/W2402236924, https://openalex.org/W2394924947, https://openalex.org/W2964251366, https://openalex.org/W2940957092, https://openalex.org/W1562694074, https://openalex.org/W2182124052, https://openalex.org/W3201003870, https://openalex.org/W2973029245, https://openalex.org/W2466720186, https://openalex.org/W2973186106, https://openalex.org/W3037998275, https://openalex.org/W3100097138
cited_by_count	5
counts_by_year[0].year	2022
counts_by_year[0].cited_by_count	1
counts_by_year[1].year	2021
counts_by_year[1].cited_by_count	4
locations_count	4
best_oa_location.id	doi:10.1609/aaai.v35i13.17378
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4210191458
best_oa_location.source.issn	2159-5399, 2374-3468
best_oa_location.source.type	conference
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l	2159-5399
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	Proceedings of the AAAI Conference on Artificial Intelligence
best_oa_location.source.host_organization	https://openalex.org/P4310320058
best_oa_location.source.host_organization_name	Association for the Advancement of Artificial Intelligence
best_oa_location.source.host_organization_lineage	https://openalex.org/P4310320058
best_oa_location.source.host_organization_lineage_names	Association for the Advancement of Artificial Intelligence
best_oa_location.license
best_oa_location.pdf_url	https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
best_oa_location.version	publishedVersion
best_oa_location.raw_type	journal-article
best_oa_location.license_id
best_oa_location.is_accepted	True
best_oa_location.is_published	True
best_oa_location.raw_source_name	Proceedings of the AAAI Conference on Artificial Intelligence
best_oa_location.landing_page_url	https://doi.org/10.1609/aaai.v35i13.17378
primary_location.id	doi:10.1609/aaai.v35i13.17378
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4210191458
primary_location.source.issn	2159-5399, 2374-3468
primary_location.source.type	conference
primary_location.source.is_oa	True
primary_location.source.issn_l	2159-5399
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	Proceedings of the AAAI Conference on Artificial Intelligence
primary_location.source.host_organization	https://openalex.org/P4310320058
primary_location.source.host_organization_name	Association for the Advancement of Artificial Intelligence
primary_location.source.host_organization_lineage	https://openalex.org/P4310320058
primary_location.source.host_organization_lineage_names	Association for the Advancement of Artificial Intelligence
primary_location.license
primary_location.pdf_url	https://ojs.aaai.org/index.php/AAAI/article/download/17378/17185
primary_location.version	publishedVersion
primary_location.raw_type	journal-article
primary_location.license_id
primary_location.is_accepted	True
primary_location.is_published	True
primary_location.raw_source_name	Proceedings of the AAAI Conference on Artificial Intelligence
primary_location.landing_page_url	https://doi.org/10.1609/aaai.v35i13.17378
publication_date	2021-05-18
publication_year	2021
referenced_works	https://openalex.org/W6732033900, https://openalex.org/W6704298589, https://openalex.org/W2618318883, https://openalex.org/W6621199667, https://openalex.org/W6747790125, https://openalex.org/W2575705757, https://openalex.org/W1964488926, https://openalex.org/W2028145673, https://openalex.org/W6758622208, https://openalex.org/W2165131254, https://openalex.org/W6768463214, https://openalex.org/W6746721349, https://openalex.org/W2286365479, https://openalex.org/W1914583973, https://openalex.org/W6634711830, https://openalex.org/W6732559233, https://openalex.org/W7075292137, https://openalex.org/W2020609518, https://openalex.org/W6792155000, https://openalex.org/W2596367596, https://openalex.org/W2736629007, https://openalex.org/W2124175081, https://openalex.org/W6677916085, https://openalex.org/W6695925467, https://openalex.org/W2964273112, https://openalex.org/W2917742641, https://openalex.org/W2977925801, https://openalex.org/W1581742186, https://openalex.org/W3139377883, https://openalex.org/W648152870, https://openalex.org/W2143891888, https://openalex.org/W2121863487, https://openalex.org/W2784465508, https://openalex.org/W2574075983, https://openalex.org/W1557517019, https://openalex.org/W2913758949, https://openalex.org/W2150339816, https://openalex.org/W2768908787
referenced_works_count	38
abstract_inverted_index.a	17
abstract_inverted_index.As	16
abstract_inverted_index.By	100
abstract_inverted_index.We	39, 60
abstract_inverted_index.an	46
abstract_inverted_index.by	31
abstract_inverted_index.go	87
abstract_inverted_index.in	2, 70, 78, 91
abstract_inverted_index.of	19, 48, 65, 125
abstract_inverted_index.or	115
abstract_inverted_index.to	8, 55, 88, 93, 112, 122
abstract_inverted_index.How	34
abstract_inverted_index.MDP	50
abstract_inverted_index.and	73, 117
abstract_inverted_index.can	106
abstract_inverted_index.may	5, 25
abstract_inverted_index.the	10, 22, 28, 32, 41, 49, 58, 62, 102
abstract_inverted_index.does	35
abstract_inverted_index.from	12, 27, 57, 110
abstract_inverted_index.like	128
abstract_inverted_index.show	74
abstract_inverted_index.some	81
abstract_inverted_index.that	52, 75, 97
abstract_inverted_index.they	76
abstract_inverted_index.this	36, 71
abstract_inverted_index.adapt	77
abstract_inverted_index.agent	11, 120
abstract_inverted_index.avoid	94
abstract_inverted_index.kinds	124
abstract_inverted_index.model	51
abstract_inverted_index.other	123
abstract_inverted_index.right	103
abstract_inverted_index.their	108
abstract_inverted_index.ways:	80
abstract_inverted_index.while	85
abstract_inverted_index.Markov	43
abstract_inverted_index.action	24, 29, 95, 126
abstract_inverted_index.affect	37
abstract_inverted_index.agents	109
abstract_inverted_index.allows	53
abstract_inverted_index.better	118
abstract_inverted_index.common	66
abstract_inverted_index.differ	26, 56
abstract_inverted_index.ignore	83
abstract_inverted_index.others	86
abstract_inverted_index.result	18
abstract_inverted_index.trying	92
abstract_inverted_index.actions	54
abstract_inverted_index.analyze	61
abstract_inverted_index.complex	3
abstract_inverted_index.control	119
abstract_inverted_index.lengths	90
abstract_inverted_index.policy.	33, 59
abstract_inverted_index.present	40
abstract_inverted_index.prevent	9, 107
abstract_inverted_index.require	6
abstract_inverted_index.reward.	99
abstract_inverted_index.setting	72
abstract_inverted_index.various	89
abstract_inverted_index.Decision	44
abstract_inverted_index.Process,	45
abstract_inverted_index.actions.	15
abstract_inverted_index.choosing	101
abstract_inverted_index.decrease	98
abstract_inverted_index.executed	23
abstract_inverted_index.learning	1, 68, 111
abstract_inverted_index.dangerous	14
abstract_inverted_index.different	79
abstract_inverted_index.extension	47
abstract_inverted_index.learning?	38
abstract_inverted_index.responses	121
abstract_inverted_index.specified	30
abstract_inverted_index.algorithm,	104
abstract_inverted_index.algorithms	69
abstract_inverted_index.asymptotic	63
abstract_inverted_index.attempting	13
abstract_inverted_index.behaviours	64
abstract_inverted_index.circumvent	113
abstract_inverted_index.completely	82
abstract_inverted_index.developers	105
abstract_inverted_index.supervisor	20
abstract_inverted_index.supervision	7
abstract_inverted_index.constraints,	116
abstract_inverted_index.environments	4
abstract_inverted_index.self-damage.	129
abstract_inverted_index.Reinforcement	0
abstract_inverted_index.interruptions	114
abstract_inverted_index.intervention,	21
abstract_inverted_index.modification,	127
abstract_inverted_index.modifications	84, 96
abstract_inverted_index.reinforcement	67
abstract_inverted_index.Modified-Action	42
cited_by_percentile_year.max	97
cited_by_percentile_year.min	89
countries_distinct_count	1
institutions_distinct_count	2
sustainable_development_goals[0].id	https://metadata.un.org/sdg/16
sustainable_development_goals[0].score	0.8399999737739563
sustainable_development_goals[0].display_name	Peace, Justice and strong institutions
citation_normalized_percentile.value	0.69086498
citation_normalized_percentile.is_in_top_1_percent	False
citation_normalized_percentile.is_in_top_10_percent	False
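
The `abstract_inverted_index` rows in the payload above map each word of the abstract to the positions where it occurs. A minimal sketch of how such an index could be turned back into running text follows. The function name `rebuild_abstract` is hypothetical, and the `sample` dictionary is a hand-copied excerpt covering only positions 0 to 7 of the payload, not the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place each word at every position it is listed for, then join in order."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    return " ".join(by_position[i] for i in sorted(by_position))


# Excerpt of the payload rows (positions 0..7 only; later positions omitted).
sample = {
    "Reinforcement": [0],
    "learning": [1],
    "in": [2],
    "complex": [3],
    "environments": [4],
    "may": [5],
    "require": [6],
    "supervision": [7],
}

abstract_start = rebuild_abstract(sample)
print(abstract_start)
```

Applied to the complete index, the same approach should reproduce the abstract shown near the top of this record, since punctuation is stored as part of each word token.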