Safe Reinforcement Learning Using Advantage-Based Intervention
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2106.09110
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
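The abstract names an intervention mechanism based on advantage functions but does not state the rule itself. The following is only a minimal sketch of what such a check could look like, not the authors' implementation. It assumes a hypothetical backup policy with tabular risk estimates: `q_risk[(s, a)]` is the estimated probability of eventually violating a constraint if action `a` is taken in state `s` and the backup policy is followed afterwards, and `v_risk[s]` is the same quantity when the backup policy is followed from `s` directly. The threshold `eta` and all function names are likewise assumptions introduced for illustration.

```python
from typing import Dict, Hashable, Tuple

State = Hashable
Action = Hashable


def risk_advantage(
    q_risk: Dict[Tuple[State, Action], float],
    v_risk: Dict[State, float],
    state: State,
    action: Action,
) -> float:
    """How much riskier the proposed action is than deferring to the backup policy."""
    return q_risk[(state, action)] - v_risk[state]


def should_intervene(
    q_risk: Dict[Tuple[State, Action], float],
    v_risk: Dict[State, float],
    state: State,
    action: Action,
    eta: float,
) -> bool:
    """Assumed rule: intervene when the risk advantage exceeds the threshold eta."""
    return risk_advantage(q_risk, v_risk, state, action) > eta


# Toy numbers, made up for illustration only.
q_risk = {("s0", "left"): 0.05, ("s0", "right"): 0.60}
v_risk = {"s0": 0.10}
eta = 0.2

decisions = {
    action: should_intervene(q_risk, v_risk, "s0", action, eta)
    for action in ("left", "right")
}
print(decisions)  # {'left': False, 'right': True}
```

In this toy setting the action `right` raises the estimated violation probability by 0.5 over the backup policy, which exceeds `eta`, so the check would hand control to the backup policy; `left` is allowed through.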
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2106.09110, https://arxiv.org/pdf/2106.09110
- OA Status: green
- Cited By: 5
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3168575058
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3168575058 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2106.09110 (Digital Object Identifier)
- Title: Safe Reinforcement Learning Using Advantage-Based Intervention (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2021 (Year of publication)
- Publication date: 2021-06-16 (Full publication date if available)
- Authors: Nolan Wagener, Byron Boots, Ching-An Cheng (List of authors in order)
- Landing page: https://arxiv.org/abs/2106.09110 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2106.09110 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2106.09110 (Direct OA link when available)
- Concepts: Reinforcement learning, Markov decision process, Computer science, Intervention (counseling), Software deployment, Mathematical optimization, Process (computing), Training (meteorology), Risk analysis (engineering), Markov process, Operations research, Artificial intelligence, Engineering, Business, Mathematics, Operating system, Meteorology, Statistics, Physics, Psychology, Psychiatry (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 5 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1, 2023: 2, 2022: 1 (Per-year citation counts, last 5 years)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W3168575058 |
| doi | https://doi.org/10.48550/arxiv.2106.09110 |
| ids.doi | https://doi.org/10.48550/arxiv.2106.09110 |
| ids.mag | 3168575058 |
| ids.openalex | https://openalex.org/W3168575058 |
| fwci | |
| type | preprint |
| title | Safe Reinforcement Learning Using Advantage-Based Intervention |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13497 |
| topics[0].field.id | https://openalex.org/fields/12 |
| topics[0].field.display_name | Arts and Humanities |
| topics[0].score | 0.9879000186920166 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1211 |
| topics[0].subfield.display_name | Philosophy |
| topics[0].display_name | Hermeneutics and Narrative Identity |
| topics[1].id | https://openalex.org/T13695 |
| topics[1].field.id | https://openalex.org/fields/36 |
| topics[1].field.display_name | Health Professions |
| topics[1].score | 0.9749000072479248 |
| topics[1].domain.id | https://openalex.org/domains/4 |
| topics[1].domain.display_name | Health Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3600 |
| topics[1].subfield.display_name | General Health Professions |
| topics[1].display_name | Aging, Elder Care, and Social Issues |
| topics[2].id | https://openalex.org/T13099 |
| topics[2].field.id | https://openalex.org/fields/36 |
| topics[2].field.display_name | Health Professions |
| topics[2].score | 0.95660001039505 |
| topics[2].domain.id | https://openalex.org/domains/4 |
| topics[2].domain.display_name | Health Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/3600 |
| topics[2].subfield.display_name | General Health Professions |
| topics[2].display_name | Health, Medicine and Society |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8874133825302124 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C106189395 |
| concepts[1].level | 3 |
| concepts[1].score | 0.8101300001144409 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q176789 |
| concepts[1].display_name | Markov decision process |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.7266027927398682 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C2780665704 |
| concepts[3].level | 2 |
| concepts[3].score | 0.599596381187439 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q959298 |
| concepts[3].display_name | Intervention (counseling) |
| concepts[4].id | https://openalex.org/C105339364 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5050044655799866 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2297740 |
| concepts[4].display_name | Software deployment |
| concepts[5].id | https://openalex.org/C126255220 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4588014483451843 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[5].display_name | Mathematical optimization |
| concepts[6].id | https://openalex.org/C98045186 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4410219192504883 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[6].display_name | Process (computing) |
| concepts[7].id | https://openalex.org/C2777211547 |
| concepts[7].level | 2 |
| concepts[7].score | 0.42957448959350586 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q17141490 |
| concepts[7].display_name | Training (meteorology) |
| concepts[8].id | https://openalex.org/C112930515 |
| concepts[8].level | 1 |
| concepts[8].score | 0.41235288977622986 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q4389547 |
| concepts[8].display_name | Risk analysis (engineering) |
| concepts[9].id | https://openalex.org/C159886148 |
| concepts[9].level | 2 |
| concepts[9].score | 0.3906143307685852 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q176645 |
| concepts[9].display_name | Markov process |
| concepts[10].id | https://openalex.org/C42475967 |
| concepts[10].level | 1 |
| concepts[10].score | 0.33162540197372437 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q194292 |
| concepts[10].display_name | Operations research |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.29594558477401733 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C127413603 |
| concepts[12].level | 0 |
| concepts[12].score | 0.10021442174911499 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[12].display_name | Engineering |
| concepts[13].id | https://openalex.org/C144133560 |
| concepts[13].level | 0 |
| concepts[13].score | 0.09698054194450378 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[13].display_name | Business |
| concepts[14].id | https://openalex.org/C33923547 |
| concepts[14].level | 0 |
| concepts[14].score | 0.08338531851768494 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[14].display_name | Mathematics |
| concepts[15].id | https://openalex.org/C111919701 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[15].display_name | Operating system |
| concepts[16].id | https://openalex.org/C153294291 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q25261 |
| concepts[16].display_name | Meteorology |
| concepts[17].id | https://openalex.org/C105795698 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[17].display_name | Statistics |
| concepts[18].id | https://openalex.org/C121332964 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[18].display_name | Physics |
| concepts[19].id | https://openalex.org/C15744967 |
| concepts[19].level | 0 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[19].display_name | Psychology |
| concepts[20].id | https://openalex.org/C118552586 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q7867 |
| concepts[20].display_name | Psychiatry |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8874133825302124 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/markov-decision-process |
| keywords[1].score | 0.8101300001144409 |
| keywords[1].display_name | Markov decision process |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.7266027927398682 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/intervention |
| keywords[3].score | 0.599596381187439 |
| keywords[3].display_name | Intervention (counseling) |
| keywords[4].id | https://openalex.org/keywords/software-deployment |
| keywords[4].score | 0.5050044655799866 |
| keywords[4].display_name | Software deployment |
| keywords[5].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[5].score | 0.4588014483451843 |
| keywords[5].display_name | Mathematical optimization |
| keywords[6].id | https://openalex.org/keywords/process |
| keywords[6].score | 0.4410219192504883 |
| keywords[6].display_name | Process (computing) |
| keywords[7].id | https://openalex.org/keywords/training |
| keywords[7].score | 0.42957448959350586 |
| keywords[7].display_name | Training (meteorology) |
| keywords[8].id | https://openalex.org/keywords/risk-analysis |
| keywords[8].score | 0.41235288977622986 |
| keywords[8].display_name | Risk analysis (engineering) |
| keywords[9].id | https://openalex.org/keywords/markov-process |
| keywords[9].score | 0.3906143307685852 |
| keywords[9].display_name | Markov process |
| keywords[10].id | https://openalex.org/keywords/operations-research |
| keywords[10].score | 0.33162540197372437 |
| keywords[10].display_name | Operations research |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.29594558477401733 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/engineering |
| keywords[12].score | 0.10021442174911499 |
| keywords[12].display_name | Engineering |
| keywords[13].id | https://openalex.org/keywords/business |
| keywords[13].score | 0.09698054194450378 |
| keywords[13].display_name | Business |
| keywords[14].id | https://openalex.org/keywords/mathematics |
| keywords[14].score | 0.08338531851768494 |
| keywords[14].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2106.09110 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2106.09110 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2106.09110 |
| locations[1].id | doi:10.48550/arxiv.2106.09110 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2106.09110 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5010914575 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Nolan Wagener |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I130701444 |
| authorships[0].affiliations[0].raw_affiliation_string | GEORGIA TECHNOLOGY |
| authorships[0].institutions[0].id | https://openalex.org/I130701444 |
| authorships[0].institutions[0].ror | https://ror.org/01zkghx44 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I130701444 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Georgia Institute of Technology |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Nolan Wagener |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | GEORGIA TECHNOLOGY |
| authorships[1].author.id | |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I201448701 |
| authorships[1].affiliations[0].raw_affiliation_string | University of Washington ; |
| authorships[1].institutions[0].id | https://openalex.org/I201448701 |
| authorships[1].institutions[0].ror | https://ror.org/00cvxb145 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I201448701 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | University of Washington |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Byron Boots |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | University of Washington ; |
| authorships[2].author.id | https://openalex.org/A5102953050 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Cheng Chang |
| authorships[2].countries | GB |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210164937 |
| authorships[2].affiliations[0].raw_affiliation_string | Microsoft Research#TAB# |
| authorships[2].institutions[0].id | https://openalex.org/I4210164937 |
| authorships[2].institutions[0].ror | https://ror.org/05k87vq12 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I1290206253, https://openalex.org/I4210164937 |
| authorships[2].institutions[0].country_code | GB |
| authorships[2].institutions[0].display_name | Microsoft Research (United Kingdom) |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Ching-An Cheng |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Microsoft Research#TAB# |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2106.09110 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2021-06-22T00:00:00 |
| display_name | Safe Reinforcement Learning Using Advantage-Based Intervention |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13497 |
| primary_topic.field.id | https://openalex.org/fields/12 |
| primary_topic.field.display_name | Arts and Humanities |
| primary_topic.score | 0.9879000186920166 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1211 |
| primary_topic.subfield.display_name | Philosophy |
| primary_topic.display_name | Hermeneutics and Narrative Identity |
| related_works | https://openalex.org/W13717812, https://openalex.org/W102453, https://openalex.org/W13469974, https://openalex.org/W7587899, https://openalex.org/W1279312, https://openalex.org/W9932698, https://openalex.org/W6242441, https://openalex.org/W13374848, https://openalex.org/W1937329, https://openalex.org/W7455958 |
| cited_by_count | 5 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2106.09110 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2106.09110 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2106.09110 |
| primary_location.id | pmh:oai:arXiv.org:2106.09110 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2106.09110 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2106.09110 |
| publication_date | 2021-06-16 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 48 |
| abstract_inverted_index.a | 6, 33, 78, 165 |
| abstract_inverted_index.In | 65, 141 |
| abstract_inverted_index.RL | 105, 157 |
| abstract_inverted_index.We | 76 |
| abstract_inverted_index.an | 45, 59, 84 |
| abstract_inverted_index.as | 42 |
| abstract_inverted_index.at | 179 |
| abstract_inverted_index.be | 170 |
| abstract_inverted_index.in | 58 |
| abstract_inverted_index.is | 51, 177 |
| abstract_inverted_index.of | 25 |
| abstract_inverted_index.on | 22, 88, 117 |
| abstract_inverted_index.to | 91, 136, 164 |
| abstract_inverted_index.we | 68, 144 |
| abstract_inverted_index.MDP | 160 |
| abstract_inverted_index.Our | 111, 175 |
| abstract_inverted_index.and | 98, 122, 127, 132, 158, 162 |
| abstract_inverted_index.can | 169 |
| abstract_inverted_index.far | 150 |
| abstract_inverted_index.for | 72, 108 |
| abstract_inverted_index.has | 20 |
| abstract_inverted_index.new | 79 |
| abstract_inverted_index.our | 142 |
| abstract_inverted_index.the | 23, 73, 93, 100, 129, 137 |
| abstract_inverted_index.(RL) | 29 |
| abstract_inverted_index.Many | 0 |
| abstract_inverted_index.both | 120 |
| abstract_inverted_index.code | 176 |
| abstract_inverted_index.keep | 92 |
| abstract_inverted_index.less | 151 |
| abstract_inverted_index.much | 17 |
| abstract_inverted_index.open | 46 |
| abstract_inverted_index.safe | 26, 34, 95, 156 |
| abstract_inverted_index.show | 145 |
| abstract_inverted_index.than | 154 |
| abstract_inverted_index.that | 8, 31, 82, 146, 168 |
| abstract_inverted_index.this | 66, 70 |
| abstract_inverted_index.uses | 83 |
| abstract_inverted_index.well | 43 |
| abstract_inverted_index.with | 114 |
| abstract_inverted_index.MDPs. | 110 |
| abstract_inverted_index.SAILR | 147 |
| abstract_inverted_index.after | 36, 125 |
| abstract_inverted_index.agent | 94 |
| abstract_inverted_index.based | 87 |
| abstract_inverted_index.comes | 113 |
| abstract_inverted_index.still | 55 |
| abstract_inverted_index.total | 10 |
| abstract_inverted_index.using | 103 |
| abstract_inverted_index.while | 12, 54 |
| abstract_inverted_index.work, | 67 |
| abstract_inverted_index.(MDP). | 64 |
| abstract_inverted_index.(i.e., | 124 |
| abstract_inverted_index.Markov | 61 |
| abstract_inverted_index.SAILR, | 81 |
| abstract_inverted_index.during | 40, 119, 152 |
| abstract_inverted_index.method | 112 |
| abstract_inverted_index.policy | 7, 35, 102, 133, 167 |
| abstract_inverted_index.recent | 18 |
| abstract_inverted_index.reward | 11 |
| abstract_inverted_index.safely | 172 |
| abstract_inverted_index.safety | 14, 39, 118 |
| abstract_inverted_index.strong | 115 |
| abstract_inverted_index.address | 69 |
| abstract_inverted_index.agent's | 101 |
| abstract_inverted_index.finding | 5 |
| abstract_inverted_index.focused | 21 |
| abstract_inverted_index.involve | 4 |
| abstract_inverted_index.obeying | 13 |
| abstract_inverted_index.optimal | 138 |
| abstract_inverted_index.policy. | 140 |
| abstract_inverted_index.problem | 71 |
| abstract_inverted_index.process | 63 |
| abstract_inverted_index.produce | 32 |
| abstract_inverted_index.propose | 77 |
| abstract_inverted_index.remains | 44 |
| abstract_inverted_index.unknown | 60 |
| abstract_inverted_index.without | 128, 173 |
| abstract_inverted_index.Although | 16 |
| abstract_inverted_index.compared | 135 |
| abstract_inverted_index.decision | 2, 62 |
| abstract_inverted_index.deployed | 171 |
| abstract_inverted_index.designed | 107 |
| abstract_inverted_index.ensuring | 38 |
| abstract_inverted_index.learning | 28 |
| abstract_inverted_index.problem. | 47 |
| abstract_inverted_index.problems | 3 |
| abstract_inverted_index.research | 19 |
| abstract_inverted_index.setting. | 75 |
| abstract_inverted_index.standard | 155 |
| abstract_inverted_index.training | 41, 97, 121, 126, 153 |
| abstract_inverted_index.violates | 148 |
| abstract_inverted_index.advantage | 89 |
| abstract_inverted_index.available | 178 |
| abstract_inverted_index.challenge | 50 |
| abstract_inverted_index.converges | 163 |
| abstract_inverted_index.functions | 90 |
| abstract_inverted_index.maximizes | 9 |
| abstract_inverted_index.mechanism | 86 |
| abstract_inverted_index.optimizes | 99 |
| abstract_inverted_index.training, | 37 |
| abstract_inverted_index.algorithm, | 80 |
| abstract_inverted_index.algorithms | 30, 106 |
| abstract_inverted_index.approaches | 161 |
| abstract_inverted_index.deployment | 123 |
| abstract_inverted_index.guarantees | 116 |
| abstract_inverted_index.mechanism) | 131 |
| abstract_inverted_index.performing | 52 |
| abstract_inverted_index.satisfying | 56 |
| abstract_inverted_index.sequential | 1 |
| abstract_inverted_index.throughout | 96 |
| abstract_inverted_index.constrained | 159 |
| abstract_inverted_index.constraints | 57, 149 |
| abstract_inverted_index.development | 24 |
| abstract_inverted_index.exploration | 53 |
| abstract_inverted_index.fundamental | 49 |
| abstract_inverted_index.performance | 134 |
| abstract_inverted_index.constraints. | 15 |
| abstract_inverted_index.experiments, | 143 |
| abstract_inverted_index.intervention | 85, 130 |
| abstract_inverted_index.intervention. | 174 |
| abstract_inverted_index.off-the-shelf | 104 |
| abstract_inverted_index.reinforcement | 27 |
| abstract_inverted_index.unconstrained | 109 |
| abstract_inverted_index.well-performing | 166 |
| abstract_inverted_index.chance-constrained | 74 |
| abstract_inverted_index.safety-constrained | 139 |
| abstract_inverted_index.https://github.com/nolanwagener/safe_rl. | 180 |
| cited_by_percentile_year | |
| countries_distinct_count | 2 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
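
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt of the payload (the first eight positions), not the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place every token at each of its positions, then join in position order."""
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    return " ".join(token_at[position] for position in sorted(token_at))


# Excerpt of the payload rows above, limited to positions 0-7.
sample_index = {
    "Many": [0],
    "sequential": [1],
    "decision": [2],
    "problems": [3],
    "involve": [4],
    "finding": [5],
    "a": [6],
    "policy": [7],
}

print(rebuild_abstract(sample_index))
# Many sequential decision problems involve finding a policy
```

Sorting by position rather than relying on dictionary order keeps the sketch correct when a token appears at several positions, as `a` and `policy` do in the full index.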