Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2509.12833
Projection-based safety filters, which modify unsafe actions by mapping them to the closest safe alternative, are widely used to enforce safety constraints in reinforcement learning (RL). Two integration strategies are commonly considered: Safe environment RL (SE-RL), where the safeguard is treated as part of the environment, and safe policy RL (SP-RL), where it is embedded within the policy through differentiable optimization layers. Despite their practical relevance in safety-critical settings, a formal understanding of their differences is lacking. In this work, we present a theoretical comparison of SE-RL and SP-RL. We identify a key distinction in how each approach is affected by action aliasing, a phenomenon in which multiple unsafe actions are projected to the same safe action, causing information loss in the policy gradients. In SE-RL, this effect is implicitly approximated by the critic, while in SP-RL, it manifests directly as rank-deficient Jacobians during backpropagation through the safeguard. Our contributions are threefold: (i) a unified formalization of SE-RL and SP-RL in the context of actor-critic algorithms, (ii) a theoretical analysis of their respective policy gradient estimates, highlighting the role of action aliasing, and (iii) a comparative study of mitigation strategies, including a novel penalty-based improvement for SP-RL that aligns with established SE-RL practices. Empirical results support our theoretical predictions, showing that action aliasing is more detrimental for SP-RL than for SE-RL. However, with appropriate improvement strategies, SP-RL can match or outperform improved SE-RL across a range of environments. These findings provide actionable insights for choosing and refining projection-based safe RL methods based on task characteristics.
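The action aliasing effect described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the safe set is a hypothetical box [-1, 1]^2, for which the Euclidean projection reduces to clipping, and the names `project` and `projection_jacobian` are invented here. The sketch shows two different unsafe actions collapsing onto the same safe action, and the Jacobian of the projection losing rank at such points, so that a gradient backpropagated through the safeguard drops its component along the clipped direction.

```python
import numpy as np

# Hypothetical safe set: the box [-1, 1]^2 (an illustrative assumption).
LOW, HIGH = -1.0, 1.0


def project(action):
    """Euclidean projection onto the box, which reduces to clipping."""
    return np.clip(action, LOW, HIGH)


def projection_jacobian(action):
    """Jacobian of the box projection with respect to the raw action.

    It is diagonal: 1 where a coordinate lies strictly inside the bounds
    (passed through unchanged), 0 where the coordinate is clipped.
    """
    inside = (action > LOW) & (action < HIGH)
    return np.diag(inside.astype(float))


# Two distinct unsafe actions that differ only in the violated coordinate.
a1 = np.array([1.5, 0.2])
a2 = np.array([3.0, 0.2])

# Action aliasing: both map to the same safe action.
safe1, safe2 = project(a1), project(a2)

# Rank deficiency: the clipped coordinate contributes a zero row and column.
jac_unsafe = projection_jacobian(a1)
jac_safe = projection_jacobian(np.array([0.3, 0.2]))
rank_unsafe = np.linalg.matrix_rank(jac_unsafe)
rank_safe = np.linalg.matrix_rank(jac_safe)

# Backpropagating an arbitrary upstream gradient through the safeguard:
# the component along the clipped coordinate is lost.
upstream = np.array([0.7, -0.3])
backprop = jac_unsafe.T @ upstream

print(safe1, safe2)            # identical safe actions
print(rank_unsafe, rank_safe)  # reduced rank vs. full rank
print(backprop)                # first component is zero
```

A box is the simplest case; for a general convex safe set the projection would typically require solving an optimization problem, but the same loss of gradient information along directions normal to the active constraints would be expected.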
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.12833, https://arxiv.org/pdf/2509.12833
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415316914
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415316914 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.12833 (Digital Object Identifier)
- Title: Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment? (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-09-16 (full publication date if available)
- Authors: Hannah Markgraf, Shrutika S. Sawant, Hanna Krasowski, Lukas Schäfer, Sébastien Gros, Matthias Althoff (list of authors in order)
- Landing page: https://arxiv.org/abs/2509.12833 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2509.12833 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.12833 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415316914 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2509.12833 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.12833 |
| ids.openalex | https://openalex.org/W4415316914 |
| fwci | |
| type | preprint |
| title | Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment? |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10826 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.5052000284194946 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3204 |
| topics[0].subfield.display_name | Developmental and Educational Psychology |
| topics[0].display_name | Behavioral and Psychological Studies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.12833 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.12833 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.12833 |
| locations[1].id | doi:10.48550/arxiv.2509.12833 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.12833 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5062330485 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Hannah Markgraf |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Markgraf, Hannah |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5049589975 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1532-947X |
| authorships[1].author.display_name | Shrutika S. Sawant |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Sawant, Shamburaj |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5065905965 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6730-3802 |
| authorships[2].author.display_name | Hanna Krasowski |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Krasowski, Hanna |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5068676054 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4335-9342 |
| authorships[3].author.display_name | Lukas Schäfer |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Schäfer, Lukas |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5049645185 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6054-2133 |
| authorships[4].author.display_name | Sébastien Gros |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Gros, Sebastien |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5005383495 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3733-842X |
| authorships[5].author.display_name | Matthias Althoff |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Althoff, Matthias |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.12833 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-18T00:00:00 |
| display_name | Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment? |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10826 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.5052000284194946 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3204 |
| primary_topic.subfield.display_name | Developmental and Educational Psychology |
| primary_topic.display_name | Behavioral and Psychological Studies |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.12833 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.12833 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.12833 |
| primary_location.id | pmh:oai:arXiv.org:2509.12833 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.12833 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.12833 |
| publication_date | 2025-09-16 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 69, 82, 91, 103, 153, 167, 184, 191, 234 |
| abstract_inverted_index.In | 77, 124 |
| abstract_inverted_index.RL | 34, 49, 249 |
| abstract_inverted_index.We | 89 |
| abstract_inverted_index.as | 41, 140 |
| abstract_inverted_index.by | 7, 100, 131 |
| abstract_inverted_index.in | 22, 66, 94, 105, 120, 135, 160 |
| abstract_inverted_index.is | 39, 53, 75, 98, 128, 213 |
| abstract_inverted_index.it | 52, 137 |
| abstract_inverted_index.of | 43, 72, 85, 156, 163, 170, 179, 187, 236 |
| abstract_inverted_index.on | 252 |
| abstract_inverted_index.or | 229 |
| abstract_inverted_index.to | 10, 18, 112 |
| abstract_inverted_index.we | 80 |
| abstract_inverted_index.(i) | 152 |
| abstract_inverted_index.Our | 148 |
| abstract_inverted_index.Two | 26 |
| abstract_inverted_index.and | 46, 87, 158, 182, 245 |
| abstract_inverted_index.are | 15, 29, 110, 150 |
| abstract_inverted_index.can | 227 |
| abstract_inverted_index.for | 195, 216, 219, 243 |
| abstract_inverted_index.how | 95 |
| abstract_inverted_index.key | 92 |
| abstract_inverted_index.our | 206 |
| abstract_inverted_index.the | 11, 37, 44, 56, 113, 121, 132, 146, 161, 177 |
| abstract_inverted_index.(ii) | 166 |
| abstract_inverted_index.Safe | 32 |
| abstract_inverted_index.each | 96 |
| abstract_inverted_index.loss | 119 |
| abstract_inverted_index.more | 214 |
| abstract_inverted_index.part | 42 |
| abstract_inverted_index.role | 178 |
| abstract_inverted_index.safe | 13, 47, 115, 248 |
| abstract_inverted_index.same | 114 |
| abstract_inverted_index.task | 253 |
| abstract_inverted_index.than | 218 |
| abstract_inverted_index.that | 197, 210 |
| abstract_inverted_index.them | 9 |
| abstract_inverted_index.this | 78, 126 |
| abstract_inverted_index.used | 17 |
| abstract_inverted_index.with | 199, 222 |
| abstract_inverted_index.(RL). | 25 |
| abstract_inverted_index.(iii) | 183 |
| abstract_inverted_index.SE-RL | 86, 157, 201, 232 |
| abstract_inverted_index.SP-RL | 159, 196, 217, 226 |
| abstract_inverted_index.These | 238 |
| abstract_inverted_index.based | 251 |
| abstract_inverted_index.match | 228 |
| abstract_inverted_index.novel | 192 |
| abstract_inverted_index.range | 235 |
| abstract_inverted_index.study | 186 |
| abstract_inverted_index.their | 63, 73, 171 |
| abstract_inverted_index.where | 36, 51 |
| abstract_inverted_index.which | 3, 106 |
| abstract_inverted_index.while | 134 |
| abstract_inverted_index.work, | 79 |
| abstract_inverted_index.SE-RL, | 125 |
| abstract_inverted_index.SE-RL. | 220 |
| abstract_inverted_index.SP-RL, | 136 |
| abstract_inverted_index.SP-RL. | 88 |
| abstract_inverted_index.across | 233 |
| abstract_inverted_index.action | 101, 180, 211 |
| abstract_inverted_index.aligns | 198 |
| abstract_inverted_index.during | 143 |
| abstract_inverted_index.effect | 127 |
| abstract_inverted_index.formal | 70 |
| abstract_inverted_index.modify | 4 |
| abstract_inverted_index.policy | 48, 57, 122, 173 |
| abstract_inverted_index.safety | 1, 20 |
| abstract_inverted_index.unsafe | 5, 108 |
| abstract_inverted_index.widely | 16 |
| abstract_inverted_index.within | 55 |
| abstract_inverted_index.Despite | 62 |
| abstract_inverted_index.action, | 116 |
| abstract_inverted_index.actions | 6, 109 |
| abstract_inverted_index.causing | 117 |
| abstract_inverted_index.closest | 12 |
| abstract_inverted_index.context | 162 |
| abstract_inverted_index.critic, | 133 |
| abstract_inverted_index.enforce | 19 |
| abstract_inverted_index.layers. | 61 |
| abstract_inverted_index.mapping | 8 |
| abstract_inverted_index.methods | 250 |
| abstract_inverted_index.present | 81 |
| abstract_inverted_index.provide | 240 |
| abstract_inverted_index.results | 204 |
| abstract_inverted_index.showing | 209 |
| abstract_inverted_index.support | 205 |
| abstract_inverted_index.through | 58, 145 |
| abstract_inverted_index.treated | 40 |
| abstract_inverted_index.unified | 154 |
| abstract_inverted_index.(SE-RL), | 35 |
| abstract_inverted_index.(SP-RL), | 50 |
| abstract_inverted_index.However, | 221 |
| abstract_inverted_index.affected | 99 |
| abstract_inverted_index.aliasing | 212 |
| abstract_inverted_index.analysis | 169 |
| abstract_inverted_index.approach | 97 |
| abstract_inverted_index.choosing | 244 |
| abstract_inverted_index.commonly | 30 |
| abstract_inverted_index.directly | 139 |
| abstract_inverted_index.embedded | 54 |
| abstract_inverted_index.filters, | 2 |
| abstract_inverted_index.findings | 239 |
| abstract_inverted_index.gradient | 174 |
| abstract_inverted_index.identify | 90 |
| abstract_inverted_index.improved | 231 |
| abstract_inverted_index.insights | 242 |
| abstract_inverted_index.lacking. | 76 |
| abstract_inverted_index.learning | 24 |
| abstract_inverted_index.multiple | 107 |
| abstract_inverted_index.refining | 246 |
| abstract_inverted_index.Empirical | 203 |
| abstract_inverted_index.Jacobians | 142 |
| abstract_inverted_index.aliasing, | 102, 181 |
| abstract_inverted_index.including | 190 |
| abstract_inverted_index.manifests | 138 |
| abstract_inverted_index.practical | 64 |
| abstract_inverted_index.projected | 111 |
| abstract_inverted_index.relevance | 65 |
| abstract_inverted_index.safeguard | 38 |
| abstract_inverted_index.settings, | 68 |
| abstract_inverted_index.actionable | 241 |
| abstract_inverted_index.comparison | 84 |
| abstract_inverted_index.estimates, | 175 |
| abstract_inverted_index.gradients. | 123 |
| abstract_inverted_index.implicitly | 129 |
| abstract_inverted_index.mitigation | 188 |
| abstract_inverted_index.outperform | 230 |
| abstract_inverted_index.phenomenon | 104 |
| abstract_inverted_index.practices. | 202 |
| abstract_inverted_index.respective | 172 |
| abstract_inverted_index.safeguard. | 147 |
| abstract_inverted_index.strategies | 28 |
| abstract_inverted_index.threefold: | 151 |
| abstract_inverted_index.algorithms, | 165 |
| abstract_inverted_index.appropriate | 223 |
| abstract_inverted_index.comparative | 185 |
| abstract_inverted_index.considered: | 31 |
| abstract_inverted_index.constraints | 21 |
| abstract_inverted_index.detrimental | 215 |
| abstract_inverted_index.differences | 74 |
| abstract_inverted_index.distinction | 93 |
| abstract_inverted_index.environment | 33 |
| abstract_inverted_index.established | 200 |
| abstract_inverted_index.improvement | 194, 224 |
| abstract_inverted_index.information | 118 |
| abstract_inverted_index.integration | 27 |
| abstract_inverted_index.strategies, | 189, 225 |
| abstract_inverted_index.theoretical | 83, 168, 207 |
| abstract_inverted_index.actor-critic | 164 |
| abstract_inverted_index.alternative, | 14 |
| abstract_inverted_index.approximated | 130 |
| abstract_inverted_index.environment, | 45 |
| abstract_inverted_index.highlighting | 176 |
| abstract_inverted_index.optimization | 60 |
| abstract_inverted_index.predictions, | 208 |
| abstract_inverted_index.contributions | 149 |
| abstract_inverted_index.environments. | 237 |
| abstract_inverted_index.formalization | 155 |
| abstract_inverted_index.penalty-based | 193 |
| abstract_inverted_index.reinforcement | 23 |
| abstract_inverted_index.understanding | 71 |
| abstract_inverted_index.differentiable | 59 |
| abstract_inverted_index.rank-deficient | 141 |
| abstract_inverted_index.backpropagation | 144 |
| abstract_inverted_index.safety-critical | 67 |
| abstract_inverted_index.Projection-based | 0 |
| abstract_inverted_index.characteristics. | 254 |
| abstract_inverted_index.projection-based | 247 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
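
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions at which it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is hypothetical, and `sample` is a trimmed excerpt of the rows above that keeps only positions 0 to 8, so that the example stays short.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Trimmed excerpt of the table rows (positions 0 to 8 only).
sample = {
    "by": [7],
    "which": [3],
    "safety": [1],
    "modify": [4],
    "unsafe": [5],
    "actions": [6],
    "mapping": [8],
    "filters,": [2],
    "Projection-based": [0],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same function should reproduce the abstract shown at the top of this record, since every token carries its own punctuation and capitalization.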