A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2404.16468
Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. Although certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we unify these existing techniques and bridge the gap to classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as they reveal an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal. Furthermore, using this framework, we introduce some novel types of constraints, which make it possible to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints, which are handled automatically throughout training using trainable reward modifications. The proposed $\texttt{DualCRL}$ method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints.
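To make the primal-dual relationship described in the abstract concrete, here is a sketch based on the standard linear-programming formulation of a discounted MDP. It is textbook material and not necessarily the paper's exact formulation; the symbols $V$ (state values), $d$ (state-action visitation density), $\mu_0$ (initial state distribution), $P$ (transition kernel), $\gamma$ (discount factor), $c$ (a cost signal), $C$ (its bound) and $\lambda$ (a multiplier) are notation assumed here for illustration.

$$
\begin{aligned}
\text{(primal)}\quad & \min_{V}\ (1-\gamma)\sum_{s}\mu_0(s)\,V(s)
&& \text{s.t.}\quad V(s)\ \ge\ r(s,a)+\gamma\sum_{s'}P(s'\mid s,a)\,V(s')\quad\forall s,a,\\
\text{(dual)}\quad & \max_{d\ge 0}\ \sum_{s,a}d(s,a)\,r(s,a)
&& \text{s.t.}\quad \sum_{a}d(s,a)=(1-\gamma)\,\mu_0(s)+\gamma\sum_{s',a'}P(s\mid s',a')\,d(s',a')\quad\forall s.
\end{aligned}
$$

If a bound $\sum_{s,a}d(s,a)\,c(s,a)\le C$ is added to the dual, LP duality attaches a multiplier $\lambda\ge 0$ to it, and the corresponding primal becomes

$$
\min_{V,\ \lambda\ge 0}\ (1-\gamma)\sum_{s}\mu_0(s)\,V(s)+\lambda C
\quad\text{s.t.}\quad V(s)\ \ge\ r(s,a)-\lambda\,c(s,a)+\gamma\sum_{s'}P(s'\mid s,a)\,V(s')\quad\forall s,a.
$$

Under these assumptions the dual constraint shows up in the primal as a modified reward $r-\lambda c$, with $\lambda$ playing the role of a trainable weight.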
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.16468
- PDF URL: https://arxiv.org/pdf/2404.16468
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4395686977
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4395686977 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2404.16468 (Digital Object Identifier)
- Title: A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-04-25 (full publication date if available)
- Authors: Bram De Cooman, Johan A. K. Suykens (list of authors in order)
- Landing page: https://arxiv.org/abs/2404.16468 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2404.16468 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2404.16468 (direct OA link when available)
- Concepts: Reinforcement learning, Perspective (graphical), Dual (grammatical number), Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology, Philosophy, Linguistics (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4395686977 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2404.16468 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.16468 |
| ids.openalex | https://openalex.org/W4395686977 |
| fwci | |
| type | preprint |
| title | A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10328 |
| topics[0].field.id | https://openalex.org/fields/14 |
| topics[0].field.display_name | Business, Management and Accounting |
| topics[0].score | 0.26350000500679016 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1404 |
| topics[0].subfield.display_name | Management Information Systems |
| topics[0].display_name | Supply Chain and Inventory Management |
| topics[1].id | https://openalex.org/T11182 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.2529999911785126 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Auction Theory and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8332365155220032 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C12713177 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7723938226699829 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1900281 |
| concepts[1].display_name | Perspective (graphical) |
| concepts[2].id | https://openalex.org/C2780980858 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7216349840164185 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q110022 |
| concepts[2].display_name | Dual (grammatical number) |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5291041731834412 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C67203356 |
| concepts[4].level | 2 |
| concepts[4].score | 0.519869327545166 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[4].display_name | Reinforcement |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3556446433067322 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C15744967 |
| concepts[6].level | 0 |
| concepts[6].score | 0.24857425689697266 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[6].display_name | Psychology |
| concepts[7].id | https://openalex.org/C77805123 |
| concepts[7].level | 1 |
| concepts[7].score | 0.1378512978553772 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[7].display_name | Social psychology |
| concepts[8].id | https://openalex.org/C138885662 |
| concepts[8].level | 0 |
| concepts[8].score | 0.08428233861923218 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[8].display_name | Philosophy |
| concepts[9].id | https://openalex.org/C41895202 |
| concepts[9].level | 1 |
| concepts[9].score | 0.05382847785949707 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[9].display_name | Linguistics |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8332365155220032 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/perspective |
| keywords[1].score | 0.7723938226699829 |
| keywords[1].display_name | Perspective (graphical) |
| keywords[2].id | https://openalex.org/keywords/dual |
| keywords[2].score | 0.7216349840164185 |
| keywords[2].display_name | Dual (grammatical number) |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5291041731834412 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/reinforcement |
| keywords[4].score | 0.519869327545166 |
| keywords[4].display_name | Reinforcement |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.3556446433067322 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/psychology |
| keywords[6].score | 0.24857425689697266 |
| keywords[6].display_name | Psychology |
| keywords[7].id | https://openalex.org/keywords/social-psychology |
| keywords[7].score | 0.1378512978553772 |
| keywords[7].display_name | Social psychology |
| keywords[8].id | https://openalex.org/keywords/philosophy |
| keywords[8].score | 0.08428233861923218 |
| keywords[8].display_name | Philosophy |
| keywords[9].id | https://openalex.org/keywords/linguistics |
| keywords[9].score | 0.05382847785949707 |
| keywords[9].display_name | Linguistics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.16468 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.16468 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.16468 |
| locations[1].id | doi:10.48550/arxiv.2404.16468 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.16468 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5084336459 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4843-3342 |
| authorships[0].author.display_name | Bram De Cooman |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | De Cooman, Bram |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5078854904 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8846-6352 |
| authorships[1].author.display_name | Johan A. K. Suykens |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Suykens, Johan |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.16468 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10328 |
| primary_topic.field.id | https://openalex.org/fields/14 |
| primary_topic.field.display_name | Business, Management and Accounting |
| primary_topic.score | 0.26350000500679016 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1404 |
| primary_topic.subfield.display_name | Management Information Systems |
| primary_topic.display_name | Supply Chain and Inventory Management |
| related_works | https://openalex.org/W2920061524, https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856, https://openalex.org/W2145821588, https://openalex.org/W2086122291, https://openalex.org/W1987513656 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.16468 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.16468 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.16468 |
| primary_location.id | pmh:oai:arXiv.org:2404.16468 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.16468 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.16468 |
| publication_date | 2024-04-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 59, 147, 206 |
| abstract_inverted_index.In | 40 |
| abstract_inverted_index.an | 5, 89 |
| abstract_inverted_index.as | 29, 88 |
| abstract_inverted_index.be | 77 |
| abstract_inverted_index.in | 102, 175 |
| abstract_inverted_index.is | 105, 150, 173 |
| abstract_inverted_index.of | 26, 119, 156, 194, 202, 209 |
| abstract_inverted_index.on | 12, 84, 125, 131, 185 |
| abstract_inverted_index.or | 36, 130 |
| abstract_inverted_index.to | 8, 23, 76, 114, 122 |
| abstract_inverted_index.we | 43, 111 |
| abstract_inverted_index.(or | 96 |
| abstract_inverted_index.The | 70, 169, 189 |
| abstract_inverted_index.and | 48, 55, 65, 99, 139, 178 |
| abstract_inverted_index.are | 112, 160 |
| abstract_inverted_index.for | 63, 80 |
| abstract_inverted_index.gap | 51 |
| abstract_inverted_index.of) | 183 |
| abstract_inverted_index.out | 75 |
| abstract_inverted_index.the | 13, 50, 85, 103, 126, 142, 192, 195, 200 |
| abstract_inverted_index.two | 186 |
| abstract_inverted_index.From | 141 |
| abstract_inverted_index.able | 113 |
| abstract_inverted_index.dual | 72, 94 |
| abstract_inverted_index.lack | 4 |
| abstract_inverted_index.more | 176 |
| abstract_inverted_index.some | 116 |
| abstract_inverted_index.such | 28, 93, 203 |
| abstract_inverted_index.that | 152, 159 |
| abstract_inverted_index.they | 20 |
| abstract_inverted_index.this | 41, 109 |
| abstract_inverted_index.turn | 74 |
| abstract_inverted_index.with | 32, 52, 134, 205 |
| abstract_inverted_index.work | 42 |
| abstract_inverted_index.costs | 132 |
| abstract_inverted_index.novel | 117 |
| abstract_inverted_index.these | 45 |
| abstract_inverted_index.types | 25, 118 |
| abstract_inverted_index.under | 180 |
| abstract_inverted_index.unify | 44 |
| abstract_inverted_index.using | 58, 108, 165 |
| abstract_inverted_index.value | 30 |
| abstract_inverted_index.which | 197 |
| abstract_inverted_index.action | 128 |
| abstract_inverted_index.bounds | 124 |
| abstract_inverted_index.bridge | 49 |
| abstract_inverted_index.detail | 177 |
| abstract_inverted_index.exist, | 19 |
| abstract_inverted_index.impose | 9, 123 |
| abstract_inverted_index.method | 172 |
| abstract_inverted_index.policy | 157, 211 |
| abstract_inverted_index.primal | 104 |
| abstract_inverted_index.remain | 21 |
| abstract_inverted_index.reward | 34, 100, 167 |
| abstract_inverted_index.states | 138 |
| abstract_inverted_index.terms) | 98 |
| abstract_inverted_index.useful | 79 |
| abstract_inverted_index.between | 92, 136 |
| abstract_inverted_index.certain | 17 |
| abstract_inverted_index.control | 56 |
| abstract_inverted_index.density | 38, 129 |
| abstract_inverted_index.derived | 151 |
| abstract_inverted_index.generic | 60 |
| abstract_inverted_index.handled | 162 |
| abstract_inverted_index.learned | 86 |
| abstract_inverted_index.limited | 22 |
| abstract_inverted_index.method, | 196 |
| abstract_inverted_index.methods | 3 |
| abstract_inverted_index.policy, | 87 |
| abstract_inverted_index.results | 190 |
| abstract_inverted_index.signals | 35 |
| abstract_inverted_index.systems | 204 |
| abstract_inverted_index.theory, | 57 |
| abstract_inverted_index.toolbox | 208 |
| abstract_inverted_index.trained | 14 |
| abstract_inverted_index.various | 154 |
| abstract_inverted_index.Although | 16 |
| abstract_inverted_index.actions. | 140 |
| abstract_inverted_index.adjusted | 143 |
| abstract_inverted_index.allowing | 121 |
| abstract_inverted_index.designer | 201 |
| abstract_inverted_index.efficacy | 193 |
| abstract_inverted_index.examined | 174 |
| abstract_inverted_index.existing | 46 |
| abstract_inverted_index.imposing | 81 |
| abstract_inverted_index.inherent | 6 |
| abstract_inverted_index.learning | 2, 68 |
| abstract_inverted_index.methods. | 69 |
| abstract_inverted_index.obtained | 71 |
| abstract_inverted_index.policy's | 127 |
| abstract_inverted_index.possible | 210 |
| abstract_inverted_index.proposed | 170 |
| abstract_inverted_index.provides | 199 |
| abstract_inverted_index.specific | 24 |
| abstract_inverted_index.supports | 153 |
| abstract_inverted_index.training | 164 |
| abstract_inverted_index.algorithm | 149 |
| abstract_inverted_index.classical | 53 |
| abstract_inverted_index.different | 181 |
| abstract_inverted_index.evaluated | 179 |
| abstract_inverted_index.framework | 62 |
| abstract_inverted_index.highlight | 191 |
| abstract_inverted_index.intrinsic | 90 |
| abstract_inverted_index.introduce | 115 |
| abstract_inverted_index.mechanism | 7 |
| abstract_inverted_index.policies. | 15 |
| abstract_inverted_index.practical | 148 |
| abstract_inverted_index.problems, | 146 |
| abstract_inverted_index.revealed. | 106 |
| abstract_inverted_index.trainable | 166 |
| abstract_inverted_index.versatile | 207 |
| abstract_inverted_index.Model-free | 0 |
| abstract_inverted_index.additional | 33, 82 |
| abstract_inverted_index.associated | 133 |
| abstract_inverted_index.especially | 78 |
| abstract_inverted_index.extensions | 18 |
| abstract_inverted_index.framework, | 110 |
| abstract_inverted_index.techniques | 47 |
| abstract_inverted_index.throughout | 163 |
| abstract_inverted_index.ultimately | 198 |
| abstract_inverted_index.visitation | 37 |
| abstract_inverted_index.behavioural | 10 |
| abstract_inverted_index.consecutive | 137 |
| abstract_inverted_index.constraints | 11, 31, 83, 95, 158, 184 |
| abstract_inverted_index.primal-dual | 61, 144 |
| abstract_inverted_index.transitions | 135 |
| abstract_inverted_index.value-based | 64 |
| abstract_inverted_index.Furthermore, | 107 |
| abstract_inverted_index.actor-critic | 66 |
| abstract_inverted_index.combinations | 155 |
| abstract_inverted_index.constraints, | 27, 120 |
| abstract_inverted_index.constraints. | 39, 212 |
| abstract_inverted_index.formulations | 73 |
| abstract_inverted_index.optimization | 54, 145 |
| abstract_inverted_index.relationship | 91 |
| abstract_inverted_index.(combinations | 182 |
| abstract_inverted_index.automatically | 161 |
| abstract_inverted_index.environments. | 188 |
| abstract_inverted_index.interpretable | 187 |
| abstract_inverted_index.modifications | 101 |
| abstract_inverted_index.reinforcement | 1, 67 |
| abstract_inverted_index.modifications. | 168 |
| abstract_inverted_index.regularization | 97 |
| abstract_inverted_index.$\texttt{DualCRL}$ | 171 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile |
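
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs (for example, `Model-free` at 0 and `reinforcement` at 1 and 67). A minimal Python sketch of turning such a mapping back into text follows; the function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, and `sample` holds only an excerpt of the rows, restricted to positions 0 to 8.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a token -> word-positions mapping."""
    by_position: Dict[int, str] = {}
    for token, indices in inverted_index.items():
        for i in indices:
            by_position[i] = token
    # Join the tokens in ascending position order.
    return " ".join(by_position[i] for i in sorted(by_position))


# Excerpt of the payload rows, limited to positions 0-8 (hypothetical variable).
sample = {
    "Model-free": [0],
    "reinforcement": [1],
    "learning": [2],
    "methods": [3],
    "lack": [4],
    "an": [5],
    "inherent": [6],
    "mechanism": [7],
    "to": [8],
}

print(rebuild_abstract(sample))
# prints: Model-free reinforcement learning methods lack an inherent mechanism to
```

Applied to the full set of rows, the same function would yield the complete abstract, since every position from 0 to 212 appears exactly once in the index.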