Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.08022
Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. Unlike conventional RL, which aims solely to maximize cumulative rewards, CRL incorporates additional constraints that represent specific mission requirements or limitations that the agent must comply with during the learning process. In this paper, we address a type of CRL problem where an agent aims to learn the optimal policy to maximize reward while ensuring a desired level of temporal logic constraint satisfaction throughout the learning process. We propose a novel framework that relies on switching between pure learning (reward maximization) and constraint satisfaction. This framework estimates the probability of constraint satisfaction based on earlier trials and adjusts the probability of switching between the learning and constraint satisfaction policies accordingly. We theoretically validate the correctness of the proposed algorithm and demonstrate its performance through comprehensive simulations.
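The switching idea in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm, only one simple way such a scheme could look under stated assumptions: each episode either satisfies or violates the constraint, the satisfaction probability of each policy is estimated as a smoothed success frequency over earlier episodes, and the agent picks the constraint satisfaction policy with the smallest probability `eps` for which the mixture `eps * q_safe + (1 - eps) * q_learn` reaches the desired level. All names (`SatisfactionEstimate`, `safe_policy_probability`, `run`) and the simulated success rates are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SatisfactionEstimate:
    """Running estimate of how often a policy satisfies the constraint."""
    successes: int = 0
    trials: int = 0

    def update(self, satisfied: bool) -> None:
        self.trials += 1
        self.successes += int(satisfied)

    def value(self, prior: float) -> float:
        # Smoothed frequency: one pseudo-trial whose outcome equals `prior`.
        return (self.successes + prior) / (self.trials + 1)


def safe_policy_probability(q_learn: float, q_safe: float, target: float) -> float:
    """Smallest eps with eps * q_safe + (1 - eps) * q_learn >= target, capped at 1."""
    if q_learn >= target:
        return 0.0  # the learning policy alone already meets the target
    if q_safe <= q_learn:
        return 1.0  # mixing cannot help; default to the constraint policy
    return min(1.0, (target - q_learn) / (q_safe - q_learn))


def run(episodes: int, target: float, true_q_learn: float,
        true_q_safe: float, seed: int = 0) -> float:
    """Simulate the switching loop and return the empirical satisfaction rate.

    true_q_learn and true_q_safe are assumed, fixed success rates that stand
    in for a real environment; the agent only sees episode outcomes.
    """
    rng = random.Random(seed)
    learn_est = SatisfactionEstimate()
    safe_est = SatisfactionEstimate()
    satisfied_total = 0
    for _ in range(episodes):
        # Pessimistic prior for the learning policy, optimistic for the safe one.
        eps = safe_policy_probability(
            learn_est.value(prior=0.0), safe_est.value(prior=1.0), target
        )
        use_safe = rng.random() < eps
        satisfied = rng.random() < (true_q_safe if use_safe else true_q_learn)
        (safe_est if use_safe else learn_est).update(satisfied)
        satisfied_total += int(satisfied)
    return satisfied_total / episodes


if __name__ == "__main__":
    print(run(episodes=2000, target=0.8, true_q_learn=0.3, true_q_safe=0.95))
```

In this sketch the learning policy is chosen more often as its estimated satisfaction probability rises, which mirrors the trade-off between reward maximization and constraint satisfaction described above. A real implementation would replace the fixed success rates with rollouts of an RL policy and a temporal logic monitor.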
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.08022, https://arxiv.org/pdf/2410.08022
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403365237
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403365237 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2410.08022 (Digital Object Identifier)
- Title: Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-10-10 (full publication date if available)
- Authors: Xiaoshan Lin, Sadık Bera Yüksel, Yasin Yazıcıoğlu, Derya Aksaray (list of authors in order)
- Landing page: https://arxiv.org/abs/2410.08022 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2410.08022 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2410.08022 (direct OA link when available)
- Concepts: Reinforcement learning, Probabilistic logic, Reinforcement, Computer science, Temporal logic, Mathematical optimization, Artificial intelligence, Mathematics, Theoretical computer science, Psychology, Social psychology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403365237 |
| doi | https://doi.org/10.48550/arxiv.2410.08022 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.08022 |
| ids.openalex | https://openalex.org/W4403365237 |
| fwci | |
| type | preprint |
| title | Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10142 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.7674999833106995 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1703 |
| topics[0].subfield.display_name | Computational Theory and Mathematics |
| topics[0].display_name | Formal Methods in Verification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8187705278396606 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C49937458 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7888191938400269 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2599292 |
| concepts[1].display_name | Probabilistic logic |
| concepts[2].id | https://openalex.org/C67203356 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6207969784736633 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[2].display_name | Reinforcement |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5299530625343323 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C25016198 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4332289397716522 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q781833 |
| concepts[4].display_name | Temporal logic |
| concepts[5].id | https://openalex.org/C126255220 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3984062075614929 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[5].display_name | Mathematical optimization |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3845491409301758 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C33923547 |
| concepts[7].level | 0 |
| concepts[7].score | 0.2992885708808899 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[7].display_name | Mathematics |
| concepts[8].id | https://openalex.org/C80444323 |
| concepts[8].level | 1 |
| concepts[8].score | 0.28475114703178406 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[8].display_name | Theoretical computer science |
| concepts[9].id | https://openalex.org/C15744967 |
| concepts[9].level | 0 |
| concepts[9].score | 0.22931483387947083 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[9].display_name | Psychology |
| concepts[10].id | https://openalex.org/C77805123 |
| concepts[10].level | 1 |
| concepts[10].score | 0.15277445316314697 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[10].display_name | Social psychology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8187705278396606 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/probabilistic-logic |
| keywords[1].score | 0.7888191938400269 |
| keywords[1].display_name | Probabilistic logic |
| keywords[2].id | https://openalex.org/keywords/reinforcement |
| keywords[2].score | 0.6207969784736633 |
| keywords[2].display_name | Reinforcement |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5299530625343323 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/temporal-logic |
| keywords[4].score | 0.4332289397716522 |
| keywords[4].display_name | Temporal logic |
| keywords[5].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[5].score | 0.3984062075614929 |
| keywords[5].display_name | Mathematical optimization |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.3845491409301758 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/mathematics |
| keywords[7].score | 0.2992885708808899 |
| keywords[7].display_name | Mathematics |
| keywords[8].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[8].score | 0.28475114703178406 |
| keywords[8].display_name | Theoretical computer science |
| keywords[9].id | https://openalex.org/keywords/psychology |
| keywords[9].score | 0.22931483387947083 |
| keywords[9].display_name | Psychology |
| keywords[10].id | https://openalex.org/keywords/social-psychology |
| keywords[10].score | 0.15277445316314697 |
| keywords[10].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.08022 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.08022 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.08022 |
| locations[1].id | doi:10.48550/arxiv.2410.08022 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.08022 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5018068424 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8913-5539 |
| authorships[0].author.display_name | Xiaoshan Lin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lin, Xiaoshan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5005401257 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6099-5001 |
| authorships[1].author.display_name | Serdar Yüksel |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yüksel, Sadık Bera |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5041762786 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6957-6831 |
| authorships[2].author.display_name | Yasin Yazıcıoğlu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yazıcıoğlu, Yasin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5053550436 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4236-9116 |
| authorships[3].author.display_name | Derya Aksaray |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Aksaray, Derya |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.08022 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10142 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.7674999833106995 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1703 |
| primary_topic.subfield.display_name | Computational Theory and Mathematics |
| primary_topic.display_name | Formal Methods in Verification |
| related_works | https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W2920061524, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856, https://openalex.org/W2145821588, https://openalex.org/W2086122291, https://openalex.org/W2138707849 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.08022 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.08022 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.08022 |
| primary_location.id | pmh:oai:arXiv.org:2410.08022 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.08022 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.08022 |
| publication_date | 2024-10-10 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 56, 75, 89 |
| abstract_inverted_index.In | 51 |
| abstract_inverted_index.RL | 22 |
| abstract_inverted_index.We | 87, 129 |
| abstract_inverted_index.an | 62 |
| abstract_inverted_index.is | 4 |
| abstract_inverted_index.of | 7, 58, 78, 109, 121, 134 |
| abstract_inverted_index.on | 94, 113 |
| abstract_inverted_index.or | 39 |
| abstract_inverted_index.to | 26, 65, 70 |
| abstract_inverted_index.we | 54 |
| abstract_inverted_index.CRL | 30, 59 |
| abstract_inverted_index.and | 101, 116, 125, 138 |
| abstract_inverted_index.its | 140 |
| abstract_inverted_index.the | 14, 42, 48, 67, 84, 107, 119, 132, 135 |
| abstract_inverted_index.(RL) | 18 |
| abstract_inverted_index.This | 104 |
| abstract_inverted_index.aims | 24, 64 |
| abstract_inverted_index.into | 13 |
| abstract_inverted_index.must | 44 |
| abstract_inverted_index.pure | 97 |
| abstract_inverted_index.that | 10, 34, 41, 92 |
| abstract_inverted_index.this | 52 |
| abstract_inverted_index.type | 57 |
| abstract_inverted_index.with | 46 |
| abstract_inverted_index.(CRL) | 3 |
| abstract_inverted_index.agent | 43, 63 |
| abstract_inverted_index.based | 112 |
| abstract_inverted_index.learn | 66 |
| abstract_inverted_index.level | 77 |
| abstract_inverted_index.logic | 80 |
| abstract_inverted_index.novel | 90 |
| abstract_inverted_index.where | 61 |
| abstract_inverted_index.which | 23 |
| abstract_inverted_index.while | 73 |
| abstract_inverted_index.Unlike | 20 |
| abstract_inverted_index.comply | 45 |
| abstract_inverted_index.during | 47 |
| abstract_inverted_index.paper, | 53 |
| abstract_inverted_index.policy | 69 |
| abstract_inverted_index.relies | 93 |
| abstract_inverted_index.reward | 72 |
| abstract_inverted_index.solely | 25 |
| abstract_inverted_index.subset | 6 |
| abstract_inverted_index.trials | 115 |
| abstract_inverted_index.(reward | 99 |
| abstract_inverted_index.address | 55 |
| abstract_inverted_index.adjusts | 118 |
| abstract_inverted_index.between | 96, 123 |
| abstract_inverted_index.desired | 76 |
| abstract_inverted_index.earlier | 114 |
| abstract_inverted_index.machine | 8 |
| abstract_inverted_index.mission | 37 |
| abstract_inverted_index.optimal | 68 |
| abstract_inverted_index.problem | 60 |
| abstract_inverted_index.propose | 88 |
| abstract_inverted_index.through | 142 |
| abstract_inverted_index.Learning | 2 |
| abstract_inverted_index.ensuring | 74 |
| abstract_inverted_index.learning | 9, 17, 49, 85, 98, 124 |
| abstract_inverted_index.maximize | 27, 71 |
| abstract_inverted_index.process. | 50, 86 |
| abstract_inverted_index.properly | 117 |
| abstract_inverted_index.proposed | 136 |
| abstract_inverted_index.rewards, | 29 |
| abstract_inverted_index.specific | 36 |
| abstract_inverted_index.temporal | 79 |
| abstract_inverted_index.validate | 131 |
| abstract_inverted_index.algorithm | 137 |
| abstract_inverted_index.estimates | 106 |
| abstract_inverted_index.framework | 91, 105 |
| abstract_inverted_index.policies. | 128 |
| abstract_inverted_index.represent | 35 |
| abstract_inverted_index.switching | 95, 122 |
| abstract_inverted_index.additional | 32 |
| abstract_inverted_index.constraint | 81, 102, 110, 126 |
| abstract_inverted_index.cumulative | 28 |
| abstract_inverted_index.framework. | 19 |
| abstract_inverted_index.introduces | 11 |
| abstract_inverted_index.throughout | 83 |
| abstract_inverted_index.Constrained | 0 |
| abstract_inverted_index.constraints | 12, 33 |
| abstract_inverted_index.correctness | 133 |
| abstract_inverted_index.demonstrate | 139 |
| abstract_inverted_index.limitations | 40 |
| abstract_inverted_index.performance | 141 |
| abstract_inverted_index.probability | 108, 120 |
| abstract_inverted_index.traditional | 15 |
| abstract_inverted_index.conventional | 21 |
| abstract_inverted_index.incorporates | 31 |
| abstract_inverted_index.requirements | 38 |
| abstract_inverted_index.satisfaction | 82, 111, 127 |
| abstract_inverted_index.simulations. | 144 |
| abstract_inverted_index.Reinforcement | 1 |
| abstract_inverted_index.comprehensive | 143 |
| abstract_inverted_index.maximization) | 100 |
| abstract_inverted_index.reinforcement | 16 |
| abstract_inverted_index.satisfaction. | 103 |
| abstract_inverted_index.theoretically | 130 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |