Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis.
We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an episodic Markov decision process (MDP). We derive a variant of the policy gradient theorem that caters to the DRM objective. Using this theorem in conjunction with a likelihood ratio (LR) based gradient estimation scheme, we propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings. We derive non-asymptotic bounds that establish the convergence of our algorithms to an approximate stationary point of the DRM objective.
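As a rough illustration of the objective described in the abstract, the sketch below estimates a DRM from a batch of sampled episode returns. The abstract does not spell out an estimator, so the order-statistic form used here, the function name `empirical_drm`, and the two example distortion functions are assumptions for illustration, not the authors' algorithm. It relies only on the standard definition of a DRM as a Choquet integral with respect to a distortion function g that is nondecreasing with g(0) = 0 and g(1) = 1.

```python
def empirical_drm(returns, distortion):
    """Estimate a distorted risk measure (DRM) from sampled returns.

    Hypothetical helper: sorts the samples and weights the i-th smallest
    one by g((m - i + 1) / m) - g((m - i) / m), which is the Choquet
    integral of the empirical distribution under distortion g.
    """
    xs = sorted(float(r) for r in returns)  # x_(1) <= ... <= x_(m)
    m = len(xs)
    if m == 0:
        raise ValueError("need at least one sampled return")
    total = 0.0
    for i, x in enumerate(xs, start=1):
        weight = distortion((m - i + 1) / m) - distortion((m - i) / m)
        total += weight * x
    return total


def identity(u):
    # Identity distortion: the DRM reduces to the sample mean.
    return u


def square(u):
    # Convex distortion (illustrative choice): shifts weight toward low returns.
    return u * u


sample_returns = [1.0, 2.0, 3.0, 4.0]  # made-up cumulative rewards
risk_neutral = empirical_drm(sample_returns, identity)
risk_averse = empirical_drm(sample_returns, square)
```

With the identity distortion every sample receives weight 1/m, so the estimate equals the sample mean. A convex distortion such as `square` places more weight on the smallest returns, so the estimate falls below the mean.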
Related Topics
Concepts
Markov decision process
Reinforcement learning
Context (archaeology)
Mathematical optimization
Computer science
Convergence (economics)
Gradient descent
Gradient method
Markov process
Applied mathematics
Mathematics
Artificial intelligence
Statistics
Economics
Artificial neural network
Economic growth
Biology
Paleontology
Metadata
- Type
- preprint
- Language
- en
- Landing Page
- https://arxiv.org/pdf/2107.04422.pdf
- OA Status
- green
- References
- 22
- Related Works
- 20
- OpenAlex ID
- https://openalex.org/W3180692686
All OpenAlex metadata
- OpenAlex ID: canonical identifier for this work in OpenAlex
- https://openalex.org/W3180692686
- Title: work title
- Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis.
- Type: OpenAlex work type
- preprint
- Language: primary language
- en
- Publication year: year of publication
- 2021
- Publication date: full publication date if available
- 2021-07-09
- Authors: list of authors in order
- Nithia Vijayan, L. A. Prashanth
- Landing page: publisher landing page
- https://arxiv.org/pdf/2107.04422.pdf
- Open access: whether a free full text is available
- Yes
- OA status: open access status per OpenAlex
- green
- OA URL: direct OA link when available
- https://arxiv.org/pdf/2107.04422.pdf
- Concepts: top concepts (fields/topics) attached by OpenAlex
- Markov decision process, Reinforcement learning, Context (archaeology), Mathematical optimization, Computer science, Convergence (economics), Gradient descent, Gradient method, Markov process, Applied mathematics, Mathematics, Artificial intelligence, Statistics, Economics, Artificial neural network, Economic growth, Biology, Paleontology
- Cited by: total citation count in OpenAlex
- 0
- References (count): number of works referenced by this work
- 22
- Related works (count): other works algorithmically related by OpenAlex
- 20
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W3180692686 |
| doi | |
| ids.mag | 3180692686 |
| ids.openalex | https://openalex.org/W3180692686 |
| fwci | |
| type | preprint |
| title | Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis. |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9993000030517578 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T12101 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9883999824523926 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Advanced Bandit Algorithms Research |
| topics[2].id | https://openalex.org/T10136 |
| topics[2].field.id | https://openalex.org/fields/26 |
| topics[2].field.display_name | Mathematics |
| topics[2].score | 0.9678999781608582 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2613 |
| topics[2].subfield.display_name | Statistics and Probability |
| topics[2].display_name | Statistical Methods and Inference |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C106189395 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7488395571708679 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q176789 |
| concepts[0].display_name | Markov decision process |
| concepts[1].id | https://openalex.org/C97541855 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6500020027160645 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[1].display_name | Reinforcement learning |
| concepts[2].id | https://openalex.org/C2779343474 |
| concepts[2].level | 2 |
| concepts[2].score | 0.577132523059845 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[2].display_name | Context (archaeology) |
| concepts[3].id | https://openalex.org/C126255220 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5266744494438171 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[3].display_name | Mathematical optimization |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.5231003761291504 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C2777303404 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5159664750099182 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q759757 |
| concepts[5].display_name | Convergence (economics) |
| concepts[6].id | https://openalex.org/C153258448 |
| concepts[6].level | 3 |
| concepts[6].score | 0.500464677810669 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1199743 |
| concepts[6].display_name | Gradient descent |
| concepts[7].id | https://openalex.org/C115680565 |
| concepts[7].level | 2 |
| concepts[7].score | 0.47655656933784485 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q5977448 |
| concepts[7].display_name | Gradient method |
| concepts[8].id | https://openalex.org/C159886148 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4353535771369934 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q176645 |
| concepts[8].display_name | Markov process |
| concepts[9].id | https://openalex.org/C28826006 |
| concepts[9].level | 1 |
| concepts[9].score | 0.38859784603118896 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[9].display_name | Applied mathematics |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.3490719199180603 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.16585668921470642 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C105795698 |
| concepts[12].level | 1 |
| concepts[12].score | 0.12638932466506958 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[12].display_name | Statistics |
| concepts[13].id | https://openalex.org/C162324750 |
| concepts[13].level | 0 |
| concepts[13].score | 0.10008171200752258 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[13].display_name | Economics |
| concepts[14].id | https://openalex.org/C50644808 |
| concepts[14].level | 2 |
| concepts[14].score | 0.07562530040740967 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[14].display_name | Artificial neural network |
| concepts[15].id | https://openalex.org/C50522688 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q189833 |
| concepts[15].display_name | Economic growth |
| concepts[16].id | https://openalex.org/C86803240 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[16].display_name | Biology |
| concepts[17].id | https://openalex.org/C151730666 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[17].display_name | Paleontology |
| keywords[0].id | https://openalex.org/keywords/markov-decision-process |
| keywords[0].score | 0.7488395571708679 |
| keywords[0].display_name | Markov decision process |
| keywords[1].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[1].score | 0.6500020027160645 |
| keywords[1].display_name | Reinforcement learning |
| keywords[2].id | https://openalex.org/keywords/context |
| keywords[2].score | 0.577132523059845 |
| keywords[2].display_name | Context (archaeology) |
| keywords[3].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[3].score | 0.5266744494438171 |
| keywords[3].display_name | Mathematical optimization |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.5231003761291504 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/convergence |
| keywords[5].score | 0.5159664750099182 |
| keywords[5].display_name | Convergence (economics) |
| keywords[6].id | https://openalex.org/keywords/gradient-descent |
| keywords[6].score | 0.500464677810669 |
| keywords[6].display_name | Gradient descent |
| keywords[7].id | https://openalex.org/keywords/gradient-method |
| keywords[7].score | 0.47655656933784485 |
| keywords[7].display_name | Gradient method |
| keywords[8].id | https://openalex.org/keywords/markov-process |
| keywords[8].score | 0.4353535771369934 |
| keywords[8].display_name | Markov process |
| keywords[9].id | https://openalex.org/keywords/applied-mathematics |
| keywords[9].score | 0.38859784603118896 |
| keywords[9].display_name | Applied mathematics |
| keywords[10].id | https://openalex.org/keywords/mathematics |
| keywords[10].score | 0.3490719199180603 |
| keywords[10].display_name | Mathematics |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.16585668921470642 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/statistics |
| keywords[12].score | 0.12638932466506958 |
| keywords[12].display_name | Statistics |
| keywords[13].id | https://openalex.org/keywords/economics |
| keywords[13].score | 0.10008171200752258 |
| keywords[13].display_name | Economics |
| keywords[14].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[14].score | 0.07562530040740967 |
| keywords[14].display_name | Artificial neural network |
| language | en |
| locations[0].id | mag:3180692686 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | arXiv (Cornell University) |
| locations[0].landing_page_url | https://arxiv.org/pdf/2107.04422.pdf |
| authorships[0].author.id | https://openalex.org/A5085234899 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6489-9471 |
| authorships[0].author.display_name | Nithia Vijayan |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Nithia Vijayan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5068379567 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | L. A. Prashanth |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | L A Prashanth |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2107.04422.pdf |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis. |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-10-10T17:16:08.811792 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9993000030517578 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W3034567339, https://openalex.org/W3038915804, https://openalex.org/W2963457007, https://openalex.org/W2963856199, https://openalex.org/W3127035336, https://openalex.org/W2786042995, https://openalex.org/W3171658869, https://openalex.org/W1532022778, https://openalex.org/W3160598148, https://openalex.org/W2684685482, https://openalex.org/W1545272985, https://openalex.org/W2895049160, https://openalex.org/W3183820536, https://openalex.org/W3080213971, https://openalex.org/W2783932892, https://openalex.org/W2990109857, https://openalex.org/W3036846812, https://openalex.org/W2912617514, https://openalex.org/W2762782919, https://openalex.org/W2783797945 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | mag:3180692686 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | arXiv (Cornell University) |
| best_oa_location.landing_page_url | https://arxiv.org/pdf/2107.04422.pdf |
| primary_location.id | mag:3180692686 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | arXiv (Cornell University) |
| primary_location.landing_page_url | https://arxiv.org/pdf/2107.04422.pdf |
| publication_date | 2021-07-09 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W2050485049, https://openalex.org/W3135044032, https://openalex.org/W51049863, https://openalex.org/W1576452626, https://openalex.org/W2139914196, https://openalex.org/W1587317356, https://openalex.org/W2121863487, https://openalex.org/W2796289712, https://openalex.org/W3157409643, https://openalex.org/W2964068481, https://openalex.org/W2963470657, https://openalex.org/W2167326844, https://openalex.org/W2945007422, https://openalex.org/W2041946752, https://openalex.org/W2170923204, https://openalex.org/W2963082979, https://openalex.org/W2133626546, https://openalex.org/W2075620513, https://openalex.org/W2019291268, https://openalex.org/W3109546547, https://openalex.org/W2963457007, https://openalex.org/W2962951833 |
| referenced_works_count | 22 |
| abstract_inverted_index.a | 11, 43, 62 |
| abstract_inverted_index.RL | 83 |
| abstract_inverted_index.We | 0, 41, 85 |
| abstract_inverted_index.an | 35, 97 |
| abstract_inverted_index.in | 10, 34, 59, 78 |
| abstract_inverted_index.is | 22 |
| abstract_inverted_index.of | 8, 19, 30, 45, 93, 101 |
| abstract_inverted_index.to | 23, 52, 96 |
| abstract_inverted_index.we | 70 |
| abstract_inverted_index.DRM | 54, 77, 103 |
| abstract_inverted_index.The | 17 |
| abstract_inverted_index.and | 81 |
| abstract_inverted_index.for | 4, 75 |
| abstract_inverted_index.our | 20, 94 |
| abstract_inverted_index.the | 6, 25, 31, 46, 53, 91, 102 |
| abstract_inverted_index.(LR) | 65 |
| abstract_inverted_index.(RL) | 15 |
| abstract_inverted_index.both | 79 |
| abstract_inverted_index.risk | 27 |
| abstract_inverted_index.that | 50, 89 |
| abstract_inverted_index.this | 57 |
| abstract_inverted_index.with | 61 |
| abstract_inverted_index.(DRM) | 29 |
| abstract_inverted_index.Using | 56 |
| abstract_inverted_index.based | 66 |
| abstract_inverted_index.point | 100 |
| abstract_inverted_index.ratio | 64 |
| abstract_inverted_index.(MDP). | 40 |
| abstract_inverted_index.Markov | 37 |
| abstract_inverted_index.bounds | 88 |
| abstract_inverted_index.caters | 51 |
| abstract_inverted_index.derive | 42, 86 |
| abstract_inverted_index.policy | 47, 72 |
| abstract_inverted_index.reward | 33 |
| abstract_inverted_index.control | 9 |
| abstract_inverted_index.measure | 28 |
| abstract_inverted_index.problem | 7 |
| abstract_inverted_index.process | 39 |
| abstract_inverted_index.propose | 1, 71 |
| abstract_inverted_index.scheme, | 69 |
| abstract_inverted_index.solving | 5 |
| abstract_inverted_index.theorem | 49, 58 |
| abstract_inverted_index.variant | 44 |
| abstract_inverted_index.context. | 16 |
| abstract_inverted_index.decision | 38 |
| abstract_inverted_index.episodic | 36 |
| abstract_inverted_index.gradient | 48, 67, 73 |
| abstract_inverted_index.learning | 14 |
| abstract_inverted_index.maximize | 24 |
| abstract_inverted_index.algorithm | 21 |
| abstract_inverted_index.distorted | 26 |
| abstract_inverted_index.establish | 90 |
| abstract_inverted_index.objective | 18 |
| abstract_inverted_index.on-policy | 80 |
| abstract_inverted_index.settings. | 84 |
| abstract_inverted_index.algorithms | 3, 74, 95 |
| abstract_inverted_index.cumulative | 32 |
| abstract_inverted_index.estimation | 68 |
| abstract_inverted_index.likelihood | 63 |
| abstract_inverted_index.objective. | 55, 104 |
| abstract_inverted_index.off-policy | 82 |
| abstract_inverted_index.optimizing | 76 |
| abstract_inverted_index.stationary | 99 |
| abstract_inverted_index.approximate | 98 |
| abstract_inverted_index.conjunction | 60 |
| abstract_inverted_index.convergence | 92 |
| abstract_inverted_index.reinforcement | 13 |
| abstract_inverted_index.non-asymptotic | 87 |
| abstract_inverted_index.risk-sensitive | 12 |
| abstract_inverted_index.policy-gradient | 2 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.800000011920929 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile |
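
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the positions at which it occurs. A minimal sketch of how such an index can be turned back into running text follows. The function name `rebuild_abstract` and the small `sample_index` dictionary (positions 0 to 5 only, copied from the rows above) are illustrative assumptions, not part of any OpenAlex client library.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First six positions of the abstract, taken from the table rows above.
sample_index = {
    "We": [0],
    "propose": [1],
    "policy-gradient": [2],
    "algorithms": [3],
    "for": [4],
    "solving": [5],
}

opening_words = rebuild_abstract(sample_index)
```

Applied to the full set of rows, the same loop should reproduce the abstract shown at the top of the record, since each position from 0 to 104 appears under exactly one word.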