L. A. Prashanth
Learning to optimize convex risk measures: The cases of utility-based shortfall risk and optimized certainty equivalent risk
We consider the problems of estimation and optimization of two popular convex risk measures: utility-based shortfall risk (UBSR) and Optimized Certainty Equivalent (OCE) risk. We extend these risk measures to cover possibly unbounded rando…
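As a quick illustration of the SAA approach to OCE estimation, here is a minimal sketch under one common sign convention, OCE_u(X) = sup_eta { eta + E[u(X - eta)] }; the helper name oce_saa, the bounded search interval, and the exponential-utility example are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def oce_saa(samples, u, eta_bounds=(-50.0, 50.0)):
    """Sample average approximation (SAA) of the OCE risk
    OCE_u(X) = sup_eta { eta + E[u(X - eta)] },
    computed from i.i.d. samples of X by maximizing the empirical
    objective over eta (search interval is an illustrative choice)."""
    def neg_objective(eta):
        return -(eta + np.mean(u(samples - eta)))
    res = minimize_scalar(neg_objective, bounds=eta_bounds, method="bounded")
    return -res.fun

# Example: the exponential utility u(t) = 1 - exp(-t) recovers the
# entropic risk -log E[exp(-X)], which is -1 for X ~ N(1, 4).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
print(oce_saa(x, lambda t: 1.0 - np.exp(-t)))
```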
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
This paper introduces a general framework for risk-sensitive bandits that integrates the notions of risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes the various existing risk-s…
Markov Chain Variance Estimation: A Stochastic Approximation Approach
We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computati…
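For contrast with the recursive estimator studied here, the classical batch-means estimator of the asymptotic variance is sketched below; it needs the whole trajectory in memory, which is precisely what an O(1)-per-step recursion avoids. This baseline is standard and is not the paper's method.

```python
import numpy as np

def batch_means_variance(f_values, num_batches=30):
    """Classical batch-means estimate of the asymptotic variance sigma^2
    in the Markov-chain CLT sqrt(n) * (mean_n - mu) -> N(0, sigma^2):
    split the trajectory into batches and rescale the sample variance of
    the batch means. Requires storing the whole trajectory, unlike a
    recursive O(1)-per-step scheme."""
    n = len(f_values)
    b = n // num_batches  # batch length
    batches = np.asarray(f_values[: b * num_batches]).reshape(num_batches, b)
    return b * batches.mean(axis=1).var(ddof=1)
```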
A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linea…
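For intuition about the objective, the per-state variance of the discounted return can be computed from coupled TD-style recursions for the first and second moments. The tabular sketch below is an expository assumption; the paper analyzes a linear function approximation variant and gives finite-sample bounds for it.

```python
import numpy as np

def mean_variance_td(trajectory, num_states, gamma=0.9, alpha=0.05):
    """Tabular TD-style recursions for the first moment J and second
    moment M of the discounted return under a fixed policy, using the
    second-moment Bellman equation
    M(s) = E[ r^2 + 2*gamma*r*J(s') + gamma^2 * M(s') ].
    The return variance is then Var(s) = M(s) - J(s)^2."""
    J = np.zeros(num_states)
    M = np.zeros(num_states)
    for (s, r, s_next) in trajectory:
        J[s] += alpha * (r + gamma * J[s_next] - J[s])
        M[s] += alpha * (r**2 + 2.0 * gamma * r * J[s_next]
                         + gamma**2 * M[s_next] - M[s])
    return J, M - J**2
```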
Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as w…
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of th…
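A minimal sketch of the SAA estimator for UBSR: since t -> E[l(-X - t)] is monotone for an increasing loss l, the empirical risk level can be located by bisection. The loss, threshold, and bracketing interval in the example are illustrative assumptions.

```python
import numpy as np

def ubsr_saa(samples, loss, lam, lo=-100.0, hi=100.0, tol=1e-8):
    """SAA estimate of utility-based shortfall risk
    SR(X) = inf{ t : E[loss(-X - t)] <= lam },
    via bisection on the monotone map t -> mean(loss(-x_i - t)) - lam."""
    g = lambda t: np.mean(loss(-samples - t)) - lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:      # constraint violated: move the level t up
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: exponential loss l(u) = exp(u), threshold lam = 1; for a
# standard normal X the UBSR level works out to 0.5.
rng = np.random.default_rng(1)
print(ubsr_saa(rng.normal(size=100_000), np.exp, lam=1.0))
```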
Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show …
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a …
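For reference, cubic-regularized Newton methods in the Nesterov-Polyak mold pick the next iterate by minimizing a cubic model of the objective; the generic template is shown below, with the understanding that in the RL setting the gradient and Hessian of the performance J are replaced by sampled likelihood-ratio estimates, and that the paper's exact update may differ.

```latex
\theta_{k+1} \in \arg\min_{\theta}\Big\{
  \langle \nabla J(\theta_k),\, \theta-\theta_k\rangle
  + \tfrac{1}{2}(\theta-\theta_k)^{\top}\nabla^{2}J(\theta_k)(\theta-\theta_k)
  + \tfrac{M}{6}\,\lVert\theta-\theta_k\rVert^{3}\Big\}
```

When the Hessian is M-Lipschitz, the cubic term makes this model a global bound on the objective, which is what lets such methods escape saddle points and converge to second-order stationary points.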
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice t…
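A minimal runnable sketch of TD(0) with linear function approximation plus tail averaging, assuming a constant step size and a feature matrix indexed by a discrete state (both assumptions for illustration; the paper's bounds are stated for particular step-size choices):

```python
import numpy as np

def td0_tail_averaged(transitions, features, gamma=0.9, alpha=0.1,
                      tail_frac=0.5):
    """TD(0) with linear function approximation, returning the tail
    average: the mean of only the last `tail_frac` fraction of the
    iterates, which discards the poorly behaved early transient."""
    theta = np.zeros(features.shape[1])
    iterates = []
    for (s, r, s_next) in transitions:
        phi, phi_next = features[s], features[s_next]
        delta = r + gamma * phi_next @ theta - phi @ theta  # TD error
        theta = theta + alpha * delta * phi
        iterates.append(theta)
    start = int(len(iterates) * (1.0 - tail_frac))
    return np.mean(iterates[start:], axis=0)
```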
A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient…
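The overall shape of such an estimator can be conveyed by a simplified two-point smoothed-functional sketch with truncated Cauchy directions; the precise weighting, scaling constants, and truncation rule analyzed in the paper are not reproduced here.

```python
import numpy as np

def two_point_sf_gradient(f, x, delta=0.1, num_draws=64, trunc=5.0,
                          rng=None):
    """Two-point smoothed-functional gradient estimate: perturb along a
    random direction u, query the noisy function on both sides, and
    average finite differences weighted by u. Heavy-tailed Cauchy draws
    are clipped to a box to keep the estimate well behaved. The result
    recovers the gradient up to a constant factor set by the perturbation
    distribution (absorbed into the step size in an SA scheme)."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_draws):
        u = np.clip(rng.standard_cauchy(size=x.shape), -trunc, trunc)
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / num_draws
```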
A Survey of Risk-Aware Multi-Armed Bandits
In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucia…
Minimum mean-squared error estimation with bandit feedback
We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators, and ana…
Online Estimation and Optimization of Utility-Based Shortfall Risk
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where sam…
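In the recursive setting, a natural device is a Robbins-Monro root-finding recursion for the UBSR level, since UBSR is the root of t -> E[loss(-X - t)] - lam. The sketch below, with step sizes c/n and the same exponential-loss example as above, is illustrative; the exact recursion and step-size conditions analyzed in the paper may differ.

```python
import math
import random

def ubsr_online(sample_stream, loss, lam, t0=0.0, c=1.0):
    """Stochastic approximation recursion tracking the UBSR level, i.e.
    the root of t -> E[loss(-X - t)] - lam, updated once per incoming
    sample with step size c/n (an illustrative choice)."""
    t = t0
    for n, x in enumerate(sample_stream, start=1):
        t += (c / n) * (loss(-x - t) - lam)
    return t

random.seed(0)
stream = (random.gauss(0.0, 1.0) for _ in range(200_000))
print(ubsr_online(stream, math.exp, lam=1.0))  # drifts towards 0.5
```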
Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis
We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an…
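As a building block for such gradients, the DRM of a batch of returns can be estimated by an L-statistic that reweights order statistics with increments of the distortion function; the sketch below, including the CVaR-style distortion in the example, illustrates the estimator family rather than the paper's exact construction.

```python
import numpy as np

def empirical_drm(returns, distortion):
    """Empirical distorted risk measure (Choquet integral) of a sample:
    sort the returns and weight the i-th smallest by
    g((n-i+1)/n) - g((n-i)/n), where g is the distortion
    (increasing, with g(0) = 0 and g(1) = 1)."""
    x = np.sort(returns)
    n = len(x)
    s = np.arange(n, 0, -1) / n                  # survival levels
    weights = distortion(s) - distortion(s - 1.0 / n)
    return float(np.dot(x, weights))

# Example: g(u) = min(u/0.1, 1) recovers the mean of the top 10% of
# outcomes (upper-tail CVaR), about 1.75 for a standard normal.
rng = np.random.default_rng(2)
print(empirical_drm(rng.normal(size=200_000),
                    lambda u: np.minimum(u / 0.1, 1.0)))
```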
Estimation of Spectral Risk Measures
We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distr…
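A minimal version of the numerical-integration idea: integrate the empirical quantile function against the risk spectrum on a grid. The midpoint rule below is a simple stand-in for the paper's quadrature, whose error is what the concentration analysis has to control.

```python
import numpy as np

def srm_numerical(samples, spectrum, grid_size=1000):
    """Estimate a spectral risk measure
    rho(X) = int_0^1 phi(u) * q_X(u) du
    by the midpoint rule on a uniform grid, with the true quantile
    function q_X replaced by empirical quantiles of the sample."""
    u = (np.arange(grid_size) + 0.5) / grid_size   # midpoints in (0, 1)
    return float(np.mean(spectrum(u) * np.quantile(samples, u)))
```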
Smoothed functional-based gradient algorithms for off-policy reinforcement learning
We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence gua…
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Concentration of risk measures: A Wasserstein distance approach
This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for a broad class of risk measures. The results cover two broad classes of risk measures which are defined in the p…
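The template behind such results, stated generically: if the risk measure rho is L-Lipschitz with respect to the Wasserstein distance, then any concentration bound for the empirical distribution transfers immediately to the risk estimate.

```latex
\big|\rho(\hat F_n) - \rho(F)\big| \;\le\; L \, W_1(\hat F_n, F),
\qquad
W_1(F, G) \;=\; \int_0^1 \big|F^{-1}(u) - G^{-1}(u)\big| \, du
```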
Improved Concentration Bounds for Conditional Value-at-Risk and Cumulative Prospect Theory using Wasserstein distance
A Wasserstein distance approach for concentration of empirical risk estimates
This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as spe…
Correlated bandits or: How to minimize mean-squared error online
While the objective in traditional multi-armed bandit problems is to find the arm with the highest mean, in many settings, finding an arm that best captures information about other arms is of interest. This objective, however, requires lea…
Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light…
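The estimator in question is the standard empirical CVaR: average the samples at or above the empirical Value-at-Risk. A minimal sketch (the sign convention, with larger values treated as worse losses, is an assumption):

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Empirical CVaR at level alpha: the empirical VaR is the alpha
    sample quantile, and CVaR averages all samples at or above it."""
    var = np.quantile(samples, alpha)
    return float(samples[samples >= var].mean())
```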
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optim…
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practic…
Random directions stochastic approximation with deterministic perturbations
We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order…
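One simple way to make the perturbations deterministic while keeping the gradient estimate accurate to first order is to cycle through the rows of a Hadamard matrix, whose column orthogonality gives (1/m) * sum_k u_k u_k^T = I exactly. The sketch below illustrates this idea; the specific deterministic sequences constructed and analyzed in the paper are different.

```python
import numpy as np
from scipy.linalg import hadamard

def rdsa_gradient_deterministic(f, x, delta=0.05):
    """Two-point RDSA-style gradient estimate with deterministic +/-1
    perturbation directions taken from a Hadamard matrix; averaging a
    full cycle of rows recovers the gradient up to O(delta^2) error."""
    d = len(x)
    m = 1 << max(d - 1, 0).bit_length()     # next power of two >= d
    H = hadamard(m)[:, :d].astype(float)
    g = np.zeros(d)
    for u in H:
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / m
```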
Concentration bounds for empirical conditional value-at-risk: The unbounded case
In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Val…
Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected
Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost di…