L. A. Prashanth
Learning to optimize convex risk measures: The cases of utility-based shortfall risk and optimized certainty equivalent risk
We consider the problems of estimation and optimization of two popular convex risk measures: utility-based shortfall risk (UBSR) and Optimized Certainty Equivalent (OCE) risk. We extend these risk measures to cover possibly unbounded rando…
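As a quick illustration of the SAA approach to OCE estimation, here is a minimal sketch under one common sign convention, OCE_u(X) = sup_eta { eta + E[u(X - eta)] }; the helper name oce_saa, the bounded search interval, and the exponential-utility example are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def oce_saa(samples, u, eta_bounds=(-50.0, 50.0)):
    """Sample average approximation (SAA) of the OCE risk
    OCE_u(X) = sup_eta { eta + E[u(X - eta)] },
    computed from i.i.d. samples of X by maximizing the empirical
    objective over eta (search interval is an illustrative choice)."""
    def neg_objective(eta):
        return -(eta + np.mean(u(samples - eta)))
    res = minimize_scalar(neg_objective, bounds=eta_bounds, method="bounded")
    return -res.fun

# Example: the exponential utility u(t) = 1 - exp(-t) recovers the
# entropic risk -log E[exp(-X)], which is -1 for X ~ N(1, 4).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
print(oce_saa(x, lambda t: 1.0 - np.exp(-t)))
```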
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
This paper introduces a general framework for risk-sensitive bandits that integrates the notions of risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes the various existing risk-s…
Markov Chain Variance Estimation: A Stochastic Approximation Approach
We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computati…
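For contrast with the recursive estimator studied here, the classical batch-means estimator of the asymptotic variance is sketched below; it needs the whole trajectory in memory, which is precisely what an O(1)-per-step recursion avoids. This baseline is standard and is not the paper's method.

```python
import numpy as np

def batch_means_variance(f_values, num_batches=30):
    """Classical batch-means estimate of the asymptotic variance sigma^2
    in the Markov-chain CLT sqrt(n) * (mean_n - mu) -> N(0, sigma^2):
    split the trajectory into batches and rescale the sample variance of
    the batch means. Requires storing the whole trajectory, unlike a
    recursive O(1)-per-step scheme."""
    n = len(f_values)
    b = n // num_batches  # batch length
    batches = np.asarray(f_values[: b * num_batches]).reshape(num_batches, b)
    return b * batches.mean(axis=1).var(ddof=1)
```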
A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linea…
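For intuition about the objective, the per-state variance of the discounted return can be computed from coupled TD-style recursions for the first and second moments. The tabular sketch below is an expository assumption; the paper analyzes a linear function approximation variant and gives finite-sample bounds for it.

```python
import numpy as np

def mean_variance_td(trajectory, num_states, gamma=0.9, alpha=0.05):
    """Tabular TD-style recursions for the first moment J and second
    moment M of the discounted return under a fixed policy, using the
    second-moment Bellman equation
    M(s) = E[ r^2 + 2*gamma*r*J(s') + gamma^2 * M(s') ].
    The return variance is then Var(s) = M(s) - J(s)^2."""
    J = np.zeros(num_states)
    M = np.zeros(num_states)
    for (s, r, s_next) in trajectory:
        J[s] += alpha * (r + gamma * J[s_next] - J[s])
        M[s] += alpha * (r**2 + 2.0 * gamma * r * J[s_next]
                         + gamma**2 * M[s_next] - M[s])
    return J, M - J**2
```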
Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as w…
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of th…
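A minimal sketch of the SAA estimator for UBSR: since t -> E[l(-X - t)] is monotone for an increasing loss l, the empirical risk level can be located by bisection. The loss, threshold, and bracketing interval in the example are illustrative assumptions.

```python
import numpy as np

def ubsr_saa(samples, loss, lam, lo=-100.0, hi=100.0, tol=1e-8):
    """SAA estimate of utility-based shortfall risk
    SR(X) = inf{ t : E[loss(-X - t)] <= lam },
    via bisection on the monotone map t -> mean(loss(-x_i - t)) - lam."""
    g = lambda t: np.mean(loss(-samples - t)) - lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:      # constraint violated: move the level t up
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: exponential loss l(u) = exp(u), threshold lam = 1; for a
# standard normal X the UBSR level works out to 0.5.
rng = np.random.default_rng(1)
print(ubsr_saa(rng.normal(size=100_000), np.exp, lam=1.0))
```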
Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show …
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a …
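For reference, cubic-regularized Newton methods in the Nesterov-Polyak mold pick the next iterate by minimizing a cubic model of the objective; the generic template is shown below, with the understanding that in the RL setting the gradient and Hessian of the performance J are replaced by sampled likelihood-ratio estimates, and that the paper's exact update may differ.

```latex
\theta_{k+1} \in \arg\min_{\theta}\Big\{
  \langle \nabla J(\theta_k),\, \theta-\theta_k\rangle
  + \tfrac{1}{2}(\theta-\theta_k)^{\top}\nabla^{2}J(\theta_k)(\theta-\theta_k)
  + \tfrac{M}{6}\,\lVert\theta-\theta_k\rVert^{3}\Big\}
```

When the Hessian is M-Lipschitz, the cubic term makes this model a global bound on the objective, which is what lets such methods escape saddle points and converge to second-order stationary points.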
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice t…
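A minimal runnable sketch of TD(0) with linear function approximation plus tail averaging, assuming a constant step size and a feature matrix indexed by a discrete state (both assumptions for illustration; the paper's bounds are stated for particular step-size choices):

```python
import numpy as np

def td0_tail_averaged(transitions, features, gamma=0.9, alpha=0.1,
                      tail_frac=0.5):
    """TD(0) with linear function approximation, returning the tail
    average: the mean of only the last `tail_frac` fraction of the
    iterates, which discards the poorly behaved early transient."""
    theta = np.zeros(features.shape[1])
    iterates = []
    for (s, r, s_next) in transitions:
        phi, phi_next = features[s], features[s_next]
        delta = r + gamma * phi_next @ theta - phi @ theta  # TD error
        theta = theta + alpha * delta * phi
        iterates.append(theta)
    start = int(len(iterates) * (1.0 - tail_frac))
    return np.mean(iterates[start:], axis=0)
```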
A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient…
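The overall shape of such an estimator can be conveyed by a simplified two-point smoothed-functional sketch with truncated Cauchy directions; the precise weighting, scaling constants, and truncation rule analyzed in the paper are not reproduced here.

```python
import numpy as np

def two_point_sf_gradient(f, x, delta=0.1, num_draws=64, trunc=5.0,
                          rng=None):
    """Two-point smoothed-functional gradient estimate: perturb along a
    random direction u, query the noisy function on both sides, and
    average finite differences weighted by u. Heavy-tailed Cauchy draws
    are clipped to a box to keep the estimate well behaved. The result
    recovers the gradient up to a constant factor set by the perturbation
    distribution (absorbed into the step size in an SA scheme)."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_draws):
        u = np.clip(rng.standard_cauchy(size=x.shape), -trunc, trunc)
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / num_draws
```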
A Survey of Risk-Aware Multi-Armed Bandits
In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucia…
Minimum mean-squared error estimation with bandit feedback
We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators, and ana…
Online Estimation and Optimization of Utility-Based Shortfall Risk
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where sam…
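In the recursive setting, a natural device is a Robbins-Monro root-finding recursion for the UBSR level, since UBSR is the root of t -> E[loss(-X - t)] - lam. The sketch below, with step sizes c/n and the same exponential-loss example as above, is illustrative; the exact recursion and step-size conditions analyzed in the paper may differ.

```python
import math
import random

def ubsr_online(sample_stream, loss, lam, t0=0.0, c=1.0):
    """Stochastic approximation recursion tracking the UBSR level, i.e.
    the root of t -> E[loss(-X - t)] - lam, updated once per incoming
    sample with step size c/n (an illustrative choice)."""
    t = t0
    for n, x in enumerate(sample_stream, start=1):
        t += (c / n) * (loss(-x - t) - lam)
    return t

random.seed(0)
stream = (random.gauss(0.0, 1.0) for _ in range(200_000))
print(ubsr_online(stream, math.exp, lam=1.0))  # drifts towards 0.5
```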
Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis
We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an…
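As a building block for such gradients, the DRM of a batch of returns can be estimated by an L-statistic that reweights order statistics with increments of the distortion function; the sketch below, including the CVaR-style distortion in the example, illustrates the estimator family rather than the paper's exact construction.

```python
import numpy as np

def empirical_drm(returns, distortion):
    """Empirical distorted risk measure (Choquet integral) of a sample:
    sort the returns and weight the i-th smallest by
    g((n-i+1)/n) - g((n-i)/n), where g is the distortion
    (increasing, with g(0) = 0 and g(1) = 1)."""
    x = np.sort(returns)
    n = len(x)
    s = np.arange(n, 0, -1) / n                  # survival levels
    weights = distortion(s) - distortion(s - 1.0 / n)
    return float(np.dot(x, weights))

# Example: g(u) = min(u/0.1, 1) recovers the mean of the top 10% of
# outcomes (upper-tail CVaR), about 1.75 for a standard normal.
rng = np.random.default_rng(2)
print(empirical_drm(rng.normal(size=200_000),
                    lambda u: np.minimum(u / 0.1, 1.0)))
```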
Estimation of Spectral Risk Measures
We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distr…
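A minimal version of the numerical-integration idea: integrate the empirical quantile function against the risk spectrum on a grid. The midpoint rule below is a simple stand-in for the paper's quadrature, whose error is what the concentration analysis has to control.

```python
import numpy as np

def srm_numerical(samples, spectrum, grid_size=1000):
    """Estimate a spectral risk measure
    rho(X) = int_0^1 phi(u) * q_X(u) du
    by the midpoint rule on a uniform grid, with the true quantile
    function q_X replaced by empirical quantiles of the sample."""
    u = (np.arange(grid_size) + 0.5) / grid_size   # midpoints in (0, 1)
    return float(np.mean(spectrum(u) * np.quantile(samples, u)))
```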
Smoothed functional-based gradient algorithms for off-policy reinforcement learning
We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence gua…
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Concentration of risk measures: A Wasserstein distance approach
This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for a broad class of risk measures. The results cover two broad classes of risk measures which are defined in the p…
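The template behind such results, stated generically: if the risk measure rho is L-Lipschitz with respect to the Wasserstein distance, then any concentration bound for the empirical distribution transfers immediately to the risk estimate.

```latex
\big|\rho(\hat F_n) - \rho(F)\big| \;\le\; L \, W_1(\hat F_n, F),
\qquad
W_1(F, G) \;=\; \int_0^1 \big|F^{-1}(u) - G^{-1}(u)\big| \, du
```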
Improved Concentration Bounds for Conditional Value-at-Risk and Cumulative Prospect Theory using Wasserstein distance
A Wasserstein distance approach for concentration of empirical risk estimates
This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as spe…
Correlated bandits or: How to minimize mean-squared error online
While the objective in traditional multi-armed bandit problems is to find the arm with the highest mean, in many settings, finding an arm that best captures information about other arms is of interest. This objective, however, requires lea…
Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light…
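The estimator in question is the standard empirical CVaR: average the samples at or above the empirical Value-at-Risk. A minimal sketch (the sign convention, with larger values treated as worse losses, is an assumption):

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Empirical CVaR at level alpha: the empirical VaR is the alpha
    sample quantile, and CVaR averages all samples at or above it."""
    var = np.quantile(samples, alpha)
    return float(samples[samples >= var].mean())
```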
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optim…
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practic…
Random directions stochastic approximation with deterministic perturbations
We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order…
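One simple way to make the perturbations deterministic while keeping the gradient estimate accurate to first order is to cycle through the rows of a Hadamard matrix, whose column orthogonality gives (1/m) * sum_k u_k u_k^T = I exactly. The sketch below illustrates this idea; the specific deterministic sequences constructed and analyzed in the paper are different.

```python
import numpy as np
from scipy.linalg import hadamard

def rdsa_gradient_deterministic(f, x, delta=0.05):
    """Two-point RDSA-style gradient estimate with deterministic +/-1
    perturbation directions taken from a Hadamard matrix; averaging a
    full cycle of rows recovers the gradient up to O(delta^2) error."""
    d = len(x)
    m = 1 << max(d - 1, 0).bit_length()     # next power of two >= d
    H = hadamard(m)[:, :d].astype(float)
    g = np.zeros(d)
    for u in H:
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / m
```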
Concentration bounds for empirical conditional value-at-risk: The unbounded case
In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Val…
Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected
Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost di…