Anna Korba
A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives
Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explic…
Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators
Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling, among others. We introduce a novel distance between measures that compares them thr…
Variational Inference with Mixtures of Isotropic Gaussians
Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL)…
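As a rough illustration of the objects involved (not the paper's algorithm), here is a minimal sketch of a Monte Carlo estimate of the reverse KL between a uniform-weight mixture of isotropic Gaussians and an unnormalized target; the function names and the toy target are placeholders.

```python
import numpy as np

def log_mixture_pdf(x, means, sigma):
    """Log-density of a uniform-weight mixture of isotropic Gaussians N(m_k, sigma^2 I)."""
    d = means.shape[1]
    sq = ((x[None, :] - means) ** 2).sum(axis=1)
    log_comp = -0.5 * sq / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    return np.logaddexp.reduce(log_comp) - np.log(len(means))

def reverse_kl_estimate(means, sigma, log_target, n_samples=1000, rng=None):
    """Monte Carlo estimate of KL(q || pi), up to the unknown log-normalizer of pi,
    where q is the mixture and log_target is the unnormalized log-density of pi."""
    rng = np.random.default_rng(rng)
    K, d = means.shape
    ks = rng.integers(K, size=n_samples)                            # pick components
    xs = means[ks] + sigma * rng.standard_normal((n_samples, d))    # sample from q
    vals = [log_mixture_pdf(x, means, sigma) - log_target(x) for x in xs]
    return float(np.mean(vals))

# Toy target: unnormalized standard Gaussian
log_target = lambda x: -0.5 * np.sum(x**2)
means = np.zeros((5, 2))
print(reverse_kl_estimate(means, 1.0, log_target, rng=0))
```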
Density Ratio Estimation with Conditional Probability Paths
Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on sam…
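A plausible reading of the identity behind this reframing (my paraphrase, not the paper's notation): if $p_t$ is a path of densities with $p_0 = p$ and $p_1 = q$, then by the fundamental theorem of calculus
$\log \frac{q(x)}{p(x)} = \log p_1(x) - \log p_0(x) = \int_0^1 \partial_t \log p_t(x) \, \mathrm{d}t$,
so the log density ratio is recovered by integrating the time score $\partial_t \log p_t(x)$ along the path.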
Sampling from multi-modal distributions with polynomial query complexity in fixed dimension via reverse diffusion
Even in low dimensions, sampling from multi-modal distributions is challenging. We provide the first sampling algorithm for a broad class of distributions -- including all Gaussian mixtures -- with a query complexity that is polynomial in …
Constrained Sampling with Primal-Dual Langevin Monte Carlo
This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This proble…
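A minimal sketch of the general primal-dual idea (not the paper's exact scheme): run Langevin steps on an augmented potential that penalizes the constraint, while updating the dual variable by ascent on the observed constraint violation. All names, step sizes, and the finite-difference gradient below are illustrative choices.

```python
import numpy as np

def primal_dual_langevin(grad_logp, constraint, d, n_iters=5000,
                         step_x=1e-2, step_lam=1e-2, rng=0):
    """Sketch: sample approximately from pi(x) proportional to exp(log p(x)) subject to
    E_pi[constraint(X)] <= 0, using Langevin on x and dual ascent on lambda."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(d)
    lam = 0.0
    samples = []
    for _ in range(n_iters):
        # Langevin step on the augmented potential V(x) + lam * g(x)
        eps = 1e-5
        grad_g = np.array([(constraint(x + eps * e) - constraint(x - eps * e)) / (2 * eps)
                           for e in np.eye(d)])
        drift = grad_logp(x) - lam * grad_g
        x = x + step_x * drift + np.sqrt(2 * step_x) * rng.standard_normal(d)
        # Dual ascent: increase lambda when the constraint is violated
        lam = max(0.0, lam + step_lam * constraint(x))
        samples.append(x.copy())
    return np.array(samples), lam

# Toy example: standard Gaussian target with the constraint E[x_0] <= -1
samples, lam = primal_dual_langevin(lambda x: -x, lambda x: x[0] + 1.0, d=2)
print(samples[-1000:, 0].mean(), lam)
```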
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Geometric tempering is a popular approach to sampling from challenging multi-modal probability distributions by instead sampling from a sequence of distributions which interpolate, using the geometric mean, between an easier proposal distr…
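Concretely, the geometric-mean interpolation referred to here takes a proposal $\pi_0$ and the target $\pi$ and forms, for $t \in [0,1]$,
$\pi_t(x) \propto \pi_0(x)^{1-t} \, \pi(x)^{t}$, i.e. $\log \pi_t(x) = (1-t)\log \pi_0(x) + t \log \pi(x) + \mathrm{const}$,
and Langevin dynamics is run along a schedule $t_1 \le \dots \le t_K = 1$ of such intermediate distributions. (This spells out the standard construction; the paper's precise schedule and assumptions are in the article.)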
(De)-regularized Maximum Mean Discrepancy Gradient Flow
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from source distribution to target distribution with only target samples, either l…
Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence
In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence that involves den…
A Practical Diffusion Path for Sampling
Diffusion models are state-of-the-art methods in generative modeling when samples from a target probability distribution are available, and can be efficiently sampled, using score matching to estimate score vectors guiding a Langevin proce…
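For context, the generic score-guided Langevin update alluded to here, with a learned score $s_\theta \approx \nabla \log p_t$, reads
$x_{k+1} = x_k + \gamma \, s_\theta(x_k) + \sqrt{2\gamma} \, \xi_k$, with $\xi_k \sim \mathcal{N}(0, I)$;
the specific diffusion path proposed in the paper is not reproduced here.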
Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL)…
Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance weighting to correct bias from the logging policy used to collect data. However, this method can produce an estimator with a high variance. A common s…
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these cor…
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a…
Mirror and Preconditioned Gradient Descent in Wasserstein Space
As the problem of minimizing functionals on the Wasserstein space encompasses many applications in machine learning, different optimization algorithms on $\mathbb{R}^d$ have received their counterpart analog on the Wasserstein space. We fo…
A connection between Tempering and Entropic Mirror Descent
This paper explores the connections between tempering (for Sequential Monte Carlo; SMC) and entropic mirror descent to sample from a target probability distribution whose unnormalized density is known. We establish that tempering SMC corre…
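The correspondence can be sketched as follows (a standard computation, given here as my reading rather than the paper's exact statement): entropic mirror descent on $F(\mu) = \mathrm{KL}(\mu \| \pi)$ with step size $\gamma_k$ updates
$\mu_{k+1} \propto \mu_k \exp(-\gamma_k \nabla F(\mu_k)) = \mu_k \exp\big(-\gamma_k(\log(\mu_k/\pi) + 1)\big) \propto \mu_k^{1-\gamma_k} \, \pi^{\gamma_k}$,
which is precisely a geometric tempering step between the current distribution and the target.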
Exponential Smoothing for Off-Policy Learning
Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we investigate a smooth regularization for IPS, for which we…
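To make the objects concrete: the IPS risk estimator reweights logged rewards by the ratio of policy propensities, and one common way to smooth it is to raise the importance weights to a power $\alpha \in [0,1]$. The sketch below illustrates that family of estimators; whether it matches the paper's exact regularizer is an assumption on my part.

```python
import numpy as np

def smoothed_ips_risk(rewards, logging_probs, target_probs, alpha=1.0):
    """Exponentially smoothed IPS estimate of the risk (negative reward) of a target policy.

    rewards        : logged rewards (length n)
    logging_probs  : propensities of the logging policy for the logged actions
    target_probs   : probabilities of the target policy for the same actions
    alpha          : smoothing exponent; alpha=1 recovers standard IPS,
                     alpha=0 ignores the importance weights entirely.
    """
    weights = (target_probs / logging_probs) ** alpha
    return -np.mean(weights * rewards)

# Toy logged bandit data with 3 actions
rng = np.random.default_rng(0)
n, K = 10_000, 3
logging_policy = np.array([0.5, 0.3, 0.2])
actions = rng.choice(K, size=n, p=logging_policy)
rewards = (actions == 2).astype(float)              # action 2 is the best one
target_policy = np.array([0.1, 0.1, 0.8])
risk = smoothed_ips_risk(rewards, logging_policy[actions],
                         target_policy[actions], alpha=0.7)
print(risk)
```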
Sampling with Mollified Interaction Energy Descent
Sampling from a target measure whose density is only known up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this paper, we present a new optimization-based method for sampling cal…
Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study
This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-la…
Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM
Many problems in machine learning can be formulated as optimizing a convex functional over a vector space of measures. This paper studies the convergence of the mirror descent algorithm in this infinite-dimensional setting. Defining Bregma…
Adaptive Importance Sampling meets Mirror Descent: a Bias-variance tradeoff
Adaptive importance sampling is a widespread Monte Carlo technique that uses a re-weighting strategy to iteratively estimate the so-called target distribution. A major drawback of adaptive importance sampling is the large variance of th…
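For readers less familiar with the base technique, a single round of the re-weighting at the core of adaptive importance sampling looks like the sketch below (a generic step with a Gaussian proposal, not the scheme analyzed in the paper); the proposal family and toy target are placeholders.

```python
import numpy as np

def ais_step(log_target, mean, cov, n=2000, rng=None):
    """One adaptive importance sampling round: draw from a Gaussian proposal,
    compute self-normalized importance weights against the unnormalized target,
    and re-fit the proposal to the weighted samples (moment matching)."""
    rng = np.random.default_rng(rng)
    d = len(mean)
    xs = rng.multivariate_normal(mean, cov, size=n)
    log_q = (-0.5 * np.einsum('ni,ij,nj->n', xs - mean, np.linalg.inv(cov), xs - mean)
             - 0.5 * np.linalg.slogdet(2 * np.pi * cov)[1])
    log_w = np.array([log_target(x) for x in xs]) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                   # self-normalized weights
    new_mean = (w[:, None] * xs).sum(axis=0)
    centred = xs - new_mean
    new_cov = np.einsum('n,ni,nj->ij', w, centred, centred) + 1e-6 * np.eye(d)
    return new_mean, new_cov, w

# Toy target: Gaussian centred at (3, 3); proposal starts at the origin
log_target = lambda x: -0.5 * np.sum((x - 3.0) ** 2)
mean, cov = np.zeros(2), np.eye(2)
for _ in range(10):
    mean, cov, _ = ais_step(log_target, mean, cov, rng=0)
print(mean)
```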
Kernel Stein Discrepancy Descent
Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distributio…
Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction
We address the problem of causal effect estimation in the presence of unobserved confounding, but where proxies for the latent confounder(s) are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this s…
A Non-Asymptotic Analysis for Stein Variational Gradient Descent
We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi \propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient desc…
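The SVGD update is simple enough to state in a few lines; here is a minimal sketch with an RBF kernel, intended only to illustrate the algorithm the paper analyzes (the bandwidth choice and toy target are placeholders, not the paper's experimental setup).

```python
import numpy as np

def svgd(particles, grad_logp, n_iters=500, step=1e-1, bandwidth=1.0):
    """Stein Variational Gradient Descent with an RBF kernel.

    particles : (n, d) array of initial particles
    grad_logp : function x -> gradient of log pi(x), pi known up to a constant
    """
    x = particles.copy()
    n = len(x)
    for _ in range(n_iters):
        diffs = x[:, None, :] - x[None, :, :]                 # x_i - x_j
        sq = (diffs ** 2).sum(-1)
        k = np.exp(-sq / (2 * bandwidth))                     # k(x_i, x_j)
        grads = np.apply_along_axis(grad_logp, 1, x)          # scores at the particles
        # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log pi(x_j) + grad_{x_j} k(x_j, x_i) ]
        attraction = k @ grads
        repulsion = (k[:, :, None] * diffs).sum(axis=1) / bandwidth
        x = x + step * (attraction + repulsion) / n
    return x

# Toy target: standard Gaussian, pi proportional to exp(-||x||^2 / 2)
rng = np.random.default_rng(0)
out = svgd(rng.standard_normal((100, 2)) + 5.0, grad_logp=lambda x: -x)
print(out.mean(axis=0), out.std(axis=0))
```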
Wasserstein Proximal Gradient
We consider the task of sampling from a log-concave probability distribution. This target distribution can be seen as a minimizer of the relative entropy functional defined on the space of probability distributions. The relative entropy ca…
The Wasserstein Proximal Gradient Algorithm
Wasserstein gradient flows are continuous time dynamics that define curves of steepest descent to minimize an objective function over the space of probability measures (i.e., the Wasserstein space). This objective is typically a divergence…
Maximum Mean Discrepancy Gradient Flow
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties. The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on…
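As a concrete anchor, below is a minimal sketch of the squared MMD between source particles and target samples, together with a particle gradient-descent step on that objective, which is a crude discretization of the kind of flow studied here (assumptions: an RBF kernel and equal-weight empirical measures; this is an illustration, not the paper's scheme).

```python
import numpy as np

def rbf(a, b, h=1.0):
    """RBF kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2h))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * h))

def mmd2(x, y, h=1.0):
    """Squared MMD between the empirical measures of particles x and samples y."""
    return rbf(x, x, h).mean() - 2 * rbf(x, y, h).mean() + rbf(y, y, h).mean()

def mmd_flow_step(x, y, step=1.0, h=1.0):
    """One explicit Euler step of gradient descent on MMD^2 with respect to the particles x."""
    n, m = len(x), len(y)
    kxx, kxy = rbf(x, x, h), rbf(x, y, h)
    dxx = x[:, None, :] - x[None, :, :]                  # x_i - x_j
    dxy = x[:, None, :] - y[None, :, :]                  # x_i - y_j
    grad = 2 * (-(kxx[:, :, None] * dxx).sum(1) / (h * n**2)
                + (kxy[:, :, None] * dxy).sum(1) / (h * n * m))
    return x - step * grad

# Toy example: move particles near the origin toward target samples around (2, 2)
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
y = rng.standard_normal((200, 2)) + 2.0
print("before:", mmd2(x, y, h=2.0))
for _ in range(300):
    x = mmd_flow_step(x, y, step=10.0, h=2.0)
print("after: ", mmd2(x, y, h=2.0))
```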
A Structured Prediction Approach for Label Ranking