Mert Gürbüzbalaban
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
We study trade-offs between convergence rate and robustness to gradient errors in the context of first-order methods. Our focus is on generalized momentum methods (GMMs)--a broad class that includes Nesterov's accelerated gradient, heavy-b…
Mean–semideviation–based distributionally robust learning with weakly convex losses: convergence rates and finite-sample guarantees
We consider a distributionally robust stochastic optimization problem where the ambiguity sets are implicitly defined by the dual representation of the mean–semideviation risk measure. Utilizing the specific form of this risk measure, we r…
RESIST: Resilient Decentralized Learning Using Consensus Gradient Descent
Empirical risk minimization (ERM) is a cornerstone of modern machine learning (ML), supported by advances in optimization theory that ensure efficient solutions with provable algorithmic convergence rates, which measure the speed at which …
Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
Understanding the generalization properties of optimization algorithms under heavy-tailed noise has gained growing attention. However, the existing theoretical results mainly focus on stochastic gradient descent (SGD) and the analysis of h…
Generalized EXTRA stochastic gradient Langevin dynamics
Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning, particularly when the aim is to sample from the posterior distribution of a parametric model, given the input data and the prior distribution over the …
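The abstract above describes Langevin algorithms for sampling from a posterior. As a hedged illustration only, the following sketches plain stochastic gradient Langevin dynamics, not the paper's generalized EXTRA variant; the target, stepsize, and iteration count below are this listing's own toy choices.

```python
import numpy as np

def sgld_sample(grad_log_post, x0, step=0.05, n_iters=5000, rng=None):
    """Plain stochastic gradient Langevin dynamics (illustrative only):
    x_{k+1} = x_k + (step/2) * grad_log_post(x_k) + sqrt(step) * N(0, I)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step * grad_log_post(x) + np.sqrt(step) * noise
        samples.append(x.copy())
    return np.array(samples)

# Toy target: standard Gaussian posterior, so grad log pi(x) = -x.
samples = sgld_sample(lambda x: -x, x0=np.array([5.0]))
```

Discarding an initial burn-in, the chain's samples should look approximately standard normal (up to the usual O(step) discretization bias).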
High-probability complexity guarantees for nonconvex minimax problems
Stochastic smooth nonconvex minimax problems are prevalent in machine learning, e.g., GAN training, fair classification, and distributionally robust learning. Stochastic gradient descent ascent (GDA)-type methods are popular in practice du…
A Stochastic GDA Method With Backtracking For Solving Nonconvex (Strongly) Concave Minimax Problems
We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-(strongly) concave (NCC) minimax problems $\min_x \max_y \sum_{i=1}^N g_i(x_i)+f(x,y)-h(y)$, where $h$ and $g_i$ for $i = 1, \ldots,…
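The abstract concerns a stochastic GDA method with backtracking. As a hedged sketch, the loop below is only basic stochastic gradient descent ascent with fixed stepsizes (no backtracking), run on a toy strongly-convex-strongly-concave saddle problem invented for this listing.

```python
import numpy as np

def sgda(grad_x, grad_y, x0, y0, tau=0.05, sigma=0.05,
         n_iters=2000, noise=0.01, rng=None):
    """Basic stochastic gradient descent ascent (illustrative sketch;
    the paper's SGDA-B additionally selects stepsizes by backtracking)."""
    rng = rng or np.random.default_rng(0)
    x, y = float(x0), float(y0)
    for _ in range(n_iters):
        gx = grad_x(x, y) + noise * rng.standard_normal()
        gy = grad_y(x, y) + noise * rng.standard_normal()
        x -= tau * gx      # descent in the minimization variable
        y += sigma * gy    # ascent in the maximization variable
    return x, y

# Toy SCSC saddle: f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, saddle point (0, 0).
x, y = sgda(lambda x, y: x + y, lambda x, y: x - y, x0=2.0, y0=-2.0)
```

On this toy problem the iterates spiral into a small noise-dominated neighborhood of the saddle point.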
Privacy of SGD under Gaussian or Heavy-Tailed Noise: Guarantees without Gradient Clipping
The injection of heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has garnered growing interest in recent years due to its theoretical and empirical benefits for optimization and generalization. However, its implic…
Robustly Stable Accelerated Momentum Methods With A Near-Optimal L2 Gain and $H_\infty$ Performance
We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square-summable. We study the trade-offs between the convergence rate and robustne…
Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heav…
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on d…
High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method
We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point (SP) problems which frequently arise in applications ranging from distributionally robust learning to game theory and fairness in machine learning. We focus on the…
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard stepsize choices such as the constant stepsize in SGD. Despite their empirical success, not much is currently known about when and …
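For concreteness, here is a hedged sketch of one common cyclic (triangular) stepsize schedule applied to SGD on a toy quadratic; the schedule shape, bounds, and problem are illustrative choices of this listing, not taken from the paper.

```python
import numpy as np

def cyclic_stepsize(k, eta_min=0.01, eta_max=0.1, cycle_len=20):
    """Triangular cyclic stepsize: rises linearly from eta_min to eta_max
    and back down over each cycle of `cycle_len` iterations."""
    pos = k % cycle_len
    half = cycle_len / 2
    frac = pos / half if pos <= half else (cycle_len - pos) / half
    return eta_min + (eta_max - eta_min) * frac

def sgd_quadratic(n_iters=400, noise=0.1, rng=None):
    """SGD on f(x) = 0.5*x**2 with noisy gradients and the cyclic schedule."""
    rng = rng or np.random.default_rng(0)
    x = 5.0
    for k in range(n_iters):
        g = x + noise * rng.standard_normal()  # stochastic gradient of 0.5*x^2
        x -= cyclic_stepsize(k) * g
    return x

x_final = sgd_quadratic()
```

The stepsize oscillates between 0.01 and 0.1 every 20 iterations while the iterate contracts toward the minimizer at 0.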
Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior o…
Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem with the use of the mean--semideviation risk measure. In this setting, we consider a single t…
Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling
We consider the constrained sampling problem where the goal is to sample from a target distribution $π(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimiz…
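The penalty idea described in the abstract can be sketched as follows: replace the constrained target $e^{-f(x)}$ on $\mathcal{C}$ by an unconstrained target whose potential adds a quadratic distance penalty to $\mathcal{C}$, then run overdamped Langevin Monte Carlo. The penalty weight, stepsize, and unit-ball example below are this listing's illustrative assumptions.

```python
import numpy as np

def penalized_lmc(grad_f, project, x0, delta=0.01, step=1e-3,
                  n_iters=5000, rng=None):
    """Overdamped LMC on the penalized potential
    f(x) + dist(x, C)^2 / (2*delta), with C given via its Euclidean
    projection. Illustrative sketch of the penalty approach."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    out = []
    for _ in range(n_iters):
        # gradient of the squared-distance penalty is (x - proj_C(x)) / delta
        g = grad_f(x) + (x - project(x)) / delta
        x = x - step * g + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        out.append(x.copy())
    return np.array(out)

# Example: sample from e^{-0.5*||x||^2} restricted to the unit ball.
proj_ball = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
samples = penalized_lmc(lambda x: x, proj_ball, x0=np.zeros(2))
```

As the penalty weight $1/\delta$ grows, the samples concentrate on the constraint set, with only a thin boundary layer of leakage.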
Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points
This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the ‘flat’ geometry around saddle points, firs…
Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization b…
SAPD+: An Accelerated Stochastic Method for Nonconvex-Concave Minimax Problems
We propose a new stochastic method SAPD+ for solving nonconvex-concave minimax problems of the form $\min\max\mathcal{L}(x,y)=f(x)+Φ(x,y)-g(y)$, where $f,g$ are closed convex and $Φ(x,y)$ is a smooth function that is weakly convex in $x$, …
Heavy-Tail Phenomenon in Decentralized SGD
Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have unc…
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decreas…
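As a hedged illustration of the heavy-ball flavor mentioned in the abstract, the sketch below runs Polyak's heavy-ball iteration with Gaussian noise added to each gradient. The noise level here merely stands in for a privacy calibration; the paper's actual differential-privacy accounting and smoothing approach are not reproduced.

```python
import numpy as np

def noisy_heavy_ball(grad, x0, step=0.1, momentum=0.7, sigma=0.05,
                     n_iters=500, rng=None):
    """Polyak heavy-ball with Gaussian noise on each gradient evaluation.
    `sigma` is a placeholder noise scale, not a calibrated DP mechanism."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    for _ in range(n_iters):
        g = grad(x) + sigma * rng.standard_normal(x.shape)
        # x_{k+1} = x_k - step * g_k + momentum * (x_k - x_{k-1})
        x, x_prev = x - step * g + momentum * (x - x_prev), x
    return x

x = noisy_heavy_ball(lambda x: x, x0=np.array([3.0]))  # minimize 0.5*x^2
```

On this quadratic the noisy iterates settle into a small neighborhood of the minimizer, with the neighborhood size governed by the injected noise.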
Entropic Risk-Averse Generalized Momentum Methods
In the context of first-order algorithms subject to random gradient noise, we study the trade-offs between the convergence rate (which quantifies how fast the initial conditions are forgotten) and the "risk" of suboptimality, i.e. deviatio…
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm
In this work, we consider strongly convex strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x,y)$ where $f$ is $L$-smooth, $f(.,y)$ is $μ$-strongly convex for every $y$, and $f(x,.)…
Robust Accelerated Primal-Dual Methods for Computing Saddle Points
We consider strongly-convex-strongly-concave saddle point problems assuming we have access to unbiased stochastic estimates of the gradients. We propose a stochastic accelerated primal-dual (SAPD) algorithm and show that SAPD sequence, gen…
L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method
This work proposes a distributed algorithm for solving empirical risk minimization problems, called L-DQN, under the master/worker communication model. L-DQN is a distributed limited-memory quasi-Newton method that supports asynchronous co…
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into accoun…
TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion
This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix …
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging varianc…
Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise to the network activations at every iteration of the o…
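The regularization family the abstract describes can be sketched in a few lines: during training, additive Gaussian noise is injected into the network activations on every forward pass. The tiny two-layer ReLU network below is an illustrative stand-in invented for this listing, not the paper's experimental setup.

```python
import numpy as np

def forward_with_gni(x, W1, W2, sigma=0.1, rng=None):
    """Forward pass with additive Gaussian noise injected into the hidden
    activations -- the GNI regularization family (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    h = np.maximum(W1 @ x, 0.0)                         # ReLU hidden layer
    h_noisy = h + sigma * rng.standard_normal(h.shape)  # additive GNI
    return W2 @ h_noisy

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
x_in = rng.standard_normal(3)
y = forward_with_gni(x_in, W1, W2)
```

Setting `sigma=0.0` recovers the noiseless forward pass, which is the baseline against which GNI's implicit bias is studied.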