Ali Kavis
Online Learning-guided Learning Rate Adaptation via Gradient Alignment
The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper…
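Since the abstract is truncated, the sketch below is only a hypothetical illustration of the general idea of steering a learning rate from the alignment of successive gradients (in the spirit of hypergradient-style updates); the function name, the multiplicative rule, and all hyperparameters are assumptions, not the method proposed in the paper.

```python
import numpy as np

def sgd_with_alignment_lr(grad_fn, x0, lr0=0.1, beta=0.02, steps=100):
    """Toy SGD loop whose scalar learning rate is nudged up or down
    according to the cosine of the angle between consecutive gradients.
    Illustrative sketch only, not the paper's algorithm."""
    x, lr = np.asarray(x0, dtype=float), lr0
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        # Increase the learning rate when consecutive gradients agree,
        # decrease it when they point in opposing directions.
        align = np.dot(g, g_prev) / (np.linalg.norm(g) * np.linalg.norm(g_prev) + 1e-12)
        lr *= np.exp(beta * align)
        x = x - lr * g
        g_prev = g
    return x, lr

# Example: minimize the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x_hat, lr_final = sgd_with_alignment_lr(lambda x: x, x0=np.ones(5))
```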
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develo…
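As a rough, hypothetical illustration of upweighting low-loss ("easy") samples during fine-tuning, the snippet below reweights per-sample losses with a softmax over their negated values; the weighting scheme and temperature are assumptions and may differ from the paper's recipe.

```python
import torch
import torch.nn.functional as F

def easy_upweighted_loss(logits, targets, temperature=1.0):
    """Weight each sample's loss by a softmax over *negative* per-sample
    losses, so that lower-loss ('easy') samples receive larger weights.
    Illustrative only; not necessarily the paper's exact weighting."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.softmax(-per_sample.detach() / temperature, dim=0)
    return (weights * per_sample).sum()

# Usage inside a fine-tuning step:
#   loss = easy_upweighted_loss(model(batch_x), batch_y)
#   loss.backward(); optimizer.step()
```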
Understanding Self-Supervised Learning via Gaussian Mixture Models
Self-supervised learning attempts to learn representations from unlabeled data; it does so via a loss function that encourages the embedding of a point to be close to that of its augmentations. This simple idea performs remarkably well, y…
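To make the loss the abstract alludes to concrete, here is a minimal, generic alignment-style self-supervised objective that pulls a point's embedding toward that of its augmentation; it is an illustration of the general idea, not the specific model analyzed in the paper.

```python
import torch
import torch.nn.functional as F

def alignment_loss(encoder, x, augment):
    """Encourage the embedding of x to lie close to the embedding of
    augment(x). A generic self-supervised alignment term, for illustration."""
    z1 = F.normalize(encoder(x), dim=-1)
    z2 = F.normalize(encoder(augment(x)), dim=-1)
    return (1 - (z1 * z2).sum(dim=-1)).mean()  # mean cosine distance
```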
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving o…
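For orientation, the snippet below sketches the classical first-order optimistic gradient method for a convex-concave min-max problem; it is only a simpler relative of the second-order optimistic methods studied in the paper and uses a fixed step size rather than the paper's adaptive rule.

```python
import numpy as np

def optimistic_gda(grad_x, grad_y, x0, y0, eta=0.1, steps=500):
    """Optimistic gradient descent-ascent for min_x max_y f(x, y):
    each update uses 2*g_t - g_{t-1} as a cheap prediction of the next
    gradient. First-order illustration with a fixed step size."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta * (2 * gx - gx_prev)
        y = y + eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

# Example: bilinear saddle point f(x, y) = x.T @ y.
x, y = optimistic_gda(lambda x, y: y, lambda x, y: x, np.ones(3), np.ones(3))
```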
Universal Gradient Methods for Stochastic Convex Optimization
We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle's noise but also to the Hölder smoothness of the objective function without a priori knowledge of the…
Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness
We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which occurs typically in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves op…
Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma/\sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and impro…
Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011, McMahan & Streete…
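As a rough sketch of the two ingredients the abstract mentions, the code below combines a SPIDER-style recursive variance-reduced gradient estimator with an AdaGrad-norm step size; the exact step-size formula and epoch length used by AdaSpider may differ from what is shown here.

```python
import numpy as np

def adaspider_sketch(grad_i, full_grad, n, x0, epochs=10, q=50, eta0=1.0, eps=1e-8):
    """Sketch: SPIDER-style recursive gradient estimator combined with an
    AdaGrad-norm step size, for f(x) = (1/n) * sum_i f_i(x).
    grad_i(i, x): gradient of the i-th component; full_grad(x): full gradient.
    The precise step-size rule of AdaSpider may differ from this sketch."""
    x = np.asarray(x0, dtype=float)
    x_prev, v, accum = x.copy(), full_grad(x), 0.0
    for t in range(epochs * q):
        if t % q == 0:
            v = full_grad(x)                          # periodic full-gradient refresh
        else:
            i = np.random.randint(n)
            v = grad_i(i, x) - grad_i(i, x_prev) + v  # recursive SPIDER update
        accum += np.linalg.norm(v) ** 2               # AdaGrad-style accumulation
        x_prev, x = x, x - (eta0 / np.sqrt(eps + accum)) * v
    return x
```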
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
In this paper, we propose a new, simplified high probability analysis of AdaGrad for smooth, non-convex problems. More specifically, we focus on a particular accelerated gradient (AGD) template (Lan, 2020), through which we recover the ori…
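For reference, here is a minimal implementation of the scalar ("norm") AdaGrad step size that this line of analysis typically targets; it is the plain SGD variant, not the accelerated (AGD) template discussed in the paper, and the hyperparameter names are illustrative.

```python
import numpy as np

def adagrad_norm_sgd(stoch_grad, x0, eta=1.0, eps=1e-8, steps=1000):
    """SGD with the scalar AdaGrad step size
        eta_t = eta / sqrt(eps + sum_{s<=t} ||g_s||^2),
    which requires no knowledge of smoothness or noise parameters."""
    x = np.asarray(x0, dtype=float)
    accum = 0.0
    for _ in range(steps):
        g = stoch_grad(x)
        accum += np.linalg.norm(g) ** 2
        x = x - (eta / np.sqrt(eps + accum)) * g
    return x

# Example: noisy quadratic with additive Gaussian gradient noise.
rng = np.random.default_rng(0)
x_hat = adagrad_norm_sgd(lambda x: x + 0.1 * rng.standard_normal(x.shape), np.ones(10))
```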
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm’s convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and co…
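For concreteness, the vanilla SGD iteration under the classical Robbins-Monro step-size conditions (step sizes sum to infinity, their squares sum to a finite value) is shown below; this is the standard setting in which almost-sure convergence results are usually stated, though the paper's assumptions may be more general.

```python
import numpy as np

def sgd(stoch_grad, x0, gamma0=0.5, steps=10_000):
    """Plain SGD with gamma_t = gamma0 / (t + 1), which satisfies the
    Robbins-Monro conditions: sum_t gamma_t = inf, sum_t gamma_t^2 < inf."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        x = x - (gamma0 / (t + 1)) * stoch_grad(x)
    return x
```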
Sifting through the Noise: Universal First-Order Methods for Stochastic Variational Inequalities
We examine a flexible algorithmic framework for solving monotone variational inequalities in the presence of randomness and uncertainty. The proposed template encompasses a wide range of popular first-order methods, including dual averagin…
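One of the first-order methods such a template typically covers is the extra-gradient method; a minimal deterministic version for a monotone operator F is sketched below (fixed step size, no adaptivity), purely for orientation.

```python
import numpy as np

def extragradient(F, project, x0, eta=0.1, steps=1000):
    """Extra-gradient method for the monotone variational inequality:
    find x* with <F(x*), x - x*> >= 0 for all feasible x.
    'project' maps a point back onto the feasible set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x_half = project(x - eta * F(x))   # exploratory (leading) step
        x = project(x - eta * F(x_half))   # corrected update
    return x

# Example: rotation field F(z) = (z2, -z1) arising from the saddle f(x, y) = x*y.
sol = extragradient(lambda z: np.array([z[1], -z[0]]), lambda z: z, np.array([1.0, 1.0]))
```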
STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization
In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such p…
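To fix ideas, here is a sketch of the STORM-style recursive momentum estimator that STORM+ builds on; in STORM+ both the momentum parameter and the step size are chosen adaptively from observed quantities, whereas the sketch below keeps them fixed for simplicity.

```python
import numpy as np

def storm_style_sgd(stoch_grad, sample, x0, eta=0.05, a=0.1, steps=1000):
    """SGD with a STORM-style variance-reduced momentum estimator:
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)),
    where both gradient evaluations in the correction share the sample xi_t.
    STORM+ sets a and eta adaptively; they are fixed here for simplicity."""
    x = np.asarray(x0, dtype=float)
    xi = sample()
    d = stoch_grad(x, xi)
    for _ in range(steps):
        x_prev, x = x, x - eta * d
        xi = sample()
        d = stoch_grad(x, xi) + (1 - a) * (d - stoch_grad(x_prev, xi))
    return x

# Example: noisy quadratic; xi is a fixed noise draw shared by both evaluations.
rng = np.random.default_rng(0)
x_hat = storm_style_sgd(lambda x, xi: x + 0.1 * xi,
                        lambda: rng.standard_normal(10),
                        np.ones(10))
```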
Double-Loop Unadjusted Langevin Algorithm
A well-known first-order method for sampling from log-concave probability distributions is the Unadjusted Langevin Algorithm (ULA). This work proposes a new annealing step-size schedule for ULA, which allows us to prove new convergence guaran…
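For reference, the basic ULA iteration for sampling from a density proportional to exp(-U(x)) is shown below with a fixed step size; the paper's contribution is an annealed (decreasing) step-size schedule within a double-loop structure, which is not reproduced here.

```python
import numpy as np

def ula(grad_U, x0, gamma=1e-2, steps=10_000, rng=None):
    """Unadjusted Langevin Algorithm for sampling from p(x) ~ exp(-U(x)):
        x_{k+1} = x_k - gamma * grad_U(x_k) + sqrt(2 * gamma) * N(0, I).
    Fixed step size shown; the paper studies an annealed schedule."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(steps):
        x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Example: standard Gaussian target, U(x) = ||x||^2 / 2, so grad_U(x) = x.
chain = ula(lambda x: x, np.zeros(2))
```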
UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization
We propose a novel adaptive, accelerated algorithm for the stochastic constrained convex optimization setting. Our method, which is inspired by the Mirror-Prox method, simultaneously achieves the optimal rates for smooth/non-smooth …
Efficient learning of smooth probability functions from Bernoulli tests with guarantees
We study the fundamental problem of learning an unknown, smooth probability function via pointwise Bernoulli tests. We provide a scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the …