Aryan Mokhtari
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of [Cutkosky et al., 2023], which reformulates the task of finding an $\varepsilon$-first-order stationary point as an online learning problem.…
On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization
In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- a…
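For concreteness, a simple bilevel problem of the kind described above is usually written as (standard generic formulation; the symbols $f$, $g$, and $\mathcal{Z}$ are illustrative, not taken from the paper):
$$\min_{x \in \mathbb{R}^d} f(x) \quad \text{s.t.} \quad x \in \operatorname*{argmin}_{z \in \mathcal{Z}} g(z),$$
so the upper-level objective $f$ is minimized only over the solution set of the lower-level problem $\min_{z \in \mathcal{Z}} g(z)$.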
Online Learning-guided Learning Rate Adaptation via Gradient Alignment
The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper…
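As loose background on how gradient alignment can drive a learning-rate update, here is a minimal NumPy sketch in the spirit of classical hypergradient/delta-bar-delta heuristics: the step size grows when consecutive gradients point in similar directions and shrinks otherwise. The function `gd_with_alignment_adapted_lr` and all constants are hypothetical illustrations, not the online-learning-guided method proposed in the paper.

```python
import numpy as np

def gd_with_alignment_adapted_lr(grad_fn, x0, lr=0.01, meta_lr=0.02, n_steps=500):
    """Gradient descent whose learning rate is nudged up when consecutive
    gradients align (positive inner product) and down when they oppose each
    other. A classical heuristic shown only to illustrate the idea of using
    alignment as a learning-rate signal; not the paper's algorithm."""
    x = np.asarray(x0, dtype=float).copy()
    g_prev = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad_fn(x)
        lr *= np.exp(meta_lr * np.sign(g @ g_prev))  # multiplicative update from alignment
        x -= lr * g
        g_prev = g
    return x
```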
Machine Unlearning under Overparameterization
Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, …
Improved Complexity for Smooth Nonconvex Optimization: A Two-Level Online Learning Approach with Quasi-Newton Methods
We study the problem of finding an $\varepsilon$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of t…
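For reference (standard definition, not specific to this paper), an $\varepsilon$-first-order stationary point of a smooth function $f$ is any point $x$ satisfying
$$\|\nabla f(x)\| \le \varepsilon.$$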
Learning Mixtures of Experts with EM: A Mirror Descent Perspective
Classical Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition. Recently, MoE-based model architectures have become popular as a means to …
Provable Meta-Learning with Low-Rank Adaptations
The power of foundation models (FMs) lies in their capacity to learn highly expressive representations that can be adapted to a broad spectrum of tasks. However, these pretrained models require additional training stages to become effectiv…
On the Crucial Role of Initialization for Matrix Factorization
This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nystrom initialization, which s…
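As background on what a Nyström-type construction looks like, the sketch below builds a rank-$r$ factor $X$ with $A \approx XX^\top$ for a symmetric PSD matrix $A$ via a Gaussian sketch (plain NumPy). This is the generic Nyström approximation only; the paper's exact initialization and its effect on convergence are what the truncated abstract refers to, and `nystrom_factor` is a hypothetical name.

```python
import numpy as np

def nystrom_factor(A, rank, rng):
    """Generic Nystrom construction of a rank-r factor X with A ~= X X^T for a
    symmetric PSD matrix A, using a Gaussian sketch Omega."""
    n = A.shape[0]
    Omega = rng.normal(size=(n, rank))
    Y = A @ Omega                                   # sketch of the range of A
    C = Omega.T @ Y                                 # core matrix Omega^T A Omega
    # X = Y C^{-1/2} gives X X^T = Y C^{-1} Y^T, the Nystrom approximation of A.
    U, s, _ = np.linalg.svd(C, hermitian=True)
    C_inv_half = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return Y @ C_inv_half

# Toy usage: a random PSD matrix and a rank-5 Nystrom factor of it.
rng = np.random.default_rng(0)
B = rng.normal(size=(50, 8))
A = B @ B.T                                         # PSD, rank 8
X0 = nystrom_factor(A, rank=5, rng=rng)
```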
Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence
In this paper, we propose a quasi-Newton method for solving smooth and monotone nonlinear equations, including unconstrained minimization and minimax optimization as special cases. For the strongly monotone setting, we establish two global…
Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Adaptive gradient methods, such as AdaGrad, are among the most successful optimization algorithms for neural network training. While these methods are known to achieve better dimensional dependence than stochastic gradient descent (SGD) fo…
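For readers who want the mechanics behind the comparison, here is a minimal NumPy sketch of the classical (diagonal) AdaGrad update, in which per-coordinate step sizes shrink with the accumulated squared gradients; the function name and the toy quadratic are illustrative, not code from the paper.

```python
import numpy as np

def adagrad(grad, x0, lr=0.1, eps=1e-8, n_steps=1000):
    """Minimal diagonal AdaGrad loop: each coordinate's effective step size is
    lr / sqrt(sum of its past squared gradients), which adapts to the scale of
    each coordinate automatically."""
    x = np.asarray(x0, dtype=float).copy()
    accum = np.zeros_like(x)                        # running sum of squared gradients
    for _ in range(n_steps):
        g = grad(x)
        accum += g * g
        x -= lr * g / (np.sqrt(accum) + eps)
    return x

# Toy usage: minimize a poorly scaled quadratic f(x) = 0.5 * x^T diag(d) x.
d = np.array([100.0, 1.0])
x_min = adagrad(lambda x: d * x, x0=[1.0, 1.0])
```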
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving o…
Stochastic Newton Proximal Extragradient Method
Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stoc…
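The "precondition the gradient with a noisy Hessian estimate" idea mentioned here can be sketched generically as follows (NumPy); this is a plain damped stochastic-Newton step shown for illustration, not the paper's proximal extragradient update, and `grad_fn`/`hess_estimate_fn` are hypothetical callables.

```python
import numpy as np

def stochastic_newton_step(x, grad_fn, hess_estimate_fn, reg=1e-3):
    """One generic stochastic Newton-type step: solve a linear system with a
    noisy Hessian estimate to precondition the gradient direction. A small
    damping term keeps the system well posed when the estimate is noisy."""
    g = grad_fn(x)
    H_hat = hess_estimate_fn(x)                     # noisy Hessian estimate
    H_reg = H_hat + reg * np.eye(len(x))            # damping for stability
    return x - np.linalg.solve(H_reg, g)            # x_{k+1} = x_k - H_reg^{-1} g_k
```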
Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search
In this paper, we present the first explicit and non-asymptotic global convergence rates of the BFGS method when implemented with an inexact line search scheme satisfying the Armijo-Wolfe conditions. We show that BFGS achieves a global lin…
Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search
In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to…
In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with m…
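To make the in-context learning setup concrete, the sketch below uses a single softmax-attention computation as a predictor: the query attends to the context inputs and outputs an attention-weighted average of the context labels. It is a toy Nadaraya-Watson-style illustration with identity query/key maps, not the trained transformer analyzed in the paper; all names are hypothetical.

```python
import numpy as np

def softmax_attention_icl(X_ctx, y_ctx, x_query, temperature=1.0):
    """One softmax-attention head used as an in-context regressor: compute
    similarities between the query and each context input, softmax them, and
    return the resulting weighted average of the context labels."""
    scores = X_ctx @ x_query / temperature          # similarity of query to each context input
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over context examples
    return weights @ y_ctx                          # prediction for the query

# Toy usage: context drawn from a noisy linear function in 2D.
rng = np.random.default_rng(0)
X_ctx = rng.normal(size=(16, 2))
y_ctx = X_ctx @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=16)
y_hat = softmax_attention_icl(X_ctx, y_ctx, x_query=np.array([0.5, 0.5]))
```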
An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization
In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optim…
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and c…
Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem
In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic…
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong do…
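The "retrain only the last linear layer" recipe mentioned in this abstract can be sketched as fitting a ridge-regression head on top of a frozen feature map (NumPy); `adapt_last_layer` and the random ReLU features in the usage example are hypothetical stand-ins for a pretrained network body, not the paper's construction.

```python
import numpy as np

def adapt_last_layer(features_fn, X_new, y_new, reg=1e-3):
    """Freeze the pretrained feature map and fit a ridge-regression linear head
    on the downstream task: W = (Phi^T Phi + reg I)^{-1} Phi^T y."""
    Phi = features_fn(X_new)                        # frozen representation of the new data
    d = Phi.shape[1]
    W = np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ y_new)
    return W                                        # new linear head

# Toy usage with a random frozen ReLU feature map standing in for the pretrained body.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(10, 64))
feats = lambda X: np.maximum(X @ W1, 0.0)           # frozen two-layer-style features
X_new = rng.normal(size=(200, 10))
y_new = rng.normal(size=200)
head = adapt_last_layer(feats, X_new, y_new)
```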
Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate
Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of $O((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-know…
An Inexact Conditional Gradient Method for Constrained Bilevel Optimization
Bilevel optimization is an important class of optimization problems where one optimization problem is nested within another. While various methods have emerged to address unconstrained general bilevel optimization problems, there has been …
Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization
In this paper, we propose an accelerated quasi-Newton proximal extragradient (A-QPNE) method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can ac…
Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing
Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based u…
Online Learning Guided Curvature Approximation: A Quasi-Newton Method with Global Non-Asymptotic Superlinear Convergence
Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property. However, existing results for these algorithms are limit…
InfoNCE Loss Provably Learns Cluster-Preserving Representations
The goal of contrastive learning is to learn a representation that preserves underlying clusters by keeping samples with similar content, e.g., the "dogness" of a dog, close to each other in the space generated by the representation. A co…
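For reference, here is a minimal NumPy version of the InfoNCE loss with in-batch negatives: each anchor is pulled toward its positive pair and pushed away from the other samples in the batch. The function name and temperature value are illustrative; the paper's specific data model and guarantees are not reproduced here.

```python
import numpy as np

def info_nce_loss(z_anchor, z_positive, temperature=0.1):
    """Minimal InfoNCE with in-batch negatives: row i of z_anchor is the anchor
    whose positive is row i of z_positive; all other rows act as negatives.
    Returns the mean cross-entropy of picking the correct (diagonal) positive."""
    z_a = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    z_p = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = z_a @ z_p.T / temperature               # pairwise cosine similarities / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives sit on the diagonal
```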
Network Adaptive Federated Learning: Congestion and Lossy Compression
In order to achieve the dual goals of privacy and learning across distributed data, Federated Learning (FL) systems rely on frequent exchanges of large files (model updates) between a set of clients and the server. As such FL systems are e…
Conditional Gradient Methods
The purpose of this survey is to serve both as a gentle introduction and a coherent overview of state-of-the-art Frank–Wolfe algorithms, also called conditional gradient algorithms, for function minimization. These algorithms are especial…
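As a concrete instance of the algorithms this survey covers, below is a textbook Frank–Wolfe (conditional gradient) loop over the probability simplex in NumPy, using the linear minimization oracle in place of projection and the classical $2/(k+2)$ step size; the helper name and the toy problem in the usage lines are illustrative.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, n, n_iters=200):
    """Textbook Frank-Wolfe over the probability simplex: each iteration calls a
    linear minimization oracle (pick the coordinate with the smallest partial
    derivative) instead of projecting, then moves with step size 2/(k+2)."""
    x = np.ones(n) / n                               # start at the simplex barycenter
    for k in range(n_iters):
        g = grad_fn(x)
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0                        # LMO: best vertex of the simplex
        gamma = 2.0 / (k + 2.0)
        x = (1 - gamma) * x + gamma * s              # convex combination stays feasible
    return x

# Toy usage: project a point b onto the simplex by minimizing 0.5 * ||x - b||^2.
b = np.array([0.2, 0.9, -0.1])
x_fw = frank_wolfe_simplex(lambda x: x - b, n=3)
```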
Non-asymptotic superlinear convergence of standard quasi-Newton methods
In this paper, we study and prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms which includes the Davidon–Fletcher–Powell (DFP) method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) metho…
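To fix notation for readers, here is the standard BFGS update of the inverse Hessian approximation in NumPy (DFP is its dual, obtained by interchanging the roles of $s$ and $y$ and updating the Hessian approximation instead); the paper's contribution is the non-asymptotic rate analysis, not this formula, and the helper name is illustrative.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H from the
    displacement s = x_{k+1} - x_k and gradient difference y = g_{k+1} - g_k:
    H_{k+1} = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1 / (y^T s).
    Requires the curvature condition y^T s > 0."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```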
Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems
One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose