Praneeth Netrapalli
Second Order Methods for Bandit Optimization and Control
Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive co…
Tandem Transformers for Inference Efficient LLMs
The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitati…
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HPs) such as initialization scales and learning rates gives only indirect control over this behavior. In this paper, we introduce a key notion to …
Near Optimal Heteroscedastic Regression with Symbiotic Learning
We consider the problem of heteroscedastic linear regression, where, given $n$ samples $(\mathbf{x}_i, y_i)$ from $y_i = \langle \mathbf{w}^{*}, \mathbf{x}_i \rangle + \epsilon_i \cdot \langle \mathbf{f}^{*}, \mathbf{x}_i \rangle$ with $\mathbf{x…
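A minimal sketch of the data-generating model stated in this abstract, $y_i = \langle \mathbf{w}^*, \mathbf{x}_i \rangle + \epsilon_i \cdot \langle \mathbf{f}^*, \mathbf{x}_i \rangle$. The Gaussian choices for $\mathbf{x}_i$ and $\epsilon_i$ are assumptions made only to make the sketch runnable, and the least-squares baseline is not the paper's algorithm.

```python
# Sketch of the heteroscedastic linear regression model from the abstract:
#   y_i = <w*, x_i> + eps_i * <f*, x_i>
# Distributions of x_i and eps_i are illustrative assumptions (standard Gaussians).
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5

w_star = rng.normal(size=d)          # mean parameter w*
f_star = rng.normal(size=d)          # noise-scale parameter f*

X = rng.normal(size=(n, d))          # assumed covariate distribution
eps = rng.normal(size=n)             # assumed zero-mean unit-variance noise

# Per-sample noise level scales with <f*, x_i>: this is the heteroscedastic part.
y = X @ w_star + eps * (X @ f_star)

# Naive baseline (not the paper's method): ordinary least squares ignores the
# heteroscedastic structure and estimates only w*.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS error on w*:", np.linalg.norm(w_ols - w_star))
```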
Optimistic MLE: A Generic Model-Based Algorithm for Partially Observable Sequential Decision Making
This paper introduces a simple and efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prov…
Simplicity Bias in 1-Hidden Layer Neural Networks
Recent works have demonstrated that neural networks exhibit extreme simplicity bias (SB). That is, they learn only the simplest features to solve a task at hand, even in the presence of other, more robust but more complex features. Due to t…
Consistent Multiclass Algorithms for Complex Metrics and Constraints
We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix. This setting includes many common pe…
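A small sketch of what a "complex metric defined by a function of the confusion matrix" looks like, as described in this abstract. Macro-F1 and per-class coverage are chosen here purely as illustrative examples of such functions; they are not necessarily the specific metrics studied in the paper.

```python
# Complex metrics as functions of the multiclass confusion matrix (illustrative only).
import numpy as np

def confusion_matrix(y_true, y_pred, k):
    C = np.zeros((k, k))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C / len(y_true)              # normalized confusion matrix

def macro_f1(C):
    # Per-class F1 is a non-linear function of row/column sums of C.
    tp = np.diag(C)
    prec = tp / np.maximum(C.sum(axis=0), 1e-12)
    rec = tp / np.maximum(C.sum(axis=1), 1e-12)
    return np.mean(2 * prec * rec / np.maximum(prec + rec, 1e-12))

def coverage(C):
    # Fraction of examples predicted as each class: one possible constraint target.
    return C.sum(axis=0)

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])
C = confusion_matrix(y_true, y_pred, k=3)
print("macro-F1:", macro_f1(C), "coverage:", coverage(C))
```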
Multi-User Reinforcement Learning with Low Rank Rewards
In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumpti…
Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks
Deep Neural Networks are known to be brittle to even minor distribution shifts compared to the training distribution. While one line of work has demonstrated that Simplicity Bias (SB) of DNNs - bias towards learning only the simplest featu…
DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution. Deep learning models are known to be fragile to such shifts and c…
All Mistakes Are Not Equal: Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP)
This paper considers the problem of Hierarchical Multi-Label Classification (HMC), where (i) several labels can be present for each example, and (ii) labels are related via a domain-specific hierarchy tree. Guided by the intuition that all…
MET: Masked Encoding for Tabular Data
We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive learning based SSL methods require instance-wise data augmentations which are difficult to design for unstructured tab…
Reproducibility in Optimization: Theoretical Framework and Limits
We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computation…
Near-Optimal Lower Bounds For Convex Optimization For All Orders of Smoothness
We study the complexity of optimizing highly smooth convex functions. For a positive integer $p$, we want to find an $\epsilon$-approximate minimum of a convex function $f$, given oracle access to the function and its first $p$ derivatives, assum…
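A sketch, not taken from the paper, of the oracle model the abstract describes: a $p$-th order oracle reveals the value and first $p$ derivatives at the query point, and the goal is an $\epsilon$-approximate minimizer.

```latex
% Illustrative formalization of the p-th order oracle model from the abstract.
\[
  \mathcal{O}_p(f, x) \;=\; \bigl(f(x),\, \nabla f(x),\, \dots,\, \nabla^{p} f(x)\bigr),
  \qquad
  \text{find } \hat{x} \text{ such that } f(\hat{x}) - \min_{x} f(x) \le \epsilon .
\]
```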
Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs
Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical results are pessimistic about Q-learning. For example, (Baird…
Focus on the Common Good: Group Distributional Robustness Follows
We consider the problem of training a classification model with group annotated training data. Recent work has established that, if there is distribution shift across different groups, models trained using the standard empirical risk minim…
Minimax Optimization with Smooth Algorithmic Adversaries
This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$. Though such optimization problems arise in many machine learning paradigms including…
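A toy sketch of the $\min_x \max_y f(x, y)$ format from this abstract. Plain simultaneous gradient descent-ascent (GDA) on an arbitrary smooth test function is shown only to make the problem format concrete; it is not the paper's algorithm, and its failure to converge on nonconvex-nonconcave problems is part of the motivation for the paper's approach.

```python
# Illustrative nonconvex-nonconcave minimax problem and plain GDA (not the paper's method).
import numpy as np

def f(x, y):
    # Arbitrary smooth test function, nonconvex in x and nonconcave in y.
    return np.sin(x) * np.cos(y) + 0.1 * x * y

def grad_f(x, y, h=1e-5):
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

x, y, eta = 1.0, -1.0, 0.05
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - eta * gx, y + eta * gy    # descend in x, ascend in y
print("final point:", x, y, "f:", f(x, y))
```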
Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
We consider the setting of vector valued non-linear dynamical systems $X_{t+1} = \phi(A^* X_t) + \eta_t$, where $\eta_t$ is unbiased noise and $\phi: \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies certain {\em expansivity property}…
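A minimal sketch simulating the system $X_{t+1} = \phi(A^* X_t) + \eta_t$ from this abstract. The specific link (a leaky ReLU applied coordinate-wise), the Gaussian noise, and the spectral normalization of $A^*$ are illustrative assumptions; the paper's actual conditions, such as the expansivity property of $\phi$, are in the full text.

```python
# Simulate X_{t+1} = phi(A* X_t) + eta_t with an assumed link and noise model.
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 500

A_star = rng.normal(size=(d, d))
A_star *= 0.9 / np.max(np.abs(np.linalg.eigvals(A_star)))     # keep the dynamics stable

phi = lambda z: np.where(z > 0, z, 0.1 * z)                   # assumed link (leaky ReLU)

X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = phi(A_star @ X[t]) + 0.1 * rng.normal(size=d)  # unbiased noise eta_t
print("trajectory generated, last state:", X[-1])
```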
Sample Efficient Linear Meta-Learning by Alternating Minimization
Meta-learning synthesizes and leverages the knowledge from a given set of tasks to rapidly learn new tasks using very little data. Meta-learning of linear regression tasks, where the regressors lie in a low-dimensional subspace, is an exte…
Streaming Linear System Identification with Reverse Experience Replay
We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms, which is encountered in several applications including reinforcement learning (RL) and time-series anal…
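A minimal sketch of the single-trajectory LTI identification setting. The state-space form $X_{t+1} = A^* X_t + \eta_t$ and the Gaussian noise are assumptions made for illustration, and ordinary least squares over the whole trajectory is shown only as an offline baseline; the paper's contribution is a streaming method with reverse experience replay, which this sketch does not implement.

```python
# Single-trajectory LTI identification: simulate and fit an offline OLS baseline.
import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 2000

A_star = rng.normal(size=(d, d))
A_star *= 0.8 / np.max(np.abs(np.linalg.eigvals(A_star)))     # stable system (assumed)

X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = A_star @ X[t] + 0.1 * rng.normal(size=d)       # assumed LTI dynamics

# Offline OLS: regress X_{t+1} on X_t over the single trajectory.
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
print("estimation error:", np.linalg.norm(A_hat - A_star))
```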
Do Input Gradients Highlight Discriminative Features?
Post-hoc gradient-based interpretability methods [Simonyan et al., 2013, Smilkov et al., 2017] that provide instance-specific explanations of model predictions are often based on assumption (A): magnitude of input gradients -- gradients of…
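A small sketch of what an "input gradient" is: the gradient of a model's output with respect to its input, whose magnitude is what assumption (A) in this abstract concerns. The tiny fixed 1-hidden-layer network and the finite-difference computation are used purely for illustration; interpretability methods typically use automatic differentiation.

```python
# Input gradient of a toy network's output w.r.t. its input, via finite differences.
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=h)

def logit(x):
    return W2 @ np.maximum(W1 @ x, 0.0)       # 1-hidden-layer ReLU network (assumed)

def input_gradient(x, eps=1e-5):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (logit(x + e) - logit(x - e)) / (2 * eps)
    return g

x = rng.normal(size=d)
print("per-coordinate input-gradient magnitudes:", np.abs(input_gradient(x)))
```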
On Nonconvex Optimization for Machine Learning
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable …
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e. $f_t(\mathbf{w}) = \mathrm{loss}_t(\mathrm{pred}_t(\mathbf{w}))$ where the output of $\mathrm{pred}_t$ is o…
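A minimal sketch of the pseudo-1d structure $f_t(\mathbf{w}) = \mathrm{loss}_t(\mathrm{pred}_t(\mathbf{w}))$ from this abstract. The abstract is cut off before fully specifying $\mathrm{pred}_t$; here it is assumed to be a scalar-valued linear predictor, purely for illustration, and only the bandit (zeroth-order) feedback at the queried point is shown.

```python
# Pseudo-1d bandit loss: f_t(w) = loss_t(pred_t(w)), with an assumed linear pred_t.
import numpy as np

rng = np.random.default_rng(0)
d = 10
x_t = rng.normal(size=d)                     # context defining pred_t for round t (assumed)
loss_t = lambda z: (z - 1.0) ** 2            # assumed scalar loss for round t

def f_t(w):
    return loss_t(x_t @ w)                   # loss of a one-dimensional prediction

w = np.zeros(d)
print("bandit feedback at w:", f_t(w))       # the only value revealed to the learner
```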
No Quantum Speedup over Gradient Descent for Non-Smooth Convex Optimization
We study the first-order convex optimization problem, where we have black-box access to a (not necessarily smooth) function f:ℝⁿ → ℝ and its (sub)gradient. Our goal is to find an ε-approximate minimum of f starting from a point that is dis…
Making the Last Iterate of SGD Information Theoretically Optimal
Stochastic gradient descent (SGD) is one of the most widely used algorithms for large scale optimization problems. While classical theoretical analysis of SGD for convex problems studies (suffix) \emph{averages} of iterates and obtains inf…
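A toy sketch contrasting the last SGD iterate with a suffix-averaged iterate, the two quantities this abstract discusses. The quadratic objective, step-size schedule, and noise model are illustrative assumptions; the paper's modified step sizes that make the last iterate optimal are not reproduced here.

```python
# Last iterate vs. suffix average of SGD on a toy strongly convex problem.
import numpy as np

rng = np.random.default_rng(0)
w_star, T = 2.0, 10_000
w, iterates = 0.0, []

for t in range(1, T + 1):
    g = (w - w_star) + rng.normal()          # stochastic gradient of 0.5 * (w - w*)^2
    w -= g / t                               # classical 1/t step size (assumed)
    iterates.append(w)

last = iterates[-1]
suffix_avg = np.mean(iterates[T // 2:])      # average over the last half of iterates
print("last-iterate error:", abs(last - w_star))
print("suffix-average error:", abs(suffix_avg - w_star))
```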
Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method
We consider the classical setting of optimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set, when having access to a (stochastic) first-order oracle (FO) for the function and a projection oracle (PO) for t…
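A minimal sketch of the two oracles named in this abstract: a first-order oracle (FO) returning a subgradient and a projection oracle (PO) for the constraint set. Classical projected subgradient descent is shown only to make the oracle interface concrete; it calls PO every iteration, whereas the paper's methods are designed around using far fewer PO calls. The L1-type objective and the Euclidean-ball constraint are illustrative choices.

```python
# FO/PO oracle interface, illustrated with classical projected subgradient descent.
import numpy as np

rng = np.random.default_rng(0)
d = 20
a = rng.normal(size=d)

def FO(w):
    # Subgradient of the nonsmooth Lipschitz function f(w) = ||w - a||_1 (assumed objective).
    return np.sign(w - a)

def PO(w, radius=1.0):
    # Euclidean projection onto the ball {||w|| <= radius} (assumed constraint set).
    nrm = np.linalg.norm(w)
    return w if nrm <= radius else w * (radius / nrm)

w, eta = np.zeros(d), 0.05
for _ in range(500):
    w = PO(w - eta * FO(w))                  # one FO call and one PO call per iteration
print("final f(w):", np.abs(w - a).sum())
```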
Efficient Domain Generalization via Common-Specific Low-Rank Decomposition
Domain generalization refers to the task of training a model which generalizes to new domains that are not seen during training. We present CSD (Common Specific Decomposition) for this setting, which jointly learns a common component (whic…