Gal Vardi
On the Hardness of Learning Regular Expressions
Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions i…
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus o…
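The Markovian training processes this abstract considers include Langevin dynamics as a canonical example. Below is a minimal, illustrative sketch of unadjusted Langevin dynamics on a toy quadratic loss; the loss, step size, and temperature are assumptions for illustration, not the paper's setting.

```python
import numpy as np

def langevin_step(theta, grad_fn, step_size, temperature, rng):
    """One Euler-Maruyama step of Langevin dynamics:
    theta <- theta - eta * grad L(theta) + sqrt(2 * eta * T) * noise."""
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad_fn(theta) + np.sqrt(2 * step_size * temperature) * noise

rng = np.random.default_rng(0)
grad = lambda t: t                   # gradient of the toy loss L(theta) = ||theta||^2 / 2
theta = rng.standard_normal(5)       # theta_0 ~ p_0 (standard Gaussian, an assumption)
for _ in range(1000):
    theta = langevin_step(theta, grad, step_size=0.01, temperature=0.1, rng=rng)
```

At stationarity this chain samples from the Gibbs distribution proportional to exp(-L(theta)/T), which is why the temperature T governs how concentrated the iterates are around the loss minimizers.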
A Theory of Learning with Autoregressive Chain of Thought
For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the…
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
We study the implicit bias of the general family of steepest descent algorithms with infinitesimal learning rate in deep homogeneous neural networks. We show that: (a) an algorithm-dependent geometric margin starts increasing once the netw…
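The family of steepest-descent algorithms the abstract refers to is parametrized by a norm: the update direction minimizes the loss linearization over that norm's unit ball. A minimal sketch, contrasting the l2 case (normalized gradient descent) with the l-infinity case (sign gradient descent); the toy separable data and exponential loss are assumptions for illustration.

```python
import numpy as np

def steepest_descent_step(w, grad, lr, norm="l2"):
    if norm == "l2":      # argmin <g, d> over ||d||_2 <= 1 is -g / ||g||_2
        return w - lr * grad / (np.linalg.norm(grad) + 1e-12)
    if norm == "linf":    # argmin <g, d> over ||d||_inf <= 1 is -sign(g)
        return w - lr * np.sign(grad)
    raise ValueError(f"unknown norm: {norm}")

# toy linearly separable data; exponential-type loss drives margins up
X = np.array([[2.0, 0.5], [1.0, 1.5], [-1.5, -1.0]])
y = np.array([1.0, 1.0, -1.0])

def loss_grad(w):
    margins = y * (X @ w)
    return -(X.T * y) @ np.exp(-margins)   # gradient of sum_i exp(-y_i w.x_i)

w_l2, w_linf = np.zeros(2), np.zeros(2)
for _ in range(200):
    w_l2 = steepest_descent_step(w_l2, loss_grad(w_l2), lr=0.1, norm="l2")
    w_linf = steepest_descent_step(w_linf, loss_grad(w_linf), lr=0.1, norm="linf")
```

Both runs separate the data, but the two norms generally drive the iterates toward different (norm-dependent) maximum-margin directions, which is the "flavors of margin" phenomenon in miniature.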
Provable Tempered Overfitting of Minimal Nets and Typical Nets
We study the overfitting behavior of fully connected deep neural networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weigh…
Provable Privacy Attacks on Trained Shallow Neural Networks
We study what provable privacy attacks can be shown on trained, 2-layer ReLU neural networks. We explore two types of attacks: data reconstruction attacks and membership inference attacks. We prove that theoretical results on the implicit…
Benign Overfitting in Single-Head Attention
The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/co…
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forw…
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, w…
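The minimum-norm interpolating solution the abstract refers to has a closed form: with no ridge term, the coefficients solve the kernel linear system exactly. A small illustrative sketch (data, bandwidth, and labels are toy assumptions, not the paper's setting):

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def ridgeless_fit(X, y, bandwidth):
    K = gaussian_kernel(X, X, bandwidth)   # positive-definite Gram matrix
    return np.linalg.solve(K, y)           # interpolating coefficients (ridge = 0)

def predict(X_train, alpha, X_test, bandwidth):
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)   # noisy labels
alpha = ridgeless_fit(X, y, bandwidth=1.0)
train_preds = predict(X, alpha, X, bandwidth=1.0)      # interpolates y exactly
```

Varying `bandwidth` (or the input dimension) with the sample size changes the conditioning of the Gram matrix and hence how the interpolant behaves off the training points, which is the regime the abstract studies.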
Approaching Deep Learning through the Spectral Dynamics of Weights
We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in o…
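The kind of measurement "spectral dynamics of weights" suggests can be illustrated on a toy problem: track the singular values of the end-to-end map of a two-layer linear network during gradient descent on a rank-1 matrix-factorization loss. The task, sizes, and learning rate below are assumptions for illustration only, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
M = np.outer(rng.standard_normal(d), rng.standard_normal(d))  # rank-1 target
W1 = 0.01 * rng.standard_normal((d, d))                        # small init
W2 = 0.01 * rng.standard_normal((d, d))

top_singular_values = []
for step in range(2000):
    E = W2 @ W1 - M                                            # residual of end-to-end map
    W2, W1 = W2 - 0.01 * (E @ W1.T), W1 - 0.01 * (W2.T @ E)    # simultaneous GD update
    top_singular_values.append(np.linalg.svd(W2 @ W1, compute_uv=False)[0])
```

From small initialization the top singular value of `W2 @ W1` rises from near zero toward the target's top singular value while the rest stay small, a simple instance of the low-rank bias in the spectrum that such tracking makes visible.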
Reconstructing Training Data From Real World Models Trained with Transfer Learning
Current methods for reconstructing training data from trained classifiers are restricted to very small models, limited training set sizes, and low-resolution images. Such restrictions hinder their applicability to real-world scenarios. In …
Perspective Games
We introduce and study perspective games, which model multi-agent systems in which agents can view only the parts of the system that they own. As in standard multi-player turn-based games, the vertices of the game graph are partitioned am…
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can…
Noisy Interpolation Learning with Shallow Univariate ReLU Networks
Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. (2022) noted that neural networks seem to often exhibit "tempered overfitting",…
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilaye…
An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view i…
Most Neural Networks Are Almost Learnable
We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of d…
On the Implicit Bias in Deep-Learning Algorithms
Examining the implicit bias in training neural networks using gradient-based methods.
Reconstructing Training Data from Multiclass Neural Networks
Reconstructing samples from the training set of trained neural networks is a major privacy concern. Haim et al. (2022) recently showed that it is possible to reconstruct training samples from neural network binary classifiers, based on the…
The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks
In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster mea…
Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we establish a n…
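For reference, the KKT conditions in question are those of the max-margin problem $\min \frac{1}{2}\|\theta\|^2$ subject to $y_i f(\theta; x_i) \ge 1$; the statement below is the standard form from the margin-maximization literature, not quoted from the paper.

```latex
% A point theta satisfies the KKT conditions for margin maximization
% if there exist multipliers lambda_1, ..., lambda_n such that:
\begin{align*}
  \theta &= \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta f(\theta; x_i), && \text{(stationarity)} \\
  y_i f(\theta; x_i) &\ge 1 \quad \forall i, && \text{(primal feasibility)} \\
  \lambda_i &\ge 0 \quad \forall i, && \text{(dual feasibility)} \\
  \lambda_i \bigl( y_i f(\theta; x_i) - 1 \bigr) &= 0 \quad \forall i. && \text{(complementary slackness)}
\end{align*}
```

For non-convex networks these conditions are necessary but not sufficient for margin maximization, which is why characterizing when KKT points are benign is a separate question.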
Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimension…
Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtai…
Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data
The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fu…
On the Implicit Bias in Deep-Learning Algorithms
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples. It is believed that implicit bias is a…
Reconstructing Training Data from Trained Neural Networks
Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be …
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network wit…
The Sample Complexity of One-Hidden-Layer Neural Networks
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks,…
Gradient Methods Provably Converge to Non-Robust Networks
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robu…
Width is Less Important than Depth in ReLU Neural Networks
We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters …