Gal Vardi
On the Hardness of Learning Regular Expressions
Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions i…
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus o…
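The Markovian training processes this abstract considers include Langevin dynamics as a canonical example. Below is a minimal, illustrative sketch of unadjusted Langevin dynamics on a toy quadratic loss; the loss, step size, and temperature are assumptions for illustration, not the paper's setting.

```python
import numpy as np

def langevin_step(theta, grad_fn, step_size, temperature, rng):
    """One Euler-Maruyama step of Langevin dynamics:
    theta <- theta - eta * grad L(theta) + sqrt(2 * eta * T) * noise."""
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad_fn(theta) + np.sqrt(2 * step_size * temperature) * noise

rng = np.random.default_rng(0)
grad = lambda t: t                   # gradient of the toy loss L(theta) = ||theta||^2 / 2
theta = rng.standard_normal(5)       # theta_0 ~ p_0 (standard Gaussian, an assumption)
for _ in range(1000):
    theta = langevin_step(theta, grad, step_size=0.01, temperature=0.1, rng=rng)
```

At stationarity this chain samples from the Gibbs distribution proportional to exp(-L(theta)/T), which is why the temperature T governs how concentrated the iterates are around the loss minimizers.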
A Theory of Learning with Autoregressive Chain of Thought
For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the…
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
We study the implicit bias of the general family of steepest descent algorithms with infinitesimal learning rate in deep homogeneous neural networks. We show that: (a) an algorithm-dependent geometric margin starts increasing once the netw…
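The family of steepest-descent algorithms the abstract refers to is parametrized by a norm: the update direction minimizes the loss linearization over that norm's unit ball. A minimal sketch, contrasting the l2 case (normalized gradient descent) with the l-infinity case (sign gradient descent); the toy separable data and exponential loss are assumptions for illustration.

```python
import numpy as np

def steepest_descent_step(w, grad, lr, norm="l2"):
    if norm == "l2":      # argmin <g, d> over ||d||_2 <= 1 is -g / ||g||_2
        return w - lr * grad / (np.linalg.norm(grad) + 1e-12)
    if norm == "linf":    # argmin <g, d> over ||d||_inf <= 1 is -sign(g)
        return w - lr * np.sign(grad)
    raise ValueError(f"unknown norm: {norm}")

# toy linearly separable data; exponential-type loss drives margins up
X = np.array([[2.0, 0.5], [1.0, 1.5], [-1.5, -1.0]])
y = np.array([1.0, 1.0, -1.0])

def loss_grad(w):
    margins = y * (X @ w)
    return -(X.T * y) @ np.exp(-margins)   # gradient of sum_i exp(-y_i w.x_i)

w_l2, w_linf = np.zeros(2), np.zeros(2)
for _ in range(200):
    w_l2 = steepest_descent_step(w_l2, loss_grad(w_l2), lr=0.1, norm="l2")
    w_linf = steepest_descent_step(w_linf, loss_grad(w_linf), lr=0.1, norm="linf")
```

Both runs separate the data, but the two norms generally drive the iterates toward different (norm-dependent) maximum-margin directions, which is the "flavors of margin" phenomenon in miniature.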
Provable Tempered Overfitting of Minimal Nets and Typical Nets
We study the overfitting behavior of fully connected deep neural networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weigh…
Provable Privacy Attacks on Trained Shallow Neural Networks
We study what provable privacy attacks can be shown on trained, 2-layer ReLU neural networks. We explore two types of attacks: data reconstruction attacks and membership inference attacks. We prove that theoretical results on the implicit…
Benign Overfitting in Single-Head Attention
The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/co…
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forw…
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, w…
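The minimum-norm interpolating solution the abstract refers to has a closed form: with no ridge term, the coefficients solve the kernel linear system exactly. A small illustrative sketch (data, bandwidth, and labels are toy assumptions, not the paper's setting):

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def ridgeless_fit(X, y, bandwidth):
    K = gaussian_kernel(X, X, bandwidth)   # positive-definite Gram matrix
    return np.linalg.solve(K, y)           # interpolating coefficients (ridge = 0)

def predict(X_train, alpha, X_test, bandwidth):
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)   # noisy labels
alpha = ridgeless_fit(X, y, bandwidth=1.0)
train_preds = predict(X, alpha, X, bandwidth=1.0)      # interpolates y exactly
```

Varying `bandwidth` (or the input dimension) with the sample size changes the conditioning of the Gram matrix and hence how the interpolant behaves off the training points, which is the regime the abstract studies.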
Approaching Deep Learning through the Spectral Dynamics of Weights
We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in o…
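The kind of measurement "spectral dynamics of weights" suggests can be illustrated on a toy problem: track the singular values of the end-to-end map of a two-layer linear network during gradient descent on a rank-1 matrix-factorization loss. The task, sizes, and learning rate below are assumptions for illustration only, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
M = np.outer(rng.standard_normal(d), rng.standard_normal(d))  # rank-1 target
W1 = 0.01 * rng.standard_normal((d, d))                        # small init
W2 = 0.01 * rng.standard_normal((d, d))

top_singular_values = []
for step in range(2000):
    E = W2 @ W1 - M                                            # residual of end-to-end map
    W2, W1 = W2 - 0.01 * (E @ W1.T), W1 - 0.01 * (W2.T @ E)    # simultaneous GD update
    top_singular_values.append(np.linalg.svd(W2 @ W1, compute_uv=False)[0])
```

From small initialization the top singular value of `W2 @ W1` rises from near zero toward the target's top singular value while the rest stay small, a simple instance of the low-rank bias in the spectrum that such tracking makes visible.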
Reconstructing Training Data From Real World Models Trained with Transfer Learning
Current methods for reconstructing training data from trained classifiers are restricted to very small models, limited training set sizes, and low-resolution images. Such restrictions hinder their applicability to real-world scenarios. In …
Perspective Games
We introduce and study perspective games, which model multi-agent systems in which agents can view only the parts of the system that they own. As in standard multi-player turn-based games, the vertices of the game graph are partitioned am…
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can…
Noisy Interpolation Learning with Shallow Univariate ReLU Networks
Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. (2022) noted that neural networks seem to often exhibit "tempered overfitting",…
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilaye…
An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view i…
Most Neural Networks Are Almost Learnable
We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of d…
On the Implicit Bias in Deep-Learning Algorithms
Examining the implicit bias in training neural networks using gradient-based methods.
Reconstructing Training Data from Multiclass Neural Networks
Reconstructing samples from the training set of trained neural networks is a major privacy concern. Haim et al. (2022) recently showed that it is possible to reconstruct training samples from neural network binary classifiers, based on the…
The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks
In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster mea…
Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we establish a n…
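For reference, the KKT conditions in question are those of the max-margin problem $\min \frac{1}{2}\|\theta\|^2$ subject to $y_i f(\theta; x_i) \ge 1$; the statement below is the standard form from the margin-maximization literature, not quoted from the paper.

```latex
% A point theta satisfies the KKT conditions for margin maximization
% if there exist multipliers lambda_1, ..., lambda_n such that:
\begin{align*}
  \theta &= \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta f(\theta; x_i), && \text{(stationarity)} \\
  y_i f(\theta; x_i) &\ge 1 \quad \forall i, && \text{(primal feasibility)} \\
  \lambda_i &\ge 0 \quad \forall i, && \text{(dual feasibility)} \\
  \lambda_i \bigl( y_i f(\theta; x_i) - 1 \bigr) &= 0 \quad \forall i. && \text{(complementary slackness)}
\end{align*}
```

For non-convex networks these conditions are necessary but not sufficient for margin maximization, which is why characterizing when KKT points are benign is a separate question.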
Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimension…
Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtai…
Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data
The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fu…
On the Implicit Bias in Deep-Learning Algorithms
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples. It is believed that implicit bias is a…
Reconstructing Training Data from Trained Neural Networks
Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be …
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network wit…
The Sample Complexity of One-Hidden-Layer Neural Networks
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks,…
Gradient Methods Provably Converge to Non-Robust Networks
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robu…
Width is Less Important than Depth in ReLU Neural Networks
We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters …