Andrea Montanari
Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural netw…
Local minima of the empirical risk in high dimension: General theorems and convex examples
We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional isotropic Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss dep…
Provably Efficient Posterior Sampling for Sparse Linear Regression via Measure Decomposition
We consider the problem of sampling from the posterior distribution of a $d$-dimensional coefficient vector $\boldsymbol{\theta}$, given linear observations $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta}+\boldsymbol{\varepsilon}$. In general, such …
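The observation model is easy to simulate. Below is a minimal sketch in which a Laplace prior stands in for a generic sparsity-promoting prior (the paper's actual prior and sampling algorithm are not reproduced here); all sizes and the helper log_posterior are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 100, 200, 10                      # k-sparse signal; sizes illustrative
    theta = np.zeros(d)
    theta[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    sigma = 0.5
    y = X @ theta + sigma * rng.normal(size=n)  # y = X theta + noise

    def log_posterior(th, lam=1.0):
        # Gaussian likelihood plus iid Laplace prior (Bayesian-lasso stand-in);
        # the hard problem the paper addresses is sampling from such a density.
        return -0.5 * np.sum((y - X @ th) ** 2) / sigma**2 - lam * np.sum(np.abs(th))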
Which exceptional low-dimensional projections of a Gaussian point cloud can be found in polynomial time?
Given $d$-dimensional standard Gaussian vectors $\boldsymbol{x}_1,\dots, \boldsymbol{x}_n$, we consider the set of all empirical distributions of its $m$-dimensional projections, for $m$ a fixed constant. Diaconis and Freedman (1984) prove…
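As a quick illustration of the Diaconis-Freedman phenomenon the abstract alludes to: a typical (random) one-dimensional projection of a Gaussian point cloud is itself close to standard Gaussian, and the paper asks which atypical projections can be found efficiently. A minimal sketch, with illustrative sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 500
    X = rng.normal(size=(n, d))        # n standard Gaussian vectors in R^d

    u = rng.normal(size=d)
    u /= np.linalg.norm(u)             # random unit direction
    proj = X @ u                       # empirical distribution of the projection
    print(proj.mean(), proj.std())     # close to 0 and 1 for typical directions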
On Smale's 17th problem over the reals
We consider the problem of efficiently solving a system of $n$ non-linear equations in ${\mathbb R}^d$. Addressing Smale's 17th problem stated in 1998, we consider a setting whereby the $n$ equations are random homogeneous polynomials of a…
Sampling from Spherical Spin Glasses in Total Variation via Algorithmic Stochastic Localization
We consider the problem of algorithmically sampling from the Gibbs measure of a mixed $p$-spin spherical spin glass. We give a polynomial-time algorithm that samples from the Gibbs measure up to vanishing total variation error, for any mod…
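For orientation, one common way to write the mixed $p$-spin spherical Hamiltonian (a standard convention, not quoted from the paper) is

\[
H_N(\boldsymbol{\sigma}) \;=\; \sum_{p\ge 2} \frac{\gamma_p}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N} G_{i_1\cdots i_p}\,\sigma_{i_1}\cdots\sigma_{i_p}, \qquad \boldsymbol{\sigma}\in\mathbb{S}^{N-1}(\sqrt{N}),
\]

with i.i.d. standard Gaussian coefficients $G_{i_1\cdots i_p}$, so that $\mathbb{E}[H_N(\boldsymbol{\sigma})H_N(\boldsymbol{\sigma}')] = N\,\xi(\langle\boldsymbol{\sigma},\boldsymbol{\sigma}'\rangle/N)$ for $\xi(t)=\sum_{p\ge2}\gamma_p^2 t^p$; the Gibbs measure is then $\mu_\beta(\mathrm{d}\boldsymbol{\sigma})\propto e^{\beta H_N(\boldsymbol{\sigma})}\,\mu_0(\mathrm{d}\boldsymbol{\sigma})$.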
Scaling laws for learning with real and surrogate data
Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and is a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more a…
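One natural baseline this setup suggests is weighted empirical risk minimization that mixes the two data sources. A minimal sketch with ridge regression; the function name, the mixing weight alpha, and all sizes are illustrative assumptions, not the paper's estimator.

    import numpy as np

    def weighted_ridge(X_real, y_real, X_surr, y_surr, alpha, lam=1e-3):
        # Weight real points by alpha and surrogate points by 1 - alpha,
        # then solve the weighted ridge normal equations.
        w = np.concatenate([np.full(len(y_real), alpha / len(y_real)),
                            np.full(len(y_surr), (1 - alpha) / len(y_surr))])
        X = np.vstack([X_real, X_surr])
        y = np.concatenate([y_real, y_surr])
        Xw = X * w[:, None]
        d = X.shape[1]
        return np.linalg.solve(Xw.T @ X + lam * np.eye(d), Xw.T @ y)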
Optimization of random cost functions and statistical physics
This is the text of my report presented at the 29th Solvay Conference on Physics on "The Structure and Dynamics of Disordered Systems," held in Brussels from October 19 to 21, 2023. I consider the problem of minimizing a random energy func…
Discovery of sparse, reliable omic biomarkers with Stabl
Adoption of high-content omic technologies in clinical studies, coupled with computational methods, has yielded an abundance of candidate biomarkers. However, translating such findings into bona fide clinical biomarkers remains challenging…
Sampling from Mean-Field Gibbs Measures via Diffusion Processes
We consider Ising mixed $p$-spin glasses at high temperature and without external field, and study the problem of sampling from the Gibbs distribution $\mu$ in polynomial time. We develop a new sampling algorithm with complexity of the same …
Universality of max-margin classifiers
Maximum margin binary classification is one of the most fundamental algorithms in machine learning, yet the role of featurization maps and the high-dimensional asymptotics of the misclassification error for non-Gaussian features are still …
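Max-margin classification can be probed without a QP solver: on linearly separable data, gradient descent on the logistic loss is known to converge in direction to the max-margin solution. A minimal numpy sketch on synthetic Gaussian features (sizes and step size illustrative; this is not the paper's featurization setting):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 50
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_star)            # linearly separable labels

    # Gradient descent on the logistic loss; the normalized iterate
    # approaches the max-margin (hard-SVM) direction on separable data.
    w = np.zeros(d)
    for _ in range(20000):
        margins = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= 0.5 * grad
    w_hat = w / np.linalg.norm(w)
    print("min margin:", (y * (X @ w_hat)).min())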
Towards a statistical theory of data selection under weak supervision
Given a sample of size $N$, it is often useful to select a subsample of smaller size $n<N$…
Six Lectures on Linearized Neural Networks
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models. We first recall the correspondence between neural networks and linear models via the so-called lazy …
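The lazy-training correspondence mentioned here comes from the first-order Taylor expansion of the network around its initialization:

\[
f(\boldsymbol{x};\boldsymbol{\theta}) \;\approx\; f(\boldsymbol{x};\boldsymbol{\theta}_0) + \langle \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x};\boldsymbol{\theta}_0),\, \boldsymbol{\theta}-\boldsymbol{\theta}_0\rangle,
\]

so that training in this regime reduces to a linear model with (random) feature map $\boldsymbol{x}\mapsto \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x};\boldsymbol{\theta}_0)$.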
Shattering in Pure Spherical Spin Glasses
We prove the existence of a shattered phase within the replica-symmetric phase of the pure spherical $p$-spin models for $p$ sufficiently large. In this phase, we construct a decomposition of the sphere into well-separated small clusters, …
Adversarial examples in random neural networks with general activations
A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layer networks with sub-exponential widt…
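A gradient-direction attack makes the setting concrete. The sketch below builds a random two-layer tanh network and perturbs an input along its input gradient to push the output toward zero and across the decision boundary; the widths, the step size eps, and the attack itself are illustrative, not the construction analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 100, 400
    W = rng.normal(size=(m, d)) / np.sqrt(d)            # random first layer
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # random second layer

    def f(x):                    # two-layer network with tanh activation
        return a @ np.tanh(W @ x)

    def grad_f(x):               # gradient of the output w.r.t. the input
        return W.T @ (a * (1 - np.tanh(W @ x) ** 2))

    x = rng.normal(size=d)
    g = grad_f(x)
    eps = 0.5
    x_adv = x - eps * np.sign(f(x)) * g / np.linalg.norm(g)  # step against sign(f)
    print(f(x), f(x_adv))        # a small input change often flips the sign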
Solving systems of Random Equations via First and Second-Order Optimization Algorithms
Gradient-based (a.k.a. "first-order") optimization algorithms are routinely used to solve large-scale non-convex problems. Yet, it is generally hard to predict their effectiveness. In order to gain insight into this question, we revisit th…
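As a concrete instance, the sketch below runs (second-order) Gauss-Newton on the least-squares residual of a random system of quadratic equations; from a random start the iteration may or may not find a root, which is exactly the kind of question these analyses address. The quadric model $\boldsymbol{x}^\top A_i \boldsymbol{x}=1$ and all sizes are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 30, 30
    A = rng.normal(size=(n, d, d))
    A = (A + A.transpose(0, 2, 1)) / 2           # symmetric random quadratic forms

    def F(x):                                    # residuals F_i(x) = x^T A_i x - 1
        return np.einsum('ijk,j,k->i', A, x, x) - 1.0

    def J(x):                                    # Jacobian: row i is 2 A_i x
        return 2 * np.einsum('ijk,k->ij', A, x)

    x = rng.normal(size=d)
    for _ in range(100):                         # Gauss-Newton on 0.5 * ||F(x)||^2
        x -= np.linalg.lstsq(J(x), F(x), rcond=None)[0]
    print(np.linalg.norm(F(x)))                  # near 0 only if a root was found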
Sampling, Diffusions, and Stochastic Localization
Diffusions are a successful technique for sampling from high-dimensional distributions. The target distribution can be either explicitly given or learnt from a collection of samples. These methods implement a diffusion process whose endpoint is a samp…
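In one common formulation used in this line of work, stochastic localization observes the target $\boldsymbol{\theta}\sim\mu$ through a Gaussian channel of increasing signal-to-noise ratio:

\[
\boldsymbol{y}_t = t\,\boldsymbol{\theta} + \boldsymbol{B}_t, \qquad \mathrm{d}\boldsymbol{y}_t = \boldsymbol{m}(\boldsymbol{y}_t,t)\,\mathrm{d}t + \mathrm{d}\boldsymbol{B}_t, \qquad \boldsymbol{m}(\boldsymbol{y},t) = \mathbb{E}[\boldsymbol{\theta}\mid \boldsymbol{y}_t=\boldsymbol{y}].
\]

As $t\to\infty$, $\boldsymbol{y}_t/t$ converges to a sample from $\mu$, so replacing the posterior mean $\boldsymbol{m}$ with an efficiently computable estimate yields a sampling algorithm.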
Local algorithms for maximum cut and minimum bisection on locally treelike regular graphs of large degree
Given a graph $G$ of degree $k$ over $n$ vertices, we consider the problem of computing a near maximum cut or a near minimum bisection in polynomial time. For graphs of girth $2L$, we develop a local message passing algorithm whose complexity is $O(nkL)$, and tha…
Posterior Sampling in High Dimension via Diffusion Processes
Sampling from the posterior is a key technical problem in Bayesian statistics. Rigorous guarantees are difficult to obtain for Markov Chain Monte Carlo algorithms of common use. In this paper, we study an alternative class of algorithms ba…
Stabl: sparse and reliable biomarker discovery in predictive modeling of high-dimensional omic data
High-content omic technologies coupled with sparsity-promoting regularization methods (SRM) have transformed the biomarker discovery process. However, the translation of computational results into a clinical use-case scenario remains chall…
Learning time-scales in two-layers neural networks
Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes …
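The plateau phenomenon is easy to reproduce. A minimal sketch: online SGD on a two-layer ReLU network fitting a single-index target (the target, all sizes, and the learning rate are illustrative assumptions); the recorded risk typically decreases in stages separated by long plateaus.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, lr = 20, 32, 0.1
    w_star = np.zeros(d); w_star[0] = 1.0        # single-index target direction

    def target(x):
        return np.maximum(x @ w_star, 0.0)       # y = relu(<w*, x>)

    W = rng.normal(size=(m, d)) / np.sqrt(d)     # first layer
    a = rng.normal(size=m) / np.sqrt(m)          # second layer

    risks = []
    for step in range(20000):                    # online SGD, batch size 1
        x = rng.normal(size=d)
        h = np.maximum(W @ x, 0.0)
        err = a @ h - target(x)
        grad_a = err * h
        grad_W = err * np.outer(a * (h > 0), x)
        a -= lr * grad_a
        W -= lr * grad_W
        if step % 1000 == 0:                     # monitor risk on a fresh batch
            Xv = rng.normal(size=(512, d))
            pred = np.maximum(Xv @ W.T, 0.0) @ a
            risks.append(0.5 * np.mean((pred - target(Xv)) ** 2))
    print(risks)                                 # stage-wise decrease with plateaus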
Compressing Tabular Data via Latent Variable Estimation
Data used for analytics and machine learning often take the form of tables with categorical entries. We introduce a family of lossless compression algorithms for such data that proceed in four steps: $(i)$ Estimate latent variables associa…
Nonnegative Matrix Factorization Via Archetypal Analysis
Given a collection of data points, nonnegative matrix factorization (NMF) suggests expressing them as convex combinations of a small set of “archetypes” with nonnegative entries. This decomposition is unique only if the true archetypes are…
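A minimal sketch of archetypal analysis in the formulation the abstract describes: factor $X \approx A\,B\,X$ with row-stochastic $A$ and $B$, so the archetypes $Z = BX$ are convex combinations of data points and each point is a convex combination of archetypes. Alternating projected gradient with a fixed step size (illustrative; a line search would be more robust):

    import numpy as np

    def proj_simplex(v):
        # Euclidean projection of each row of v onto the probability simplex.
        u = np.sort(v, axis=1)[:, ::-1]
        css = np.cumsum(u, axis=1) - 1.0
        ind = np.arange(1, v.shape[1] + 1)
        rho = (u - css / ind > 0).sum(axis=1)
        tau = css[np.arange(len(v)), rho - 1] / rho
        return np.maximum(v - tau[:, None], 0.0)

    def archetypal(X, k, iters=500, lr=1e-2):
        # Minimize 0.5 * ||A @ B @ X - X||^2 over row-stochastic A, B.
        n, d = X.shape
        rng = np.random.default_rng(0)
        A = proj_simplex(rng.random((n, k)))
        B = proj_simplex(rng.random((k, n)))
        for _ in range(iters):
            Z = B @ X                            # current archetypes
            R = A @ Z - X                        # residual
            A = proj_simplex(A - lr * R @ Z.T)   # projected gradient steps
            B = proj_simplex(B - lr * A.T @ R @ X.T)
        return A, B @ X

    X = np.random.default_rng(1).random((200, 2))
    A, Z = archetypal(X, k=4)                    # Z holds the 4 archetypes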
Equivalence of Approximate Message Passing and Low-Degree Polynomials in Rank-One Matrix Estimation
We consider the problem of estimating an unknown parameter vector ${\boldsymbol \theta}\in{\mathbb R}^n$, given noisy observations ${\boldsymbol Y} = {\boldsymbol \theta}{\boldsymbol \theta}^{\top}/\sqrt{n}+{\boldsymbol Z}$ of the rank-one matrix ${\bold…
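For orientation, a schematic AMP iteration for this model with a Rademacher signal: a matrix-vector product plus an Onsager memory correction. The signal strength lam, the tanh denoiser (whose state-evolution calibration is omitted), and all sizes are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam, T = 3000, 2.0, 15
    theta = rng.choice([-1.0, 1.0], size=n)          # Rademacher signal (assumption)
    Z = rng.normal(size=(n, n)); Z = (Z + Z.T) / np.sqrt(2)
    A = lam * np.outer(theta, theta) / n + Z / np.sqrt(n)

    # Schematic AMP: matrix-vector step plus Onsager term.
    x = 0.1 * rng.normal(size=n)
    m_prev = np.zeros(n)
    for _ in range(T):
        m = np.tanh(x)
        b = np.mean(1 - m ** 2)                      # Onsager coefficient
        x = A @ m - b * m_prev
        m_prev = m
    print(abs(np.tanh(x) @ theta) / n)               # overlap with the signal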
Fundamental Limits of Low-Rank Matrix Estimation with Diverging Aspect Ratios
We consider the problem of estimating the factors of a low-rank $n \times d$ matrix, when this is corrupted by additive Gaussian noise. A special example of our setting corresponds to clustering mixtures of Gaussians with equal (known) cov…
Dimension free ridge regression
Random matrix theory has become a widely useful tool in high-dimensional statistics and theoretical machine learning. However, random matrix theory is largely focused on the proportional asymptotics in which the number of columns grows pro…
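The estimator in question has a one-line closed form; the paper's contribution concerns its risk beyond the proportional regime. A minimal sketch with $d \gg n$ (all sizes illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 1000, 1.0                  # d >> n is allowed
    beta = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ beta + 0.1 * rng.normal(size=n)

    # Ridge estimator: beta_hat = (X^T X + lam I)^{-1} X^T y
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    print(np.linalg.norm(beta_hat - beta))      # estimation error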
Optimization of random high-dimensional functions: Structure and algorithms
Replica symmetry breaking postulates that near optima of spin glass Hamiltonians have an ultrametric structure. Namely, near optima can be associated to leaves of a tree, and the Euclidean distance between them corresponds to the distance …
Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of p…
A Friendly Tutorial on Mean-Field Spin Glass Techniques for Non-Physicists
This tutorial is based on lecture notes written for a class taught in the Statistics Department at Stanford in the Winter Quarter of 2017. The objective was to provide a working knowledge of some of the techniques developed over the last 4…