Explanipedia

Selective Matching Losses -- Not All Scores Are Created Equal Open

Gil I. Shamir, Manfred K. Warmuth · 2025

Learning systems match predicted scores to observations over some domain. Often, it is critical to produce accurate predictions in some subset (or region) of the domain, yet less important to accurately predict in other regions. We constru…

RANK-SMOOTHED PAIRWISE LEARNING IN PERCEPTUAL QUALITY ASSESSMENT Open

Hossein Talebi, Ehsan Amid, Peyman Milanfar, Manfred K. Warmuth · 2024

Conducting pairwise comparisons is a widely used approach in curating human perceptual preference data. Typically raters are instructed to make their choices according to a specific set of rules that address certain dimensions of image qua…

Optimal Transport with Tempered Exponential Measures Open

Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth · 2024

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, ``a-la-Kantorovich'', which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regulariz…

Noise misleads rotation invariant algorithms on sparse targets Open

Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones, Ehsan Amid · 2024

It is well known that the class of rotation invariant algorithms are suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem. This class includes any gradient descent trained …

Tempered Calculus for ML: Application to Hyperbolic Model Embedding Open

Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth · 2024

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil…

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs Open

Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth · 2023

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densitie…

Optimal Transport with Tempered Exponential Measures Open

Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth · 2023

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized…

Boosting with Tempered Exponential Measures Open

Richard Nock, Ehsan Amid, Manfred K. Warmuth · 2023

One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the fact that the positive weights on the examples sum to one. Essentially, harder examples receive higher …

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks Open

Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth · 2023

We study the phenomenon of \textit{in-context learning} (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to…

Clustering above Exponential Families with Tempered Exponential Measures Open

Ehsan Amid, Richard Nock, Manfred K. Warmuth · 2022

The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data generating distributions in exponential families and clustering distortions among Bregman divergences. Getting the framework to…

Layerwise Bregman Representation Learning with Applications to Knowledge Distillation Open

Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth · 2022

In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Breg…

Learning from Randomly Initialized Neural Network Features Open

Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth · 2022

We present the surprising result that randomly initialized neural networks are good feature extractors in expectation. These random features correspond to finite-sample realizations of what we call Neural Network Prior Kernel (NNPK), which…

Unlabeled sample compression schemes and corner peelings for ample and maximum classes Open

Jérémie Chalopin, Victor Chepoi, Shay Moran, Manfred K. Warmuth · 2022

Step-size Adaptation Using Exponentiated Gradient Updates Open

Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth · 2022

Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale ap…

LocoProp: Enhancing BackProp via Local Loss Optimization Open

Ehsan Amid, Rohan Anil, Manfred K. Warmuth · 2021

Second-order methods have shown state-of-the-art performance for optimizing deep neural networks. Nonetheless, their large memory requirement and high computational complexity, compared to first-order methods, hinder their versatility in a…

Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond Open

Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth · 2021

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration. When noise is prevalent in the data, this uniform treatment of examples …

A case where a spindly two-layer linear network whips any neural network with a fully connected input layer Open

Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid · 2020

It was conjectured that any neural network of any structure and arbitrary differentiable transfer functions at the nodes cannot learn the following problem sample efficiently when trained with gradient descent: The instances are the rows o…

An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint Open

Ehsan Amid, Manfred K. Warmuth · 2020

We shed new insights on the two commonly used updates for the online k-PCA problem, namely, Krasulina's and Oja's updates. We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of orthonor…

Reparameterizing Mirror Descent as Gradient Descent Open

Ehsan Amid, Manfred K. Warmuth · 2020

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is spar…

Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent. Open

Ehsan Amid, Manfred K. Warmuth · 2020

Continuous-time mirror descent (CMD) can be seen as the limit case of the discrete-time MD update when the step-size is infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a gener…

TriMap: Large-scale Dimensionality Reduction Using Triplets Open

Ehsan Amid, Manfred K. Warmuth · 2019

We introduce "TriMap"; a dimensionality reduction technique based on triplet constraints, which preserves the global structure of the data better than the other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify the globa…

An Implicit Form of Krasulina's k-PCA Update without the Orthonormality\n Constraint Open

Ehsan Amid, Manfred K. Warmuth · 2019

We shed new insights on the two commonly used updates for the online $k$-PCA\nproblem, namely, Krasulina's and Oja's updates. We show that Krasulina's update\ncorresponds to a projected gradient descent step on the Stiefel manifold of the\…

Mistake bounds on the noise-free multi-armed bandit game Open

Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth · 2019

Unbiased estimators for random design regression Open

Michał Dereziński, Manfred K. Warmuth, Daniel Hsu · 2019

In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over $d$-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the s…

Unbiased estimators for random design regression Open

Michał Dereziński, Manfred K. Warmuth, Daniel Hsu · 2019

In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over $d$-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the s…

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences Open

Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren · 2019

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperatur…

Adaptive scale-invariant online algorithms for learning linear models Open

Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth · 2019

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight. Popular algorithms in this frame…

Divergence-Based Motivation for Online EM and Combining Hidden Variable Models Open

Ehsan Amid, Manfred K. Warmuth · 2019

Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models. Given the full batch of data, EM forms an upper-bound of the negative log-likelihood of the model at each iteration and …

Divergence-Based Motivation for Online EM and Combining Hidden Variable\n Models Open

Ehsan Amid, Manfred K. Warmuth · 2019

Expectation-Maximization (EM) is a prominent approach for parameter\nestimation of hidden (aka latent) variable models. Given the full batch of\ndata, EM forms an upper-bound of the negative log-likelihood of the model at\neach iteration a…

Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression Open

Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth · 2019

In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the correspo…

Manfred K. Warmuth YOU? Author Swipe