Elad Hoffer
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each s…
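The waiting cost this abstract refers to can be illustrated with a small simulation (a toy sketch, not the paper's DropCompute method; the worker-time distribution and all constants below are made up): a synchronous step lasts as long as the slowest worker, so variance in per-worker compute time directly inflates step time.

```python
import numpy as np

# Toy illustration (not the paper's implementation): in synchronous
# data-parallel training, every step waits for the slowest worker,
# so per-step time is the max over workers' compute times.
rng = np.random.default_rng(0)

num_workers = 64
num_steps = 1000

# Hypothetical per-worker compute times: mostly ~100 ms, with occasional
# stragglers (heavy right tail) -- the "compute variance" the title refers to.
base = rng.normal(loc=100.0, scale=5.0, size=(num_steps, num_workers))
stragglers = rng.exponential(scale=50.0, size=(num_steps, num_workers)) * (
    rng.random((num_steps, num_workers)) < 0.02
)
compute_ms = base + stragglers

step_time_sync = compute_ms.max(axis=1)   # every worker waits for the slowest one
ideal_time = compute_ms.mean(axis=1)      # hypothetical no-synchronization cost

print(f"mean worker compute  : {compute_ms.mean():7.1f} ms")
print(f"mean synchronous step: {step_time_sync.mean():7.1f} ms")
print(f"overhead from waiting: {step_time_sync.mean() / ideal_time.mean():7.2f}x")
```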
Energy awareness in low precision neural networks
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices. Existing approaches for reducing power consumption rely on quite general principles, including avoidance of multiplication operations an…
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes…
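For concreteness, here is a generic symmetric 4-bit quantize/dequantize around a matrix multiplication, written in numpy; this shows only the basic mechanics of low-bit matmuls, not the paper's specific formats or scheme.

```python
import numpy as np

# Generic symmetric INT4 quantization sketch (values mapped to [-8, 7]);
# the per-tensor max-abs scale is an assumption, not the paper's method.
def quantize_int4(x):
    scale = np.max(np.abs(x)) / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 128)).astype(np.float32)   # e.g. activations
B = rng.normal(size=(128, 32)).astype(np.float32)   # e.g. weights

qA, sA = quantize_int4(A)
qB, sB = quantize_int4(B)

# Integer matmul, then rescale back to the floating-point domain.
C_q = (qA.astype(np.int32) @ qB.astype(np.int32)) * (sA * sB)
C_fp = A @ B

rel_err = np.linalg.norm(C_q - C_fp) / np.linalg.norm(C_fp)
print(f"relative error of 4-bit matmul: {rel_err:.3f}")
```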
Task-Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Catastrophic forgetting is the notorious vulnerability of neural networks to the changes in the data distribution during learning. This phenomenon has long been considered a major obstacle for using learning agents in realistic continual l…
Neural gradients are lognormally distributed: understanding sparse and quantized training
Neural gradient compression remains a main bottleneck in improving training efficiency, as most existing neural network compression methods (e.g., pruning or quantization) focus on weights, activations, and weight gradients. However, these…
Neural gradients are near-lognormal: improved quantized and sparse training
While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not app…
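A simple diagnostic of the "near-lognormal" claim (a sketch, not the paper's analysis): if gradient magnitudes are near-lognormal, then log|g| should be approximately Gaussian, which can be summarized by the skewness and excess kurtosis of the log-magnitudes. The synthetic `grads` array below is a stand-in for real neural gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a flattened tensor of neural gradients: synthetic lognormal
# magnitudes with random signs, used only to show the diagnostic itself.
grads = rng.lognormal(mean=-9.0, sigma=2.0, size=100_000) * rng.choice([-1, 1], 100_000)

log_mag = np.log(np.abs(grads) + 1e-30)
mu, sigma = log_mag.mean(), log_mag.std()

# For an exactly lognormal magnitude, log|g| is Gaussian:
# skewness ~ 0 and excess kurtosis ~ 0.
z = (log_mag - mu) / sigma
skew = np.mean(z**3)
ex_kurt = np.mean(z**4) - 3.0
print(f"log-magnitude fit: mu={mu:.2f}, sigma={sigma:.2f}, "
      f"skew={skew:.2f}, excess kurtosis={ex_kurt:.2f}")
```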
Augment Your Batch: Improving Generalization Through Instance Repetition
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances …
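A minimal sketch of batch augmentation as described above: every sample in the batch is repeated several times, each copy with its own random transform, so the effective batch grows without reading more data. The flip-and-shift transform and the repeat count below are placeholders, not necessarily what the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(img):
    """Placeholder augmentation: random horizontal flip and small shift."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    shift = rng.integers(-2, 3)
    return np.roll(img, shift, axis=1)

def augment_batch(images, labels, repeats=4):
    """Batch augmentation: each instance appears `repeats` times,
    each copy passing through its own random augmentation."""
    aug_images = np.stack([random_augment(img) for img in images for _ in range(repeats)])
    aug_labels = np.repeat(labels, repeats)
    return aug_images, aug_labels

# Toy usage: a batch of 8 "images" becomes an augmented batch of 32.
images = rng.normal(size=(8, 32, 32))
labels = rng.integers(0, 10, size=8)
big_images, big_labels = augment_batch(images, labels, repeats=4)
print(big_images.shape, big_labels.shape)   # (32, 32, 32) (32,)
```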
The Knowledge Within: Methods for Data-Free Model Compression
Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a fine…
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Background: Recent developments have made it possible to accelerate neural network training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more s…
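The "delay" mentioned here is commonly formalized as staleness in the gradients (this is the standard delayed-gradient model, stated for concreteness rather than as the paper's exact setup):

```latex
% Standard delayed-gradient model of asynchronous SGD:
% the update at step t uses parameters from tau_t steps earlier.
\[
  w_{t+1} \;=\; w_t \;-\; \eta \,\nabla L\!\left(w_{t-\tau_t}\right),
\]
% where \tau_t \ge 0 is the (possibly random) staleness; \tau_t = 0 recovers
% synchronous SGD, and larger delays interact with the learning rate \eta.
```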
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Convolutional neural networks (CNNs) are commonly trained using a fixed spatial image size predetermined for a given model. Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range…
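A sketch of what training with mixed image sizes can look like (the size set, sampling rule, and crude nearest-neighbor resize below are placeholders, not the paper's schedule): a spatial resolution is drawn per batch, and a size-agnostic convolutional trunk consumes whatever resolution it gets.

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_nearest(batch, size):
    """Crude nearest-neighbor resize of a (N, H, W) batch to (N, size, size)."""
    n, h, w = batch.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return batch[:, rows][:, :, cols]

# Placeholder size set; the actual sizes/probabilities in the paper may differ.
candidate_sizes = [128, 160, 192, 224]

for step in range(4):
    batch = rng.normal(size=(32, 224, 224))       # stand-in images
    size = int(rng.choice(candidate_sizes))       # one resolution per batch
    batch = resize_nearest(batch, size)
    # model.forward(batch) would go here; a fully convolutional trunk
    # (plus global pooling) handles any of these sizes.
    print(f"step {step}: training batch shape {batch.shape}")
```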
Augment your batch: better training with larger batches
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances …
Post-training 4-bit quantization of convolution networks for rapid-deployment
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of interm…
ACIQ: Analytical Clipping for Integer Quantization of neural networks
Unlike traditional approaches that focus on the quantization at the network level, in this work we propose to minimize the quantization effect at the tensor level. We analyze the trade-off between quantization noise and clipping distortion…
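The trade-off can be illustrated numerically (a sketch only; ACIQ itself derives the optimal clipping value analytically from an assumed tensor distribution instead of searching): for each candidate clipping threshold, the total error combines rounding noise inside the clip range with distortion from clipped values.

```python
import numpy as np

def quant_mse(x, clip, n_bits=4):
    """Total quantization MSE for a symmetric uniform quantizer over [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1
    scale = clip / levels
    x_c = np.clip(x, -clip, clip)
    x_q = np.round(x_c / scale) * scale
    return np.mean((x - x_q) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)   # stand-in for a weight/activation tensor

# Numerical search over clipping thresholds: a small clip means low rounding
# noise but high clipping distortion; a large clip means the opposite.
clips = np.linspace(0.5, 5.0, 46)
mses = [quant_mse(x, c) for c in clips]
best = clips[int(np.argmin(mses))]
print(f"best clipping threshold ~ {best:.2f} (vs. max|x| = {np.abs(x).max():.2f})")
```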
Scalable methods for 8-bit training of neural networks
Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the num…
Task Agnostic Continual Learning Using Online Variational Bayes
Catastrophic forgetting is the notorious vulnerability of neural networks to the change of the data distribution while learning. This phenomenon has long been considered a major obstacle for allowing the use of learning agents in realistic…
Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning
We suggest a novel approach for the estimation of the posterior distribution of the weights of a neural network, using an online version of the variational Bayes method. Having a confidence measure of the weights allows us to combat several s…
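A minimal sketch of the general idea on a toy 1-D regression problem: keep a factorized Gaussian posterior over the weights and update it online with reparameterized Monte-Carlo gradients of the ELBO. The plain-SGD update below is a generic online variational Bayes recipe, not the paper's specific closed-form updates, and all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2.5 * x + noise; we learn a Gaussian posterior over the slope.
x = rng.normal(size=512)
y = 2.5 * x + 0.3 * rng.normal(size=512)

mu, rho = 0.0, -2.0          # posterior mean and softplus-parameterized std
prior_sigma = 1.0
lr, n_mc = 0.01, 8

def softplus(z):
    return np.log1p(np.exp(z))

for step in range(2000):
    i = rng.integers(0, len(x), size=32)          # mini-batch (online setting)
    sigma = softplus(rho)
    g_mu, g_rho = 0.0, 0.0
    for _ in range(n_mc):
        eps = rng.normal()
        w = mu + sigma * eps                      # reparameterized weight sample
        resid = w * x[i] - y[i]
        g_w = np.mean(resid * x[i])               # d(average squared loss)/dw
        g_mu += g_w
        g_rho += g_w * eps * (1.0 / (1.0 + np.exp(-rho)))  # chain rule through softplus
    g_mu /= n_mc
    g_rho /= n_mc
    # Add the gradient of KL(q || N(0, prior_sigma^2)), scaled per data point.
    g_mu += mu / prior_sigma**2 / len(x)
    g_rho += (sigma / prior_sigma**2 - 1.0 / sigma) * (1.0 / (1.0 + np.exp(-rho))) / len(x)
    mu -= lr * g_mu
    rho -= lr * g_rho

print(f"posterior over slope: mean={mu:.2f}, std={softplus(rho):.3f}")
```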
Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unclear, with severa…
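One concrete fact behind this line of work (a standard property of normalization, stated here for context rather than as the paper's full result): a weight vector followed by Batch-Normalization makes the loss invariant to its scale, which ties the weight norm to an effective learning rate.

```latex
% For a scale-invariant loss, L(\alpha w) = L(w) for all \alpha > 0, so
\[
  \nabla L(\alpha w) \;=\; \tfrac{1}{\alpha}\,\nabla L(w),
\]
% and a gradient step with learning rate \eta moves the direction
% \hat{w} = w / \lVert w \rVert by an amount that scales like
\[
  \eta_{\text{eff}} \;\propto\; \frac{\eta}{\lVert w \rVert^{2}} .
\]
```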
On the Blindspots of Convolutional Networks
Deep convolutional networks have been the state-of-the-art approach for a wide variety of tasks over the last few years. Their successes have, in many cases, turned them into the default model in quite a few domains. In this work, we will demons…
Fix your classifier: the marginal value of training the last weight layer
Neural networks are commonly used as models for classification for a wide variety of tasks. Typically, a learned affine transformation is placed at the end of such models, yielding a per-class value used for classification. This classifier…
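A minimal sketch of the idea suggested by the title: freeze the final affine classifier and train only the layers beneath it. The random orthonormal projection used below is a placeholder for whatever fixed transform the paper actually proposes, and the backbone is a stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feature_dim, num_classes = 256, 10

# Backbone that produces feature vectors (stand-in for a real network).
backbone = nn.Sequential(
    nn.Linear(784, feature_dim), nn.ReLU(),
    nn.Linear(feature_dim, feature_dim), nn.ReLU(),
)

# Fixed (untrained) classifier: a frozen linear projection to class scores.
classifier = nn.Linear(feature_dim, num_classes, bias=False)
with torch.no_grad():
    # Random orthonormal rows as a placeholder fixed transform.
    classifier.weight.copy_(torch.linalg.qr(torch.randn(feature_dim, num_classes))[0].T)
classifier.weight.requires_grad_(False)

# Only backbone parameters are optimized; the last layer never changes.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)                       # toy batch
y = torch.randint(0, num_classes, (32,))
loss = criterion(classifier(backbone(x)), y)
loss.backward()
optimizer.step()
print(loss.item())
```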
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. Th…
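Spelled out, the statement from this abstract is the following (standard formulation of the result):

```latex
% Gradient descent on the logistic loss over linearly separable data
% (x_i, y_i), y_i \in \{-1, +1\}, with a homogeneous linear predictor w:
\[
  \frac{w(t)}{\lVert w(t) \rVert} \;\xrightarrow[t\to\infty]{}\; \frac{\hat{w}}{\lVert \hat{w} \rVert},
  \qquad
  \hat{w} \;=\; \arg\min_{w}\, \lVert w \rVert^{2}
  \;\;\text{s.t.}\;\; y_i\, w^{\top} x_i \ge 1 \;\;\forall i ,
\]
% i.e. the predictor converges in direction to the hard-margin SVM solution,
% while \lVert w(t) \rVert itself grows without bound (roughly like \log t).
```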
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been obser…
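The update described in this background sentence, written explicitly (standard mini-batch SGD notation, added only for concreteness):

```latex
% Mini-batch SGD: the gradient is estimated from a small subset (batch) B_t
% of the N training examples, with |B_t| = b << N.
\[
  w_{t+1} \;=\; w_t \;-\; \eta\,\hat{g}_t,
  \qquad
  \hat{g}_t \;=\; \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_w \ell\!\left(w_t;\, x_i, y_i\right).
\]
```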
Exponentially vanishing sub-optimal local minima in multilayer neural networks
Background: Statistical mechanics results (Dauphin et al. (2014); Choromanska et al. (2015)) suggest that local minima with high error are exponentially rare in high dimensions. However, to prove low error guarantees for Multilayer Neural …
Semi-supervised deep learning by metric embedding
Deep networks are successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are usually much less suited for semi-supervised problems because of t…
Spatial contrasting for deep unsupervised learning
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have…
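A rough sketch of what a "spatial contrasting" objective could look like, inferred from the title alone: features of patches from the same image are treated as positives and patches from different images as negatives. The loss, patch sampling, and stand-in encoder below are assumptions; the paper's actual formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_features(image, encoder, patch=8, n_patches=2):
    """Encode a few random patches of one image (encoder is any feature extractor)."""
    h, w = image.shape
    feats = []
    for _ in range(n_patches):
        r, c = rng.integers(0, h - patch), rng.integers(0, w - patch)
        feats.append(encoder(image[r:r + patch, c:c + patch]))
    return np.stack(feats)

def similarity(f_a, f_b):
    """Cosine similarity between two patch feature vectors."""
    return float(f_a @ f_b / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-12))

# Stand-in encoder: flatten the patch (a real model would be a convnet trunk).
encoder = lambda p: p.reshape(-1)

img1, img2 = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
f1 = patch_features(img1, encoder)
f2 = patch_features(img2, encoder)

pos = similarity(f1[0], f1[1])   # patches of the same image: pull together
neg = similarity(f1[0], f2[0])   # patches of different images: push apart
# A contrastive loss would reward pos over neg; with an untrained encoder
# these values are arbitrary, training would separate them.
loss = -np.log(np.exp(pos) / (np.exp(pos) + np.exp(neg)))
print(f"positive sim {pos:.2f}, negative sim {neg:.2f}, toy loss {loss:.2f}")
```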
Deep unsupervised learning through spatial contrasting
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have…