Elad Hoffer
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each s…
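The waiting cost this abstract refers to can be illustrated with a small simulation (a toy sketch, not the paper's DropCompute method; the worker-time distribution and all constants below are made up): a synchronous step lasts as long as the slowest worker, so variance in per-worker compute time directly inflates step time.

```python
import numpy as np

# Toy illustration (not the paper's implementation): in synchronous
# data-parallel training, every step waits for the slowest worker,
# so per-step time is the max over workers' compute times.
rng = np.random.default_rng(0)

num_workers = 64
num_steps = 1000

# Hypothetical per-worker compute times: mostly ~100 ms, with occasional
# stragglers (heavy right tail) -- the "compute variance" the title refers to.
base = rng.normal(loc=100.0, scale=5.0, size=(num_steps, num_workers))
stragglers = rng.exponential(scale=50.0, size=(num_steps, num_workers)) * (
    rng.random((num_steps, num_workers)) < 0.02
)
compute_ms = base + stragglers

step_time_sync = compute_ms.max(axis=1)   # every worker waits for the slowest one
ideal_time = compute_ms.mean(axis=1)      # hypothetical no-synchronization cost

print(f"mean worker compute  : {compute_ms.mean():7.1f} ms")
print(f"mean synchronous step: {step_time_sync.mean():7.1f} ms")
print(f"overhead from waiting: {step_time_sync.mean() / ideal_time.mean():7.2f}x")
```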
Energy awareness in low precision neural networks
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices. Existing approaches for reducing power consumption rely on quite general principles, including avoidance of multiplication operations an…
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes…
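For concreteness, here is a generic symmetric 4-bit quantize/dequantize around a matrix multiplication, written in numpy; this shows only the basic mechanics of low-bit matmuls, not the paper's specific formats or scheme.

```python
import numpy as np

# Generic symmetric INT4 quantization sketch (values mapped to [-8, 7]);
# the per-tensor max-abs scale is an assumption, not the paper's method.
def quantize_int4(x):
    scale = np.max(np.abs(x)) / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 128)).astype(np.float32)   # e.g. activations
B = rng.normal(size=(128, 32)).astype(np.float32)   # e.g. weights

qA, sA = quantize_int4(A)
qB, sB = quantize_int4(B)

# Integer matmul, then rescale back to the floating-point domain.
C_q = (qA.astype(np.int32) @ qB.astype(np.int32)) * (sA * sB)
C_fp = A @ B

rel_err = np.linalg.norm(C_q - C_fp) / np.linalg.norm(C_fp)
print(f"relative error of 4-bit matmul: {rel_err:.3f}")
```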
Task-Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Catastrophic forgetting is the notorious vulnerability of neural networks to the changes in the data distribution during learning. This phenomenon has long been considered a major obstacle for using learning agents in realistic continual l…
Neural gradients are lognormally distributed: understanding sparse and quantized training
Neural gradient compression remains a main bottleneck in improving training efficiency, as most existing neural network compression methods (e.g., pruning or quantization) focus on weights, activations, and weight gradients. However, these…
Neural gradients are near-lognormal: improved quantized and sparse training
While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not app…
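A simple diagnostic of the "near-lognormal" claim (a sketch, not the paper's analysis): if gradient magnitudes are near-lognormal, then log|g| should be approximately Gaussian, which can be summarized by the skewness and excess kurtosis of the log-magnitudes. The synthetic `grads` array below is a stand-in for real neural gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a flattened tensor of neural gradients: synthetic lognormal
# magnitudes with random signs, used only to show the diagnostic itself.
grads = rng.lognormal(mean=-9.0, sigma=2.0, size=100_000) * rng.choice([-1, 1], 100_000)

log_mag = np.log(np.abs(grads) + 1e-30)
mu, sigma = log_mag.mean(), log_mag.std()

# For an exactly lognormal magnitude, log|g| is Gaussian:
# skewness ~ 0 and excess kurtosis ~ 0.
z = (log_mag - mu) / sigma
skew = np.mean(z**3)
ex_kurt = np.mean(z**4) - 3.0
print(f"log-magnitude fit: mu={mu:.2f}, sigma={sigma:.2f}, "
      f"skew={skew:.2f}, excess kurtosis={ex_kurt:.2f}")
```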
Augment Your Batch: Improving Generalization Through Instance Repetition
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances …
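A minimal sketch of batch augmentation as described above: every sample in the batch is repeated several times, each copy with its own random transform, so the effective batch grows without reading more data. The flip-and-shift transform and the repeat count below are placeholders, not necessarily what the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(img):
    """Placeholder augmentation: random horizontal flip and small shift."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    shift = rng.integers(-2, 3)
    return np.roll(img, shift, axis=1)

def augment_batch(images, labels, repeats=4):
    """Batch augmentation: each instance appears `repeats` times,
    each copy passing through its own random augmentation."""
    aug_images = np.stack([random_augment(img) for img in images for _ in range(repeats)])
    aug_labels = np.repeat(labels, repeats)
    return aug_images, aug_labels

# Toy usage: a batch of 8 "images" becomes an augmented batch of 32.
images = rng.normal(size=(8, 32, 32))
labels = rng.integers(0, 10, size=8)
big_images, big_labels = augment_batch(images, labels, repeats=4)
print(big_images.shape, big_labels.shape)   # (32, 32, 32) (32,)
```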
The Knowledge Within: Methods for Data-Free Model Compression
Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a fine…
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Background: Recent developments have made it possible to accelerate neural network training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more s…
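The "delay" mentioned here is commonly formalized as staleness in the gradients (this is the standard delayed-gradient model, stated for concreteness rather than as the paper's exact setup):

```latex
% Standard delayed-gradient model of asynchronous SGD:
% the update at step t uses parameters from tau_t steps earlier.
\[
  w_{t+1} \;=\; w_t \;-\; \eta \,\nabla L\!\left(w_{t-\tau_t}\right),
\]
% where \tau_t \ge 0 is the (possibly random) staleness; \tau_t = 0 recovers
% synchronous SGD, and larger delays interact with the learning rate \eta.
```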
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Convolutional neural networks (CNNs) are commonly trained using a fixed spatial image size predetermined for a given model. Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range…
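A sketch of what training with mixed image sizes can look like (the size set, sampling rule, and crude nearest-neighbor resize below are placeholders, not the paper's schedule): a spatial resolution is drawn per batch, and a size-agnostic convolutional trunk consumes whatever resolution it gets.

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_nearest(batch, size):
    """Crude nearest-neighbor resize of a (N, H, W) batch to (N, size, size)."""
    n, h, w = batch.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return batch[:, rows][:, :, cols]

# Placeholder size set; the actual sizes/probabilities in the paper may differ.
candidate_sizes = [128, 160, 192, 224]

for step in range(4):
    batch = rng.normal(size=(32, 224, 224))       # stand-in images
    size = int(rng.choice(candidate_sizes))       # one resolution per batch
    batch = resize_nearest(batch, size)
    # model.forward(batch) would go here; a fully convolutional trunk
    # (plus global pooling) handles any of these sizes.
    print(f"step {step}: training batch shape {batch.shape}")
```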
Augment your batch: better training with larger batches
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances …
Post-training 4-bit quantization of convolution networks for rapid-deployment
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of interm…
ACIQ: Analytical Clipping for Integer Quantization of neural networks
Unlike traditional approaches that focus on the quantization at the network level, in this work we propose to minimize the quantization effect at the tensor level. We analyze the trade-off between quantization noise and clipping distortion…
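The trade-off can be illustrated numerically (a sketch only; ACIQ itself derives the optimal clipping value analytically from an assumed tensor distribution instead of searching): for each candidate clipping threshold, the total error combines rounding noise inside the clip range with distortion from clipped values.

```python
import numpy as np

def quant_mse(x, clip, n_bits=4):
    """Total quantization MSE for a symmetric uniform quantizer over [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1
    scale = clip / levels
    x_c = np.clip(x, -clip, clip)
    x_q = np.round(x_c / scale) * scale
    return np.mean((x - x_q) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)   # stand-in for a weight/activation tensor

# Numerical search over clipping thresholds: a small clip means low rounding
# noise but high clipping distortion; a large clip means the opposite.
clips = np.linspace(0.5, 5.0, 46)
mses = [quant_mse(x, c) for c in clips]
best = clips[int(np.argmin(mses))]
print(f"best clipping threshold ~ {best:.2f} (vs. max|x| = {np.abs(x).max():.2f})")
```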
Scalable methods for 8-bit training of neural networks
Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the num…
Task Agnostic Continual Learning Using Online Variational Bayes
Catastrophic forgetting is the notorious vulnerability of neural networks to the change of the data distribution while learning. This phenomenon has long been considered a major obstacle for allowing the use of learning agents in realistic…
Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning
We suggest a novel approach for the estimation of the posterior distribution of the weights of a neural network, using an online version of the variational Bayes method. Having a confidence measure of the weights allows us to combat several s…
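A minimal sketch of the general idea on a toy 1-D regression problem: keep a factorized Gaussian posterior over the weights and update it online with reparameterized Monte-Carlo gradients of the ELBO. The plain-SGD update below is a generic online variational Bayes recipe, not the paper's specific closed-form updates, and all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2.5 * x + noise; we learn a Gaussian posterior over the slope.
x = rng.normal(size=512)
y = 2.5 * x + 0.3 * rng.normal(size=512)

mu, rho = 0.0, -2.0          # posterior mean and softplus-parameterized std
prior_sigma = 1.0
lr, n_mc = 0.01, 8

def softplus(z):
    return np.log1p(np.exp(z))

for step in range(2000):
    i = rng.integers(0, len(x), size=32)          # mini-batch (online setting)
    sigma = softplus(rho)
    g_mu, g_rho = 0.0, 0.0
    for _ in range(n_mc):
        eps = rng.normal()
        w = mu + sigma * eps                      # reparameterized weight sample
        resid = w * x[i] - y[i]
        g_w = np.mean(resid * x[i])               # d(average squared loss)/dw
        g_mu += g_w
        g_rho += g_w * eps * (1.0 / (1.0 + np.exp(-rho)))  # chain rule through softplus
    g_mu /= n_mc
    g_rho /= n_mc
    # Add the gradient of KL(q || N(0, prior_sigma^2)), scaled per data point.
    g_mu += mu / prior_sigma**2 / len(x)
    g_rho += (sigma / prior_sigma**2 - 1.0 / sigma) * (1.0 / (1.0 + np.exp(-rho))) / len(x)
    mu -= lr * g_mu
    rho -= lr * g_rho

print(f"posterior over slope: mean={mu:.2f}, std={softplus(rho):.3f}")
```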
Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unclear, with severa…
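One concrete fact behind this line of work (a standard property of normalization, stated here for context rather than as the paper's full result): a weight vector followed by Batch-Normalization makes the loss invariant to its scale, which ties the weight norm to an effective learning rate.

```latex
% For a scale-invariant loss, L(\alpha w) = L(w) for all \alpha > 0, so
\[
  \nabla L(\alpha w) \;=\; \tfrac{1}{\alpha}\,\nabla L(w),
\]
% and a gradient step with learning rate \eta moves the direction
% \hat{w} = w / \lVert w \rVert by an amount that scales like
\[
  \eta_{\text{eff}} \;\propto\; \frac{\eta}{\lVert w \rVert^{2}} .
\]
```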
On the Blindspots of Convolutional Networks
Deep convolutional networks have been the state-of-the-art approach for a wide variety of tasks over the last few years. Their successes have, in many cases, turned them into the default model in quite a few domains. In this work, we will demons…
Fix your classifier: the marginal value of training the last weight layer
Neural networks are commonly used as models for classification for a wide variety of tasks. Typically, a learned affine transformation is placed at the end of such models, yielding a per-class value used for classification. This classifier…
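A minimal sketch of the idea suggested by the title: freeze the final affine classifier and train only the layers beneath it. The random orthonormal projection used below is a placeholder for whatever fixed transform the paper actually proposes, and the backbone is a stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feature_dim, num_classes = 256, 10

# Backbone that produces feature vectors (stand-in for a real network).
backbone = nn.Sequential(
    nn.Linear(784, feature_dim), nn.ReLU(),
    nn.Linear(feature_dim, feature_dim), nn.ReLU(),
)

# Fixed (untrained) classifier: a frozen linear projection to class scores.
classifier = nn.Linear(feature_dim, num_classes, bias=False)
with torch.no_grad():
    # Random orthonormal rows as a placeholder fixed transform.
    classifier.weight.copy_(torch.linalg.qr(torch.randn(feature_dim, num_classes))[0].T)
classifier.weight.requires_grad_(False)

# Only backbone parameters are optimized; the last layer never changes.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)                       # toy batch
y = torch.randint(0, num_classes, (32,))
loss = criterion(classifier(backbone(x)), y)
loss.backward()
optimizer.step()
print(loss.item())
```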
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. Th…
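Spelled out, the statement from this abstract is the following (standard formulation of the result):

```latex
% Gradient descent on the logistic loss over linearly separable data
% (x_i, y_i), y_i \in \{-1, +1\}, with a homogeneous linear predictor w:
\[
  \frac{w(t)}{\lVert w(t) \rVert} \;\xrightarrow[t\to\infty]{}\; \frac{\hat{w}}{\lVert \hat{w} \rVert},
  \qquad
  \hat{w} \;=\; \arg\min_{w}\, \lVert w \rVert^{2}
  \;\;\text{s.t.}\;\; y_i\, w^{\top} x_i \ge 1 \;\;\forall i ,
\]
% i.e. the predictor converges in direction to the hard-margin SVM solution,
% while \lVert w(t) \rVert itself grows without bound (roughly like \log t).
```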
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been obser…
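The update described in this background sentence, written explicitly (standard mini-batch SGD notation, added only for concreteness):

```latex
% Mini-batch SGD: the gradient is estimated from a small subset (batch) B_t
% of the N training examples, with |B_t| = b << N.
\[
  w_{t+1} \;=\; w_t \;-\; \eta\,\hat{g}_t,
  \qquad
  \hat{g}_t \;=\; \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_w \ell\!\left(w_t;\, x_i, y_i\right).
\]
```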
Exponentially vanishing sub-optimal local minima in multilayer neural networks
Background: Statistical mechanics results (Dauphin et al. (2014); Choromanska et al. (2015)) suggest that local minima with high error are exponentially rare in high dimensions. However, to prove low error guarantees for Multilayer Neural …
Semi-supervised deep learning by metric embedding
Deep networks are successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are usually much less suited for semi-supervised problems because of t…
Spatial contrasting for deep unsupervised learning
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have…
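A rough sketch of what a "spatial contrasting" objective could look like, inferred from the title alone: features of patches from the same image are treated as positives and patches from different images as negatives. The loss, patch sampling, and stand-in encoder below are assumptions; the paper's actual formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_features(image, encoder, patch=8, n_patches=2):
    """Encode a few random patches of one image (encoder is any feature extractor)."""
    h, w = image.shape
    feats = []
    for _ in range(n_patches):
        r, c = rng.integers(0, h - patch), rng.integers(0, w - patch)
        feats.append(encoder(image[r:r + patch, c:c + patch]))
    return np.stack(feats)

def similarity(f_a, f_b):
    """Cosine similarity between two patch feature vectors."""
    return float(f_a @ f_b / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-12))

# Stand-in encoder: flatten the patch (a real model would be a convnet trunk).
encoder = lambda p: p.reshape(-1)

img1, img2 = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
f1 = patch_features(img1, encoder)
f2 = patch_features(img2, encoder)

pos = similarity(f1[0], f1[1])   # patches of the same image: pull together
neg = similarity(f1[0], f2[0])   # patches of different images: push apart
# A contrastive loss would reward pos over neg; with an untrained encoder
# these values are arbitrary, training would separate them.
loss = -np.log(np.exp(pos) / (np.exp(pos) + np.exp(neg)))
print(f"positive sim {pos:.2f}, negative sim {neg:.2f}, toy loss {loss:.2f}")
```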
Deep unsupervised learning through spatial contrasting
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have…