Regularization (linguistics)
ImageNet classification with deep convolutional neural networks
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%…
Decoupled Weight Decay Regularization
L2 regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is not the case for adaptive gradient algorithms, such as …
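A minimal NumPy sketch of the distinction (the toy loss and hyper-parameters are illustrative, not the paper's): with plain SGD the two variants coincide up to learning-rate rescaling, but under Adam the L2 penalty gradient passes through the adaptive preconditioner while decoupled weight decay does not.

    import numpy as np

    def adam_l2(w, grad_fn, lr=1e-3, wd=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
        # L2 regularization: the penalty gradient wd*w is folded into the loss
        # gradient, so Adam's preconditioner rescales it like any other gradient.
        m, v = np.zeros_like(w), np.zeros_like(w)
        for t in range(1, steps + 1):
            g = grad_fn(w) + wd * w
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            w = w - lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps)
        return w

    def adamw(w, grad_fn, lr=1e-3, wd=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
        # Decoupled weight decay: the shrinkage step bypasses the preconditioner.
        m, v = np.zeros_like(w), np.zeros_like(w)
        for t in range(1, steps + 1):
            g = grad_fn(w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            w = w - lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps) - lr * wd * w
        return w

    grad = lambda w: w - 1.0          # gradient of the toy loss 0.5*||w - 1||^2
    w0 = np.zeros(3)
    print(adam_l2(w0.copy(), grad), adamw(w0.copy(), grad))  # the iterates differ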
Improved Regularization of Convolutional Neural Networks with Cutout
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often su…
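The mechanism itself is easy to state: zero out a square patch at a random location of each training image. A minimal NumPy sketch (patch size and image shape are illustrative):

    import numpy as np

    def cutout(img, size=8, rng=np.random.default_rng()):
        # Zero out a square patch centered at a random location; the patch may
        # extend past the image border, in which case it is clipped.
        h, w = img.shape[:2]
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
        out = img.copy()
        out[y0:y1, x0:x1] = 0.0
        return out

    x = np.ones((32, 32, 3))
    print(cutout(x).mean())  # slightly below 1.0: part of the image is masked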
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the lar…
A tutorial on regularized partial correlation networks
Recent years have seen an emergence of network modeling applied to moods, attitudes, and problems in the realm of psychology. In this framework, psychological variables are understood to directly affect each other rather than being caused …
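For intuition, a partial correlation network can be read off a regularized precision (inverse covariance) matrix; a minimal scikit-learn sketch on toy data (the variable count and the induced dependence are illustrative):

    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    # Partial correlations are the sign-flipped, standardized entries of the
    # precision matrix; the graphical lasso regularizes the precision estimate
    # so that weak edges shrink to exactly zero.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    X[:, 1] += 0.8 * X[:, 0]          # induce one strong conditional dependence

    K = GraphicalLassoCV().fit(X).precision_
    d = np.sqrt(np.diag(K))
    partial_corr = -K / np.outer(d, d)
    np.fill_diagonal(partial_corr, 1.0)
    print(np.round(partial_corr, 2))  # nonzero off-diagonals are the network edges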
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization…
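The unlabeled-data objective is compact enough to sketch: hard pseudo-labels from a weakly augmented view supervise a strongly augmented view, gated by a confidence threshold (NumPy sketch; the logits and threshold are illustrative):

    import numpy as np

    def fixmatch_loss(logits_weak, logits_strong, threshold=0.95):
        # Hard pseudo-labels come from the weakly augmented view; only
        # confident examples contribute to the consistency loss.
        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)
        probs = softmax(logits_weak)
        pseudo = probs.argmax(axis=1)               # hard pseudo-label
        mask = probs.max(axis=1) >= threshold       # confidence gate
        logp = np.log(softmax(logits_strong) + 1e-12)
        ce = -logp[np.arange(len(pseudo)), pseudo]  # cross-entropy vs pseudo-label
        return (ce * mask).mean()

    lw = np.array([[4.0, 0.1, 0.1], [1.0, 1.1, 0.9]])  # confident / unconfident
    ls = np.array([[2.0, 1.0, 0.5], [0.3, 2.0, 0.1]])
    print(fixmatch_loss(lw, ls))  # only the first example passes the gate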
An Overview of Overfitting and its Solutions
Overfitting is a fundamental issue in supervised machine learning which prevents us from perfectly generalizing the models to well fit observed data on training data, as well as unseen data on testing set. Because of the presence of noise,…
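The phenomenon fits in a few lines: as model capacity grows past what the data support, training error keeps falling while held-out error typically rises (NumPy sketch with polynomial regression; degrees and noise level are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 60)
    y = np.sin(3 * x) + 0.3 * rng.normal(size=60)   # noisy ground truth
    xt, yt, xv, yv = x[:30], y[:30], x[30:], y[30:]

    for degree in (1, 3, 12):
        coef = np.polyfit(xt, yt, degree)
        err = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
        # train error shrinks with degree; held-out error eventually grows
        print(degree, round(err(xt, yt), 3), round(err(xv, yv), 3))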
Understanding deep learning (still) requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model fa…
Double/debiased machine learning for treatment and structural parameters
We revisit the classic semi-parametric problem of inference on a low-dimensional parameter θ₀ in the presence of high-dimensional nuisance parameters η₀. We depart from the classical setting by allowing for η₀ to be so high-dimensional tha…
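A minimal sketch of cross-fitted "partialling out" in this spirit (the data-generating process and learners are illustrative): residualize both the outcome and the treatment on the controls with flexible learners, then regress residual on residual.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n, p, theta0 = 2000, 20, 0.5
    X = rng.normal(size=(n, p))
    D = X[:, 0] + 0.25 * X[:, 1] ** 2 + rng.normal(size=n)   # treatment
    Y = theta0 * D + np.sin(X[:, 0]) + rng.normal(size=n)    # outcome

    res_y, res_d = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
        # Nuisance functions are fit on one fold, residuals taken on the other
        # (cross-fitting), so overfitting in the nuisances does not bias theta.
        my = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
        md = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
        res_y[test] = Y[test] - my.predict(X[test])
        res_d[test] = D[test] - md.predict(X[test])

    theta_hat = (res_d @ res_y) / (res_d @ res_d)  # final OLS, residual on residual
    print(theta_hat)  # should be close to 0.5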
Temporal Ensembling for Semi-Supervised Learning
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus predicti…
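The consensus targets are a per-example exponential moving average of past predictions with a startup bias correction; a minimal NumPy sketch of that bookkeeping (sizes and α are illustrative):

    import numpy as np

    def temporal_ensembling_targets(Z, z_epoch, epoch, alpha=0.6):
        # Accumulate an EMA of per-example predictions across epochs and
        # bias-correct it; the unsupervised loss is then e.g. the MSE between
        # the current predictions and these targets.
        Z = alpha * Z + (1 - alpha) * z_epoch
        targets = Z / (1 - alpha ** epoch)   # startup bias correction
        return Z, targets

    N, C = 4, 3
    Z = np.zeros((N, C))
    for epoch in range(1, 4):
        z = np.full((N, C), 1.0 / C)         # stand-in for network outputs
        Z, targets = temporal_ensembling_targets(Z, z, epoch)
    print(targets[0])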
A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a differe…
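The embedding is M = A·H with A = softmax(Ws2 · tanh(Ws1 · Hᵀ)), so each of the r rows of A attends to a different part of the sentence. A minimal NumPy sketch (toy sizes; the paper's Frobenius penalty on A·Aᵀ − I is omitted):

    import numpy as np

    def structured_self_attention(H, Ws1, Ws2):
        # H: (n, 2u) hidden states; Ws1: (da, 2u); Ws2: (r, da).
        # Returns the (r, 2u) sentence embedding matrix M = A @ H.
        scores = Ws2 @ np.tanh(Ws1 @ H.T)            # (r, n)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        return A @ H

    rng = np.random.default_rng(0)
    n, two_u, da, r = 10, 8, 6, 3                    # toy sizes, not the paper's
    M = structured_self_attention(rng.normal(size=(n, two_u)),
                                  rng.normal(size=(da, two_u)),
                                  rng.normal(size=(r, da)))
    print(M.shape)  # (3, 8)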
MLP-Mixer: An all-MLP Architecture for Vision
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are bot…
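The core block is two MLPs: one mixing across patches (token mixing) and one across features (channel mixing). A minimal NumPy sketch of one block with layer norms omitted (sizes are illustrative):

    import numpy as np

    def mixer_block(X, W_tok1, W_tok2, W_ch1, W_ch2):
        # X: (patches, channels). First MLP acts along the patch axis,
        # second along the channel axis; both keep residual connections.
        gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
        X = X + (W_tok2 @ gelu(W_tok1 @ X))          # token mixing (across patches)
        X = X + (gelu(X @ W_ch1) @ W_ch2)            # channel mixing (across features)
        return X

    rng = np.random.default_rng(0)
    p, c, hid = 16, 8, 32                            # toy sizes
    out = mixer_block(rng.normal(size=(p, c)),
                      rng.normal(size=(hid, p)) * 0.1, rng.normal(size=(p, hid)) * 0.1,
                      rng.normal(size=(c, hid)) * 0.1, rng.normal(size=(hid, c)) * 0.1)
    print(out.shape)  # (16, 8)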
EfficientNetV2: Smaller Models and Faster Training
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neu…
Understanding deep learning requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the m…
Multi-Task Deep Neural Networks for Natural Language Understanding
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from…
Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Ter…
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most trainings are with suboptimal hyper-parameters, requiring unnecessarily long training times. Setting…
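One of the paper's prescriptions, the 1cycle schedule, is simple to sketch: ramp the learning rate up and back down over training while moving momentum inversely (a simplified sketch; peak values are illustrative, not the paper's recommendations):

    def one_cycle(step, total, lr_max=0.1, lr_min=0.004, mom_max=0.95, mom_min=0.85):
        # Linear ramp up to (lr_max, mom_min) at mid-training, then back down;
        # momentum moves opposite to the learning rate.
        half = total // 2
        if step < half:
            t = step / half
            return lr_min + t * (lr_max - lr_min), mom_max - t * (mom_max - mom_min)
        t = (step - half) / (total - half)
        return lr_max - t * (lr_max - lr_min), mom_min + t * (mom_max - mom_min)

    print([round(one_cycle(s, 100)[0], 3) for s in (0, 50, 99)])  # low, peak, low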
Iterative Bregman Projections for Regularized Transportation Problems
This article details a general numerical framework to approximate solutions to linear programs related to optimal transport. The general idea is to introduce an entropic regularization of the initial linear program. This regularized pr…
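The regularized problem is then solved by alternating Bregman projections onto the two marginal constraints, which for entropic optimal transport reduce to Sinkhorn's matrix-scaling iterations; a minimal NumPy sketch (cost, marginals, and ε are illustrative):

    import numpy as np

    def sinkhorn(a, b, C, eps=0.05, iters=500):
        # Alternately rescale K = exp(-C/eps) to match the row marginals a
        # and the column marginals b (Bregman projections onto each constraint).
        K = np.exp(-C / eps)
        u = np.ones_like(a)
        for _ in range(iters):
            v = b / (K.T @ u)     # project onto the column-marginal constraint
            u = a / (K @ v)       # project onto the row-marginal constraint
        return u[:, None] * K * v[None, :]   # approximate optimal coupling

    x = np.linspace(0, 1, 5)
    C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
    a = b = np.full(5, 0.2)                  # uniform marginals
    P = sinkhorn(a, b, C)
    print(P.sum(axis=0), P.sum(axis=1))      # both ≈ the marginals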
A closer look at memorization in deep networks
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize l…
Hyperspectral and Multispectral Image Fusion Based on a Sparse Representation
This paper presents a variational based approach to fusing hyperspectral and multispectral images. The fusion process is formulated as an inverse problem whose solution is the target image assumed to live in a much lower dimensional sub…
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as op…
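The augmentation itself: cut a box from one image, paste it into another, and mix the labels in proportion to the pasted area, with the box scale drawn from a Beta distribution. A minimal NumPy sketch (α and shapes are illustrative):

    import numpy as np

    def cutmix(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng()):
        # Paste a random box from x2 into x1; labels are mixed by the actual
        # kept area (boxes clipped at the border change the effective lambda).
        h, w = x1.shape[:2]
        lam = rng.beta(alpha, alpha)
        rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1_ = max(cy - rh // 2, 0), min(cy + rh // 2, h)
        x0, x1_ = max(cx - rw // 2, 0), min(cx + rw // 2, w)
        out = x1.copy()
        out[y0:y1_, x0:x1_] = x2[y0:y1_, x0:x1_]
        lam_adj = 1 - (y1_ - y0) * (x1_ - x0) / (h * w)
        return out, lam_adj * y1 + (1 - lam_adj) * y2

    a, b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
    img, label = cutmix(a, np.array([1., 0.]), b, np.array([0., 1.]))
    print(label)  # soft label proportional to the surviving area of each image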
Learning Transferable Architectures for Scalable Image Recognition
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive…
Which Training Methods for GANs do actually Converge?
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical coun…
Learning to Reweight Examples for Robust Deep Learning
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to vari…
Rethinking the Inception Architecture for Computer Vision
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmar…
Semi-supervised Medical Image Segmentation through Dual-task Consistency
Deep learning-based semi-supervised learning (SSL) algorithms have led to promising results in medical images segmentation and can alleviate doctors' expensive annotations by leveraging unlabeled data. However, most of the existing SSL alg…
Three scenarios for continual learning
Standard artificial neural networks suffer from the well-known issue of catastrophic forgetting, making continual or lifelong learning difficult for machine learning. In recent years, numerous methods have been proposed for continual learn…
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
Effective convolutional neural networks are trained on large sets of labeled data. However, creating large labeled datasets is a very costly and time-consuming task. Semi-supervised learning uses unlabeled data to train a model with higher…
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a…