Regularization (linguistics)
ImageNet classification with deep convolutional neural networks
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%…
Decoupled Weight Decay Regularization
L2 regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is not the case for adaptive gradient algorithms, such as …
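A minimal NumPy sketch of the distinction (the toy loss and hyper-parameters are illustrative, not the paper's): with plain SGD the two variants coincide up to learning-rate rescaling, but under Adam the L2 penalty gradient passes through the adaptive preconditioner while decoupled weight decay does not.

    import numpy as np

    def adam_l2(w, grad_fn, lr=1e-3, wd=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
        # L2 regularization: the penalty gradient wd*w is folded into the loss
        # gradient, so Adam's preconditioner rescales it like any other gradient.
        m, v = np.zeros_like(w), np.zeros_like(w)
        for t in range(1, steps + 1):
            g = grad_fn(w) + wd * w
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            w = w - lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps)
        return w

    def adamw(w, grad_fn, lr=1e-3, wd=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
        # Decoupled weight decay: the shrinkage step bypasses the preconditioner.
        m, v = np.zeros_like(w), np.zeros_like(w)
        for t in range(1, steps + 1):
            g = grad_fn(w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            w = w - lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps) - lr * wd * w
        return w

    grad = lambda w: w - 1.0          # gradient of the toy loss 0.5*||w - 1||^2
    w0 = np.zeros(3)
    print(adam_l2(w0.copy(), grad), adamw(w0.copy(), grad))  # the iterates differ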
Improved Regularization of Convolutional Neural Networks with Cutout
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often su…
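The mechanism itself is easy to state: zero out a square patch at a random location of each training image. A minimal NumPy sketch (patch size and image shape are illustrative):

    import numpy as np

    def cutout(img, size=8, rng=np.random.default_rng()):
        # Zero out a square patch centered at a random location; the patch may
        # extend past the image border, in which case it is clipped.
        h, w = img.shape[:2]
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
        out = img.copy()
        out[y0:y1, x0:x1] = 0.0
        return out

    x = np.ones((32, 32, 3))
    print(cutout(x).mean())  # slightly below 1.0: part of the image is masked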
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the lar…
A tutorial on regularized partial correlation networks
Recent years have seen an emergence of network modeling applied to moods, attitudes, and problems in the realm of psychology. In this framework, psychological variables are understood to directly affect each other rather than being caused …
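For intuition, a partial correlation network can be read off a regularized precision (inverse covariance) matrix; a minimal scikit-learn sketch on toy data (the variable count and the induced dependence are illustrative):

    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    # Partial correlations are the sign-flipped, standardized entries of the
    # precision matrix; the graphical lasso regularizes the precision estimate
    # so that weak edges shrink to exactly zero.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    X[:, 1] += 0.8 * X[:, 0]          # induce one strong conditional dependence

    K = GraphicalLassoCV().fit(X).precision_
    d = np.sqrt(np.diag(K))
    partial_corr = -K / np.outer(d, d)
    np.fill_diagonal(partial_corr, 1.0)
    print(np.round(partial_corr, 2))  # nonzero off-diagonals are the network edges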
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization…
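The unlabeled-data objective is compact enough to sketch: hard pseudo-labels from a weakly augmented view supervise a strongly augmented view, gated by a confidence threshold (NumPy sketch; the logits and threshold are illustrative):

    import numpy as np

    def fixmatch_loss(logits_weak, logits_strong, threshold=0.95):
        # Hard pseudo-labels come from the weakly augmented view; only
        # confident examples contribute to the consistency loss.
        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)
        probs = softmax(logits_weak)
        pseudo = probs.argmax(axis=1)               # hard pseudo-label
        mask = probs.max(axis=1) >= threshold       # confidence gate
        logp = np.log(softmax(logits_strong) + 1e-12)
        ce = -logp[np.arange(len(pseudo)), pseudo]  # cross-entropy vs pseudo-label
        return (ce * mask).mean()

    lw = np.array([[4.0, 0.1, 0.1], [1.0, 1.1, 0.9]])  # confident / unconfident
    ls = np.array([[2.0, 1.0, 0.5], [0.3, 2.0, 0.1]])
    print(fixmatch_loss(lw, ls))  # only the first example passes the gate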
An Overview of Overfitting and its Solutions
Overfitting is a fundamental issue in supervised machine learning which prevents us from perfectly generalizing the models to well fit observed data on training data, as well as unseen data on testing set. Because of the presence of noise,…
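The phenomenon fits in a few lines: as model capacity grows past what the data support, training error keeps falling while held-out error typically rises (NumPy sketch with polynomial regression; degrees and noise level are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 60)
    y = np.sin(3 * x) + 0.3 * rng.normal(size=60)   # noisy ground truth
    xt, yt, xv, yv = x[:30], y[:30], x[30:], y[30:]

    for degree in (1, 3, 12):
        coef = np.polyfit(xt, yt, degree)
        err = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
        # train error shrinks with degree; held-out error eventually grows
        print(degree, round(err(xt, yt), 3), round(err(xv, yv), 3))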
Understanding deep learning (still) requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model fa…
Double/debiased machine learning for treatment and structural parameters
We revisit the classic semi-parametric problem of inference on a low-dimensional parameter θ₀ in the presence of high-dimensional nuisance parameters η₀. We depart from the classical setting by allowing for η₀ to be so high-dimensional tha…
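A minimal sketch of cross-fitted "partialling out" in this spirit (the data-generating process and learners are illustrative): residualize both the outcome and the treatment on the controls with flexible learners, then regress residual on residual.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n, p, theta0 = 2000, 20, 0.5
    X = rng.normal(size=(n, p))
    D = X[:, 0] + 0.25 * X[:, 1] ** 2 + rng.normal(size=n)   # treatment
    Y = theta0 * D + np.sin(X[:, 0]) + rng.normal(size=n)    # outcome

    res_y, res_d = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
        # Nuisance functions are fit on one fold, residuals taken on the other
        # (cross-fitting), so overfitting in the nuisances does not bias theta.
        my = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
        md = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
        res_y[test] = Y[test] - my.predict(X[test])
        res_d[test] = D[test] - md.predict(X[test])

    theta_hat = (res_d @ res_y) / (res_d @ res_d)  # final OLS, residual on residual
    print(theta_hat)  # should be close to 0.5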
Temporal Ensembling for Semi-Supervised Learning
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus predicti…
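The consensus targets are a per-example exponential moving average of past predictions with a startup bias correction; a minimal NumPy sketch of that bookkeeping (sizes and α are illustrative):

    import numpy as np

    def temporal_ensembling_targets(Z, z_epoch, epoch, alpha=0.6):
        # Accumulate an EMA of per-example predictions across epochs and
        # bias-correct it; the unsupervised loss is then e.g. the MSE between
        # the current predictions and these targets.
        Z = alpha * Z + (1 - alpha) * z_epoch
        targets = Z / (1 - alpha ** epoch)   # startup bias correction
        return Z, targets

    N, C = 4, 3
    Z = np.zeros((N, C))
    for epoch in range(1, 4):
        z = np.full((N, C), 1.0 / C)         # stand-in for network outputs
        Z, targets = temporal_ensembling_targets(Z, z, epoch)
    print(targets[0])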
A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a differe…
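The embedding is M = A·H with A = softmax(Ws2 · tanh(Ws1 · Hᵀ)), so each of the r rows of A attends to a different part of the sentence. A minimal NumPy sketch (toy sizes; the paper's Frobenius penalty on A·Aᵀ − I is omitted):

    import numpy as np

    def structured_self_attention(H, Ws1, Ws2):
        # H: (n, 2u) hidden states; Ws1: (da, 2u); Ws2: (r, da).
        # Returns the (r, 2u) sentence embedding matrix M = A @ H.
        scores = Ws2 @ np.tanh(Ws1 @ H.T)            # (r, n)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        return A @ H

    rng = np.random.default_rng(0)
    n, two_u, da, r = 10, 8, 6, 3                    # toy sizes, not the paper's
    M = structured_self_attention(rng.normal(size=(n, two_u)),
                                  rng.normal(size=(da, two_u)),
                                  rng.normal(size=(r, da)))
    print(M.shape)  # (3, 8)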
MLP-Mixer: An all-MLP Architecture for Vision
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are bot…
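The core block is two MLPs: one mixing across patches (token mixing) and one across features (channel mixing). A minimal NumPy sketch of one block with layer norms omitted (sizes are illustrative):

    import numpy as np

    def mixer_block(X, W_tok1, W_tok2, W_ch1, W_ch2):
        # X: (patches, channels). First MLP acts along the patch axis,
        # second along the channel axis; both keep residual connections.
        gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
        X = X + (W_tok2 @ gelu(W_tok1 @ X))          # token mixing (across patches)
        X = X + (gelu(X @ W_ch1) @ W_ch2)            # channel mixing (across features)
        return X

    rng = np.random.default_rng(0)
    p, c, hid = 16, 8, 32                            # toy sizes
    out = mixer_block(rng.normal(size=(p, c)),
                      rng.normal(size=(hid, p)) * 0.1, rng.normal(size=(p, hid)) * 0.1,
                      rng.normal(size=(c, hid)) * 0.1, rng.normal(size=(hid, c)) * 0.1)
    print(out.shape)  # (16, 8)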
EfficientNetV2: Smaller Models and Faster Training
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neu…
Understanding deep learning requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the m…
Multi-Task Deep Neural Networks for Natural Language Understanding
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from…
Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Ter…
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most trainings are with suboptimal hyper-parameters, requiring unnecessarily long training times. Setting…
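One of the paper's prescriptions, the 1cycle schedule, is simple to sketch: ramp the learning rate up and back down over training while moving momentum inversely (a simplified sketch; peak values are illustrative, not the paper's recommendations):

    def one_cycle(step, total, lr_max=0.1, lr_min=0.004, mom_max=0.95, mom_min=0.85):
        # Linear ramp up to (lr_max, mom_min) at mid-training, then back down;
        # momentum moves opposite to the learning rate.
        half = total // 2
        if step < half:
            t = step / half
            return lr_min + t * (lr_max - lr_min), mom_max - t * (mom_max - mom_min)
        t = (step - half) / (total - half)
        return lr_max - t * (lr_max - lr_min), mom_min + t * (mom_max - mom_min)

    print([round(one_cycle(s, 100)[0], 3) for s in (0, 50, 99)])  # low, peak, low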
Iterative Bregman Projections for Regularized Transportation Problems
This article details a general numerical framework to approximate solutions to linear programs related to optimal transport. The general idea is to introduce an entropic regularization of the initial linear program. This regularized pr…
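The regularized problem is then solved by alternating Bregman projections onto the two marginal constraints, which for entropic optimal transport reduce to Sinkhorn's matrix-scaling iterations; a minimal NumPy sketch (cost, marginals, and ε are illustrative):

    import numpy as np

    def sinkhorn(a, b, C, eps=0.05, iters=500):
        # Alternately rescale K = exp(-C/eps) to match the row marginals a
        # and the column marginals b (Bregman projections onto each constraint).
        K = np.exp(-C / eps)
        u = np.ones_like(a)
        for _ in range(iters):
            v = b / (K.T @ u)     # project onto the column-marginal constraint
            u = a / (K @ v)       # project onto the row-marginal constraint
        return u[:, None] * K * v[None, :]   # approximate optimal coupling

    x = np.linspace(0, 1, 5)
    C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
    a = b = np.full(5, 0.2)                  # uniform marginals
    P = sinkhorn(a, b, C)
    print(P.sum(axis=0), P.sum(axis=1))      # both ≈ the marginals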
A closer look at memorization in deep networks
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize l…
Hyperspectral and Multispectral Image Fusion Based on a Sparse Representation
This paper presents a variational based approach to fusing hyperspectral and multispectral images. The fusion process is formulated as an inverse problem whose solution is the target image assumed to live in a much lower dimensional sub…
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as op…
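The augmentation itself: cut a box from one image, paste it into another, and mix the labels in proportion to the pasted area, with the box scale drawn from a Beta distribution. A minimal NumPy sketch (α and shapes are illustrative):

    import numpy as np

    def cutmix(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng()):
        # Paste a random box from x2 into x1; labels are mixed by the actual
        # kept area (boxes clipped at the border change the effective lambda).
        h, w = x1.shape[:2]
        lam = rng.beta(alpha, alpha)
        rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1_ = max(cy - rh // 2, 0), min(cy + rh // 2, h)
        x0, x1_ = max(cx - rw // 2, 0), min(cx + rw // 2, w)
        out = x1.copy()
        out[y0:y1_, x0:x1_] = x2[y0:y1_, x0:x1_]
        lam_adj = 1 - (y1_ - y0) * (x1_ - x0) / (h * w)
        return out, lam_adj * y1 + (1 - lam_adj) * y2

    a, b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
    img, label = cutmix(a, np.array([1., 0.]), b, np.array([0., 1.]))
    print(label)  # soft label proportional to the surviving area of each image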
Learning Transferable Architectures for Scalable Image Recognition
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive…
Which Training Methods for GANs do actually Converge?
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical coun…
Learning to Reweight Examples for Robust Deep Learning
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to vari…
Rethinking the Inception Architecture for Computer Vision
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmar…
Semi-supervised Medical Image Segmentation through Dual-task Consistency
Deep learning-based semi-supervised learning (SSL) algorithms have led to promising results in medical images segmentation and can alleviate doctors' expensive annotations by leveraging unlabeled data. However, most of the existing SSL alg…
Three scenarios for continual learning
Standard artificial neural networks suffer from the well-known issue of catastrophic forgetting, making continual or lifelong learning difficult for machine learning. In recent years, numerous methods have been proposed for continual learn…
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
Effective convolutional neural networks are trained on large sets of labeled data. However, creating large labeled datasets is a very costly and time-consuming task. Semi-supervised learning uses unlabeled data to train a model with higher…
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a…