Giancarlo Kerg
Neural networks with optimized single-neuron adaptation uncover biologically plausible regularization
Neurons in the brain have rich and adaptive input-output properties. Features such as heterogeneous f-I curves and spike frequency adaptation are known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it…
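As a concrete picture of the kind of single-neuron adaptation this abstract refers to, the toy rate unit below shifts its effective threshold with its own recent activity, in the spirit of spike frequency adaptation. The time constant, gain, and adaptation strength are arbitrary assumptions, and this is not the training setup studied in the paper.

```python
import numpy as np

def adaptive_unit(inputs, tau_a=10.0, beta=0.5, gain=1.0):
    """Toy rate neuron with spike-frequency-adaptation-like dynamics.

    An adaptation variable `a` integrates the unit's own output and is
    subtracted from the input, so the effective f-I curve shifts with
    recent activity.
    """
    a, outputs = 0.0, []
    for x in inputs:
        r = np.maximum(gain * (x - beta * a), 0.0)  # adapted ReLU-like response
        a += (r - a) / tau_a                        # slow adaptation variable
        outputs.append(r)
    return np.array(outputs)

# A constant input produces a response that decays as adaptation builds up.
print(adaptive_unit(np.ones(20)))
```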
On Neural Architecture Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing r…
Continuous-Time Meta-Learning with Forward Mode Differentiation
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector fi…
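The idea of adaptation following the dynamics of a gradient vector field can be pictured with a crude Euler discretization of gradient flow on a task loss, as in the sketch below. This does not reproduce the paper's ODE integration or its forward-mode differentiation through the flow; the step count and horizon are illustrative assumptions.

```python
import torch

def gradient_flow_adapt(params, loss_fn, data, horizon=1.0, n_steps=100):
    """Euler-discretized gradient flow: d(theta)/dt = -grad L(theta).

    A rough illustration of treating adaptation as a continuous-time
    dynamical system rather than a sequence of discrete gradient steps.
    """
    theta = params.clone()
    dt = horizon / n_steps
    for _ in range(n_steps):
        theta = theta.detach().requires_grad_(True)
        loss = loss_fn(theta, data)
        (grad,) = torch.autograd.grad(loss, theta)
        theta = theta - dt * grad  # one Euler step along the gradient vector field
    return theta

# Toy quadratic task: adaptation flows toward the task optimum at 3.0.
loss_fn = lambda theta, target: ((theta - target) ** 2).sum()
print(gradient_flow_adapt(torch.zeros(1), loss_fn, torch.tensor([3.0]), horizon=2.0))
```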
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a…
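One cheap proxy for the curvature quantity discussed here is the squared norm of the mini-batch gradient, which approximates the trace of the Fisher Information Matrix for likelihood-based losses. The sketch below adds such a penalty to a standard training loss; the penalty weight and the estimator are assumptions rather than the paper's exact prescription.

```python
import torch
import torch.nn.functional as F

def loss_with_fisher_penalty(model, x, y, penalty_weight=0.1):
    """Cross-entropy loss plus a squared-gradient-norm penalty.

    The squared norm of the mini-batch gradient is a cheap proxy for the
    trace of the Fisher Information Matrix; penalizing it discourages sharp
    growth of curvature early in training.
    """
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    fisher_proxy = sum(g.pow(2).sum() for g in grads)
    return loss + penalty_weight * fisher_proxy

# Illustrative usage with a linear classifier on random data.
model = torch.nn.Linear(10, 3)
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss_with_fisher_penalty(model, x, y).backward()
```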
Advantages of biologically-inspired adaptive neural activation in RNNs during learning
Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks. Yet, most neural activation functions used in artificial networks are fixed and mostly considered as an inconsequential a…
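A minimal way to make the activation function itself learnable is to give every unit its own gain and saturation parameters, as sketched below. The specific parameterization (s · tanh(g · x)) is an illustrative choice, not necessarily the one used in the paper.

```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Per-neuron activation with learnable gain and saturation.

    Each unit learns its own slope `gain` and saturation level `saturation`,
    so the network can shape individual input-output curves during training.
    """
    def __init__(self, n_units):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n_units))
        self.saturation = nn.Parameter(torch.ones(n_units))

    def forward(self, x):
        return self.saturation * torch.tanh(self.gain * x)

# Drop-in replacement for a fixed nonlinearity inside an RNN cell.
act = LearnableActivation(4)
print(act(torch.randn(2, 4)))
```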
Untangling tradeoffs between recurrence and self-attention in neural networks
Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimizati…
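For reference, a single head of scaled dot-product self-attention fits in a few lines; this generic sketch (with arbitrary projection sizes) shows the all-to-all connectivity that is traded off against recurrence in this analysis.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projections.
    Every position attends to every other position in a single step.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

d_model, d_head, seq_len = 8, 4, 5
x = torch.randn(seq_len, d_model)
out = self_attention(x, *(torch.randn(d_model, d_head) for _ in range(3)))
print(out.shape)  # torch.Size([5, 4])
```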
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This e…
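The orthogonal constraint mentioned in the abstract can be enforced exactly by parameterizing the recurrent matrix as the exponential of a skew-symmetric matrix, as sketched below. This shows only the constrained baseline; the non-normal (Schur-structured) relaxation proposed in the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class OrthogonalRecurrence(nn.Module):
    """Recurrent weight kept exactly orthogonal by construction.

    W = expm(A - A^T) is orthogonal for any square A, so hidden-state norms
    are preserved and gradients neither explode nor vanish through W.
    """
    def __init__(self, hidden_size):
        super().__init__()
        self.A = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.01)

    def forward(self, h):
        W = torch.matrix_exp(self.A - self.A.T)  # exactly orthogonal
        return h @ W.T

rec = OrthogonalRecurrence(6)
h = torch.randn(3, 6)
print(torch.allclose(rec(h).norm(dim=1), h.norm(dim=1), atol=1e-5))  # True
```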
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because E…
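In the spirit of the title, one way to modify the LSTM gradient is to stochastically detach the previous hidden state before it enters the cell, so backpropagation relies mainly on the additive cell-state path. The detach probability and the use of torch's LSTMCell below are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def h_detach_rollout(cell, x_seq, h, c, p_detach=0.25):
    """Unroll an LSTMCell, stochastically blocking gradients through h.

    At each step, with probability `p_detach` the previous hidden state is
    detached from the graph before entering the cell, so backpropagated
    gradients are carried mainly by the cell-state path.
    """
    outputs = []
    for x_t in x_seq:
        h_in = h.detach() if torch.rand(()) < p_detach else h
        h, c = cell(x_t, (h_in, c))
        outputs.append(h)
    return torch.stack(outputs), (h, c)

cell = nn.LSTMCell(input_size=4, hidden_size=8)
x_seq = torch.randn(10, 2, 4)  # (time, batch, features)
h0 = c0 = torch.zeros(2, 8)
out, _ = h_detach_rollout(cell, x_seq, h0, c0)
print(out.shape)  # torch.Size([10, 2, 8])
```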