The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its inputs and the weights on individual inputs. Nontrivial problems can be solved only when the activation function is nonlinear. Modern activation functions include the logistic (sigmoid) function, used in the 2012 speech recognition model developed by Hinton et al.; the ReLU, used in the 2012 AlexNet computer vision model and in the 2015 ResNet model; and the GELU, a smooth version of the ReLU, used in the 2018 BERT model.
…