arXiv (Cornell University)
May 2023 • Felix Benning, Leif Döring
Classical worst-case optimization theory neither explains the success of optimization in machine learning nor helps with step size selection. We establish a connection between Bayesian Optimization (i.e., average-case optimization theory) and classical optimization, using a 'stochastic Taylor approximation' to rediscover gradient descent. This rediscovery yields a step size schedule we call Random Function Descent (RFD), which, in contrast to classical derivations, is scale invariant. Furthermore, our analys…
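This excerpt does not give the RFD step size formula, so the sketch below does not implement RFD. It only illustrates what "scale invariant" means by showing that classical gradient descent with a fixed step size is *not* scale invariant: rescaling the objective by a positive constant c changes the iterates. The toy quadratic, the constant c = 10, and the step size 0.1 are hypothetical choices made purely for illustration.

```python
import numpy as np


def gradient_descent(grad, x0, step_size, n_steps):
    """Plain fixed-step gradient descent (a classical rule, not RFD)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step_size * grad(x)
    return x


# Hypothetical toy objective J(x) = 0.5 * ||x||^2, whose gradient is x.
def grad_J(x):
    return x


# The same objective rescaled by a positive constant c; its gradient is c * x.
c = 10.0


def grad_cJ(x):
    return c * x


x0 = [1.0, -2.0]
x_J = gradient_descent(grad_J, x0, step_size=0.1, n_steps=5)
x_cJ = gradient_descent(grad_cJ, x0, step_size=0.1, n_steps=5)

# A scale-invariant rule would produce the same iterates for J and c * J;
# fixed-step gradient descent does not, since each step is scaled by c.
print(x_J, x_cJ)
```

With these settings, each step on J multiplies x by 0.9, while each step on 10·J multiplies x by 0, so the two runs diverge. A scale-invariant step size schedule, as the abstract claims RFD is, would yield identical iterates for both.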