Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization
arXiv preprint • February 2025 • Aditya Ranganath, Mukesh Singhal, Roummel F. Marcia
Stochastic gradient descent and other first-order variants, such as Adam and AdaGrad, are commonly used in deep learning because of their computational efficiency and low memory requirements. However, these methods do not exploit curvature information; consequently, their iterates can converge to saddle points or poor local minima. Quasi-Newton methods, on the other hand, compute Hessian approximations that exploit this curvature information with a comparable computational budget. Quasi-Newton methods re-use prev…
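For context, below is a minimal NumPy sketch of the symmetric rank-one (SR1) update named in the title, which builds a Hessian approximation from successive iterate and gradient differences. The function name, argument names, and skip tolerance are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-one (SR1) update of a Hessian approximation B.

    s = x_{k+1} - x_k   (iterate displacement)
    y = g_{k+1} - g_k   (gradient difference)

    Returns B_{k+1} = B + (r r^T) / (r^T s), where r = y - B s.
    """
    r = y - B @ s                      # residual of the secant condition
    denom = r @ s
    # Standard safeguard: skip the update when the denominator is
    # too small relative to ||r|| ||s||, to avoid numerical breakdown.
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom  # symmetric rank-one correction
```

The skip rule is the conventional safeguard for SR1 (the denominator can vanish even near a solution); unlike BFGS, the SR1 update does not force the approximation to stay positive definite, which is precisely why it can capture the indefinite curvature around saddle points that the abstract highlights.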