arXiv (Cornell University)
Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
June 2015 • Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theory cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill this gap in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one is on the computer networ…
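The asynchronous mechanism the abstract refers to can be illustrated with a minimal lock-free (Hogwild!-style) sketch in Python: several workers read the shared parameter, compute a stochastic gradient, and write the update back without synchronization, so a worker may update using a stale read. The toy objective, variable names, and step counts below are illustrative assumptions, not the paper's actual setup.

```python
import threading
import random

random.seed(0)

# Toy problem: minimize E[(x - d)^2] over samples d drawn with mean ~5.
# The minimizer is the sample mean, so convergence is easy to check.
data = [random.gauss(5.0, 1.0) for _ in range(1000)]

x = [0.0]   # shared parameter; stored in a list so all threads mutate it
lr = 0.01   # constant step size

def worker(steps):
    for _ in range(steps):
        d = random.choice(data)   # draw one sample
        g = 2.0 * (x[0] - d)      # stochastic gradient of (x - d)^2
        x[0] -= lr * g            # lock-free update: read may be stale

# Four asynchronous workers update x concurrently without locks.
threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x[0])  # ends up near the sample mean, despite stale updates
```

Despite the races on `x[0]`, the iterate still drifts to the minimizer; quantifying when and how fast such stale updates converge (especially for nonconvex objectives) is exactly the kind of question the paper's analysis addresses.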