ICASSP 2020: 45th International Conference on Acoustics, Speech, and Signal Processing. Virtual Barcelona, Spain, May 4-8, 2020.
Jianyu Wang*^, Vinayak Tantia^, Nicolas Ballas^, Michael Rabbat^
*Carnegie Mellon University
^Facebook AI Research
The Lookahead optimizer [Zhang et al., 2019] was recently proposed and demonstrated to improve the performance of stochastic first-order methods for training deep neural networks. Lookahead can be viewed as a two time-scale algorithm, where the fast dynamics (inner optimizer) determine a search direction and the slow dynamics (outer optimizer) perform updates by moving along this direction. We prove that, with an appropriate choice of step sizes, Lookahead converges to a stationary point of smooth non-convex functions. Although Lookahead is described and implemented as a serial algorithm, our analysis is based on viewing Lookahead as a multi-agent optimization method with two agents communicating periodically.
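As a rough illustration of the two time-scale structure described above, the following is a minimal Python sketch of a Lookahead-style update with plain SGD as the inner optimizer. The names grad_fn, k, inner_lr, and alpha are illustrative placeholders rather than the paper's notation or recommended settings; the inner loop produces the search direction and the outer step interpolates the slow weights along it.

import numpy as np

def lookahead_sgd(grad_fn, phi0, outer_steps=100, k=5, inner_lr=0.1, alpha=0.5):
    """Sketch of Lookahead: k fast SGD steps, then a slow interpolation step."""
    phi = phi0.copy()                      # slow weights (outer optimizer state)
    for _ in range(outer_steps):
        theta = phi.copy()                 # fast weights start from the slow weights
        for _ in range(k):                 # fast dynamics: k inner SGD steps
            theta -= inner_lr * grad_fn(theta)
        phi += alpha * (theta - phi)       # slow dynamics: move along theta - phi
    return phi

# Toy usage (assumed example): minimize f(x) = 0.5*||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
x_final = lookahead_sgd(grad, phi0=np.ones(10))
print(np.linalg.norm(x_final))             # small norm: near a stationary point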