arXiv (Cornell University)
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
October 2022 • Gandharv Patil, L. A. Prashanth, Anant Raj, Doina Precup
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and with high probability. In addition, our bounds exhibit…
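To make "tail-averaged TD" concrete, here is a minimal sketch of TD(0) with linear function approximation along a single Markov-chain trajectory. It returns the last iterate alongside the average of the iterates after a cut-off index. It rests on several assumptions not taken from the abstract. It uses a constant step size `alpha`, because the abstract does not state the paper's eigenvalue-free step-size rule. The cut-off `tail_start` and every function and parameter name are hypothetical. It omits the regularised variant mentioned in the title. Treat it as an illustration of the general technique, not as the authors' algorithm.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def tail_averaged_td0(
    phi: Sequence[Sequence[float]],   # phi[s] = feature vector of state s
    P: Sequence[Sequence[float]],     # P[s][s'] = transition probability
    reward: Sequence[float],          # reward[s] received when leaving state s
    gamma: float,                     # discount factor
    alpha: float,                     # constant step size (an assumption, not the paper's rule)
    n_steps: int,
    tail_start: int,                  # hypothetical cut-off: average theta_{tail_start+1}..theta_{n_steps}
    seed: int = 0,
) -> Tuple[Vector, Vector]:
    """Run linear TD(0) on one trajectory; return (last iterate, tail-averaged iterate)."""
    rng = random.Random(seed)
    n_states, d = len(P), len(phi[0])
    theta = [0.0] * d
    tail_sum = [0.0] * d
    s = rng.randrange(n_states)
    for t in range(1, n_steps + 1):
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        # TD error: r(s) + gamma * V(s') - V(s), where V(s) = phi(s)^T theta
        delta = reward[s] + gamma * dot(phi[s_next], theta) - dot(phi[s], theta)
        # Semi-gradient TD(0) update along the feature vector of the current state
        theta = [th + alpha * delta * f for th, f in zip(theta, phi[s])]
        # Accumulate only the iterates in the tail
        if t > tail_start:
            tail_sum = [acc + th for acc, th in zip(tail_sum, theta)]
        s = s_next
    n_tail = n_steps - tail_start
    return theta, [acc / n_tail for acc in tail_sum]


# Usage: two states, tabular (one-hot) features, uniform transitions.
phi = [[1.0, 0.0], [0.0, 1.0]]
P = [[0.5, 0.5], [0.5, 0.5]]
reward = [1.0, 0.0]
last, avg = tail_averaged_td0(phi, P, reward, gamma=0.5, alpha=0.05,
                              n_steps=20000, tail_start=10000)
```

With one-hot features the TD fixed point equals the true value function. Here that solves $V = r + \gamma P V$, giving $V = (1.5, 0.5)$, so `avg` should land close to it. Averaging only the tail discards the early iterates, which still carry the bias of the zero initialisation. Averaging the rest damps the step-size noise that the last iterate retains.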