On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Author(s)
Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P.
Abstract
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
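The theorem covers stochastic iterative updates whose step sizes satisfy the standard stochastic-approximation conditions: the step sizes sum to infinity while their squares sum to a finite value. As a minimal sketch of an algorithm in this class, the loop below runs tabular Q-learning on a toy Markov decision process; the environment in `step`, the constants `N_STATES`, `N_ACTIONS`, `GAMMA`, and the 1/n(s, a) step-size schedule are illustrative assumptions, not anything specified in the paper (which is theoretical and contains no code).

```python
# Minimal sketch of tabular Q-learning, an instance of the class of
# stochastic iterative DP algorithms covered by the convergence theorem.
# The toy MDP, its rewards, and the step-size schedule are assumptions
# made for illustration only.
import random

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def step(s, a):
    """Hypothetical MDP: action 0 drifts left, action 1 drifts right;
    reaching the last state pays reward 1 and resets to state 0."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    if s_next == N_STATES - 1:
        return 0, 1.0                      # reset with reward
    return s_next, 0.0

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
visits = [[0] * N_ACTIONS for _ in range(N_STATES)]

s = 0
for t in range(200_000):
    a = random.randrange(N_ACTIONS)        # exploratory policy: every (s, a) visited infinitely often
    s_next, r = step(s, a)
    visits[s][a] += 1
    alpha = 1.0 / visits[s][a]             # per-pair schedule: sum(alpha) diverges, sum(alpha^2) converges
    target = r + GAMMA * max(Q[s_next])    # sampled Bellman optimality backup
    Q[s][a] += alpha * (target - Q[s][a])  # stochastic approximation step
    s = s_next

print([[round(q, 3) for q in row] for row in Q])
```

The per-pair schedule alpha = 1/n(s, a) satisfies both summability conditions as long as the exploratory policy visits every state-action pair infinitely often, which is the regime in which Q-learning converges to the optimal value function.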
Date issued
1993-08-01
Other identifiers
AIM-1441
CBCL-084
Series/Report no.
AIM-1441
CBCL-084
Keywords
reinforcement learning, stochastic approximation, convergence, dynamic programming