Convergence Results for Some Temporal Difference Methods Based on Least Squares

Huizhen Yu; Bertsekas, D.P.

Author(s)

Yu, Huizhen; Bertsekas, Dimitri P.

Terms of use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract

We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
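
For readers unfamiliar with this family of methods, the following is a minimal sketch of an LSPE(λ)-style iteration for the discounted cost case only. It is not taken from the paper: it assumes the commonly stated form of the update, r ← r + γ B⁻¹(A r + b), with B, A, and b formed as running averages along a single simulated trajectory. The function name `lspe_lambda`, its parameters, the per-state cost vector `c` (a simplification of per-transition costs), and the use of a pseudo-inverse for early iterations are all illustrative assumptions.

```python
import numpy as np


def lspe_lambda(P, c, Phi, alpha=0.5, lam=0.0, gamma=1.0, steps=10000, seed=0):
    """LSPE(lambda)-style iteration on one simulated trajectory (discounted cost).

    P     : (n, n) transition matrix of the fixed policy (assumed irreducible)
    c     : (n,) one-stage cost per state (assumption: cost depends on state only)
    Phi   : (n, s) feature matrix; the cost approximation is Phi @ r
    alpha : discount factor in (0, 1)
    lam   : the lambda parameter in [0, 1]
    gamma : constant stepsize
    """
    rng = np.random.default_rng(seed)
    n, s = Phi.shape
    r = np.zeros(s)          # weight vector being estimated
    z = np.zeros(s)          # eligibility trace
    B = np.zeros((s, s))     # running average of phi phi^T
    A = np.zeros((s, s))     # running average of z (alpha phi' - phi)^T
    b = np.zeros(s)          # running average of z * cost
    x = 0                    # arbitrary initial state
    for t in range(steps):
        x_next = rng.choice(n, p=P[x])
        phi, phi_next = Phi[x], Phi[x_next]
        z = alpha * lam * z + phi
        w = 1.0 / (t + 1)    # weight that turns the sums into running averages
        B += w * (np.outer(phi, phi) - B)
        A += w * (np.outer(z, alpha * phi_next - phi) - A)
        b += w * (z * c[x] - b)
        # Pseudo-inverse (an assumption of this sketch) handles a singular B
        # before every feature direction has been visited.
        r = r + gamma * (np.linalg.pinv(B) @ (A @ r + b))
        x = x_next
    return r


if __name__ == "__main__":
    # Hypothetical 3-state chain with one indicator feature per state.
    P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
    c = np.array([1.0, 0.0, 2.0])
    Phi = np.eye(3)
    exact = np.linalg.solve(np.eye(3) - 0.5 * P, c)
    print("LSPE estimate:", Phi @ lspe_lambda(P, c, Phi))
    print("exact cost   :", exact)
```

With one indicator feature per state, the approximation space contains the true cost vector, so the estimate can be compared against the exact solution of the Bellman equation; with fewer features the iteration would instead approach a projected solution. The average cost variant discussed in the abstract is not covered by this sketch.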

Date issued

2009-07

URI

http://hdl.handle.net/1721.1/74102

Department

Massachusetts Institute of Technology. Laboratory for Information and Decision Systems

Journal

IEEE Transactions on Automatic Control

Publisher

Institute of Electrical and Electronics Engineers

Citation

Version: Final published version

Other identifiers

INSPEC Accession Number: 10774680

ISSN

0018-9286

Collections

MIT Open Access Articles

DSpace@MIT