Convergence Results for Some Temporal Difference Methods Based on Least Squares

Huizhen Yu; Bertsekas, D.P.

dc.contributor.author	Yu, Huizhen
dc.contributor.author	Bertsekas, Dimitri P.
dc.date.accessioned	2012-10-18T19:03:35Z
dc.date.available	2012-10-18T19:03:35Z
dc.date.issued	2009-07
dc.date.submitted	2008-08
dc.identifier.issn	0018-9286
dc.identifier.other	INSPEC Accession Number: 10774680
dc.identifier.uri	http://hdl.handle.net/1721.1/74102
dc.description.abstract	We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/tac.2009.2022097	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	IEEE	en_US
dc.title	Convergence Results for Some Temporal Difference Methods Based on Least Squares	en_US
dc.type	Article	en_US
dc.identifier.citation	Huizhen Yu and D.P. Bertsekas. “Convergence Results for Some Temporal Difference Methods Based on Least Squares.” IEEE Transactions on Automatic Control 54.7 (2009): 1515–1531. Web. © 2009 IEEE.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.approver	Bertsekas, Dimitri P.
dc.contributor.mitauthor	Bertsekas, Dimitri P.
dc.contributor.mitauthor	Yu, Huizhen
dc.relation.journal	IEEE Transactions on Automatic Control	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Huizhen Yu; Bertsekas, D.P.	en
dc.identifier.orcid	https://orcid.org/0000-0001-6909-7208
mit.license	PUBLISHER_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Yu-2009-Convergence Results for ...
Size: 1.316Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles