DSpace@MIT

Convergence Results for Some Temporal Difference Methods Based on Least Squares

Author(s)
Yu, Huizhen; Bertsekas, Dimitri P.

Publisher Policy

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
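
For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of one common textbook formulation of discounted-cost LSPE(λ). It is not taken from this record and is not presented as the authors' exact algorithm. The function names (`lspe_lambda`, `demo`), the three-state chain, the cost vector, and the regularization constant `delta` are all illustrative assumptions.

```python
import numpy as np


def lspe_lambda(states, costs, features, alpha, lam, gamma=1.0, delta=1.0):
    """Sketch of discounted-cost LSPE(lambda) along one simulated trajectory.

    states   : visited states i_0, ..., i_T
    costs    : costs[t] is the cost of the transition i_t -> i_{t+1}
    features : array of shape (n_states, d); row i is the feature vector phi(i)
    alpha    : discount factor; lam: the lambda parameter; gamma: stepsize
    delta    : assumed regularization so that B is invertible from the start
    """
    d = features.shape[1]
    r = np.zeros(d)          # weight vector of the linear approximation
    z = np.zeros(d)          # eligibility trace
    B = delta * np.eye(d)    # running sum of phi phi^T (plus regularization)
    A = np.zeros((d, d))     # running sum of z (alpha * phi_next - phi)^T
    b = np.zeros(d)          # running sum of z * cost
    for t in range(len(states) - 1):
        phi = features[states[t]]
        phi_next = features[states[t + 1]]
        z = alpha * lam * z + phi
        B += np.outer(phi, phi)
        A += np.outer(z, alpha * phi_next - phi)
        b += z * costs[t]
        # Least squares style update: r <- r + gamma * B^{-1} (A r + b).
        r = r + gamma * np.linalg.solve(B, A @ r + b)
    return r


def demo(num_steps=20000, seed=0):
    """Compare the sketch with the exact cost on a hypothetical 3-state chain."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.0, 0.6]])
    g = np.array([1.0, 2.0, 0.0])   # cost depends on the current state only
    alpha = 0.9
    states = [0]
    for _ in range(num_steps):
        states.append(rng.choice(3, p=P[states[-1]]))
    costs = [g[s] for s in states[:-1]]
    # Tabular (identity) features, so the approximation can be exact.
    r = lspe_lambda(np.array(states), costs, np.eye(3), alpha, lam=0.0)
    exact = np.linalg.solve(np.eye(3) - alpha * P, g)
    return r, exact
```

Calling `demo()` returns the estimated weights together with the exact discounted cost vector obtained by solving the Bellman equation directly. With identity features the two should be close after a long trajectory. The stepsize `gamma=1.0` is used here for the discounted case; the average cost variant described in the abstract is not sketched.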
Date issued
2009-07
URI
http://hdl.handle.net/1721.1/74102
Department
Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal
IEEE Transactions on Automatic Control
Publisher
Institute of Electrical and Electronics Engineers
Citation
Huizhen Yu and D. P. Bertsekas. “Convergence Results for Some Temporal Difference Methods Based on Least Squares.” IEEE Transactions on Automatic Control 54.7 (2009): 1515–1531. Web. © 2009 IEEE.
Version: Final published version
Other identifiers
INSPEC Accession Number: 10774680
ISSN
0018-9286

Collections
  • MIT Open Access Articles
