dc.contributor.author: Yu, Huizhen
dc.contributor.author: Bertsekas, Dimitri P.
dc.date.accessioned: 2012-10-18T19:03:35Z
dc.date.available: 2012-10-18T19:03:35Z
dc.date.issued: 2009-07
dc.date.submitted: 2008-08
dc.identifier.issn: 0018-9286
dc.identifier.other: INSPEC Accession Number: 10774680
dc.identifier.uri: http://hdl.handle.net/1721.1/74102
dc.description.abstract: We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable. [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/tac.2009.2022097 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: IEEE [en_US]
dc.title: Convergence Results for Some Temporal Difference Methods Based on Least Squares [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Huizhen Yu, and D.P. Bertsekas. “Convergence Results for Some Temporal Difference Methods Based on Least Squares.” IEEE Transactions on Automatic Control 54.7 (2009): 1515–1531. Web. ©2009 IEEE. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems [en_US]
dc.contributor.approver: Bertsekas, Dimitri P.
dc.contributor.mitauthor: Bertsekas, Dimitri P.
dc.contributor.mitauthor: Yu, Huizhen
dc.relation.journal: IEEE Transactions on Automatic Control [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Huizhen Yu; Bertsekas, D.P. [en]
dc.identifier.orcid: https://orcid.org/0000-0001-6909-7208
mit.license: PUBLISHER_POLICY [en_US]
mit.metadata.status: Complete