Show simple item record

dc.contributor.author: Bertsekas, Dimitri P.
dc.date.accessioned: 2011-06-21T19:35:53Z
dc.date.available: 2011-06-21T19:35:53Z
dc.date.issued: 2010-12
dc.identifier.issn: 0743-1546
dc.identifier.issn: 0191-2216
dc.identifier.uri: http://hdl.handle.net/1721.1/64641
dc.description.abstract: Approximate policy iteration methods based on temporal differences are popular in practice and have been tested extensively since the early nineties, but the associated convergence behavior is complex and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly analogous to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of Tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue into sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousand was achieved. Using the same features and a random search method, an overwhelmingly better average score (600,000-900,000) was achieved. The paper explains the likely mechanism of this phenomenon and derives conditions under which it will not occur. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (NSF Grant ECCS-0801549) [en_US]
dc.description.sponsorship: United States. Air Force (Grant FA9550-10-1-0412) [en_US]
dc.description.sponsorship: Los Alamos National Laboratory. Information Science and Technology Institute [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Pathologies of Temporal Difference Methods in Approximate Dynamic Programming [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Bertsekas, Dimitri P. "Pathologies of Temporal Difference Methods in Approximate Dynamic Programming." In Proceedings of the 49th IEEE Conference on Decision and Control, December 15-17, 2010, Hilton Atlanta Hotel, Atlanta, Georgia, USA. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.approver: Bertsekas, Dimitri P.
dc.contributor.mitauthor: Bertsekas, Dimitri P.
dc.relation.journal: Proceedings of the 49th IEEE Conference on Decision and Control (CDC), 2010 [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
dspace.orderedauthors: Bertsekas, Dimitri P.
dc.identifier.orcid: https://orcid.org/0000-0001-6909-7208
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete
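
The abstract describes approximate policy iteration in which each policy is evaluated by a temporal-difference method over a linear feature architecture and then improved greedily; the oscillation phenomenon arises because this loop need not converge under function approximation. Below is a minimal Python sketch of that loop on a small made-up MDP. The model, features, step size, and iteration counts are invented for illustration only and are not taken from the paper.

    import numpy as np

    # Minimal sketch of approximate policy iteration with linear TD(0)
    # policy evaluation. All numbers here are illustrative assumptions.
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 4, 2, 0.9

    # Random illustrative model: P[a, s, s'] transition probabilities,
    # R[a, s] expected rewards.
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    R = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

    # Linear architecture: V(s) is approximated by Phi[s] @ w.
    Phi = rng.standard_normal((n_states, 2))

    def td0_evaluate(policy, w, n_steps=5000, alpha=0.05):
        # Simulate the chain under `policy`, applying TD(0) updates to w.
        s = 0
        for _ in range(n_steps):
            a = policy[s]
            s_next = rng.choice(n_states, p=P[a, s])
            # TD error: r + gamma * V(s') - V(s)
            delta = R[a, s] + gamma * (Phi[s_next] @ w) - Phi[s] @ w
            w = w + alpha * delta * Phi[s]
            s = s_next
        return w

    def greedy_policy(w):
        # One-step lookahead improvement using the approximate values.
        q = R + gamma * (P @ (Phi @ w))  # q[a, s]
        return np.argmax(q, axis=0)

    w = np.zeros(Phi.shape[1])
    policy = np.zeros(n_states, dtype=int)
    for it in range(10):
        w = td0_evaluate(policy, w)
        policy = greedy_policy(w)
        print(f"iteration {it}: greedy policy = {policy}")
    # With function approximation, the sequence of greedy policies need
    # not converge; on unfavorable problems it can cycle among a set of
    # (possibly poor) policies, which is the pathology the paper studies.
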

