dc.contributor.author | Bertsekas, Dimitri P. | |
dc.date.accessioned | 2011-06-21T19:35:53Z | |
dc.date.available | 2011-06-21T19:35:53Z | |
dc.date.issued | 2010-12 | |
dc.identifier.issn | 0743-1546 | |
dc.identifier.issn | 0191-2216 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/64641 | |
dc.description.abstract | Approximate policy iteration methods based
on temporal differences are popular in practice, and have
been tested extensively since the early 1990s, but
their convergence behavior is complex and not
well understood at present. An important question is
whether the policy iteration process is seriously hampered
by oscillations between poor policies, roughly analogous to
the attraction of gradient methods to poor local minima.
There has been little apparent concern in the approximate
DP/reinforcement learning literature about this possibility,
even though it has been documented with several simple
examples. Recent computational experimentation with the
game of tetris, a popular testbed for approximate DP
algorithms over a 15-year period, has brought the issue
into sharp focus. In particular, using a standard set of 22
features and temporal difference methods, an average score
of a few thousand was achieved. Using the same features
and a random search method, an overwhelmingly better
average score (600,000-900,000) was achieved. The paper
explains the likely mechanism of this phenomenon and
derives conditions under which it will not occur. | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (NSF Grant ECCS-0801549) | en_US |
dc.description.sponsorship | United States. Air Force (Grant FA9550-10-1-0412) | en_US |
dc.description.sponsorship | Los Alamos National Laboratory. Information Science and Technology Institute | en_US |
dc.language.iso | en_US | |
dc.publisher | Institute of Electrical and Electronics Engineers | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike 3.0 | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | Pathologies of Temporal Difference Methods in Approximate Dynamic Programming | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Bertsekas, Dimitri P. "Pathologies of Temporal Difference Methods in Approximate Dynamic Programming." In Proceedings of the 49th IEEE Conference on Decision and Control, Dec. 15-17, 2010, Hilton Atlanta Hotel, Atlanta, Georgia, USA. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.approver | Bertsekas, Dimitri P. | |
dc.contributor.mitauthor | Bertsekas, Dimitri P. | |
dc.relation.journal | Proceedings of the 49th IEEE Conference on Decision and Control (CDC), 2010 | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
dspace.orderedauthors | Bertsekas, Dimitri P. | |
dc.identifier.orcid | https://orcid.org/0000-0001-6909-7208 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |