On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Yu, Huizhen; Bertsekas, Dimitri P.

dc.contributor.author	Yu, Huizhen
dc.contributor.author	Bertsekas, Dimitri P.
dc.date.accessioned	2015-02-03T19:18:36Z
dc.date.available	2015-02-03T19:18:36Z
dc.date.issued	2012-11
dc.date.submitted	2012-04
dc.identifier.issn	0364-765X
dc.identifier.issn	1526-5471
dc.identifier.uri	http://hdl.handle.net/1721.1/93744
dc.description.abstract	We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185–202] and establishing completely the convergence of Q-learning for these SSP models.	en_US
dc.description.sponsorship	United States. Air Force (Grant FA9550-10-1-0412)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant ECCS-0801549)	en_US
dc.language.iso	en_US
dc.publisher	Institute for Operations Research and the Management Sciences (INFORMS)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1287/moor.1120.0562	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Prof. Bertsekas via Chris Sherratt	en_US
dc.title	On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems	en_US
dc.type	Article	en_US
dc.identifier.citation	Yu, Huizhen, and Dimitri P. Bertsekas. “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems.” Mathematics of Operations Research 38, no. 2 (May 2013): 209–227.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.approver	Bertsekas, Dimitri P.	en_US
dc.contributor.mitauthor	Yu, Huizhen	en_US
dc.contributor.mitauthor	Bertsekas, Dimitri P.	en_US
dc.relation.journal	Mathematics of Operations Research	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Yu, Huizhen; Bertsekas, Dimitri P.	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-6909-7208
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: YU boundedness.pdf
Size: 423.5Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
