dc.contributor.author: Yu, Huizhen
dc.contributor.author: Bertsekas, Dimitri P.
dc.date.accessioned: 2015-02-03T19:18:36Z
dc.date.available: 2015-02-03T19:18:36Z
dc.date.issued: 2012-11
dc.date.submitted: 2012-04
dc.identifier.issn: 0364-765X
dc.identifier.issn: 1526-5471
dc.identifier.uri: http://hdl.handle.net/1721.1/93744
dc.description.abstract: We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185–202] and establishing completely the convergence of Q-learning for these SSP models. [en_US]
dc.description.sponsorship: United States. Air Force (Grant FA9550-10-1-0412) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant ECCS-0801549) [en_US]
dc.language.iso: en_US
dc.publisher: Institute for Operations Research and the Management Sciences (INFORMS) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1287/moor.1120.0562 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Prof. Bertsekas via Chris Sherratt [en_US]
dc.title: On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Yu, Huizhen, and Dimitri P. Bertsekas. “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems.” Mathematics of Operations Research 38, no. 2 (May 2013): 209–227. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems [en_US]
dc.contributor.approver: Bertsekas, Dimitri P. [en_US]
dc.contributor.mitauthor: Yu, Huizhen [en_US]
dc.contributor.mitauthor: Bertsekas, Dimitri P. [en_US]
dc.relation.journal: Mathematics of Operations Research [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Yu, Huizhen; Bertsekas, Dimitri P. [en_US]
dc.identifier.orcid: https://orcid.org/0000-0001-6909-7208
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

