
dc.contributor.author	Yu, Huizhen
dc.contributor.author	Bertsekas, Dimitri P.
dc.date.accessioned	2015-02-03T19:29:00Z
dc.date.available	2015-02-03T19:29:00Z
dc.date.issued	2012-04
dc.identifier.issn	0254-5330
dc.identifier.issn	1572-9338
dc.identifier.uri	http://hdl.handle.net/1721.1/93745
dc.description.abstract	We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1):66-94, 2012). The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous deterministic and stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems. These implementations overcome some of the traditional convergence difficulties of asynchronous modified policy iteration, and provide policy iteration-like alternative Q-learning schemes with as reliable convergence as classical Q-learning. We also discuss methods that use basis function approximations of Q-factors and we give an associated error bound.	en_US
dc.description.sponsorship	United States. Air Force (Grant FA9550-10-1-0412)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant ECCS-0801549)	en_US
dc.language.iso	en_US
dc.publisher	Springer-Verlag	en_US
dc.relation.isversionof	http://dx.doi.org/10.1007/s10479-012-1128-z	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Prof. Bertsekas via Chris Sherratt	en_US
dc.title	Q-learning and policy iteration algorithms for stochastic shortest path problems	en_US
dc.type	Article	en_US
dc.identifier.citation	Yu, Huizhen, and Dimitri P. Bertsekas. “Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems.” Annals of Operations Research 208, no. 1 (April 18, 2012): 95–132.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.approver	Bertsekas, Dimitri P.	en_US
dc.contributor.mitauthor	Yu, Huizhen	en_US
dc.contributor.mitauthor	Bertsekas, Dimitri P.	en_US
dc.relation.journal	Annals of Operations Research	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Yu, Huizhen; Bertsekas, Dimitri P.	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-6909-7208
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete
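
The abstract above describes the method only at a high level: policy evaluation is carried out by a few value iterations of an optimal stopping problem, and policy improvement by minimizing the Q-factors over controls. The sketch below is a minimal synchronous lookup-table illustration of that general idea, not the authors' asynchronous deterministic or stochastic algorithms; the function name, the data layout (P, g), the sweep counts, and the assumption that all stationary policies are proper are illustrative choices made here, not taken from the paper.

```python
import numpy as np

def pi_q_learning_ssp(P, g, num_cycles=50, evals_per_cycle=5):
    """Synchronous lookup-table sketch of a policy-iteration-like Q-learning
    scheme for a stochastic shortest path (SSP) problem.

    Illustrative assumptions (not from the paper):
      - P[u] is an (n+1) x (n+1) transition matrix for control u, with state n
        a cost-free, absorbing termination state.
      - g[i, u] is the expected one-stage cost of control u at non-terminal
        state i (shape (n, m)).
      - All stationary policies are proper, so total-cost iterations converge.
    """
    m = len(P)                 # number of controls
    n = P[0].shape[0] - 1      # number of non-terminal states
    Q = np.zeros((n + 1, m))   # Q-factors; row n (termination) is never updated

    for _ in range(num_cycles):
        # Policy improvement: minimize the Q-factors over all controls.
        J = Q.min(axis=1)       # "stopping" costs, frozen for this cycle
        mu = Q.argmin(axis=1)   # current greedy policy

        # Policy evaluation: a few value iterations of the optimal stopping
        # problem "stop and receive J(j), or continue with policy mu".
        for _ in range(evals_per_cycle):
            cont = Q[np.arange(n + 1), mu]   # cost of continuing with mu
            target = np.minimum(J, cont)     # stopping-problem values
            for u in range(m):
                Q[:n, u] = g[:, u] + P[u][:n, :] @ target

    return Q, Q[:n].argmin(axis=1)

# Tiny two-state example with termination state 2 (purely illustrative).
P = [np.array([[0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]),
     np.array([[0.0, 0.0, 1.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])]
g = np.array([[1.0, 4.0],
              [2.0, 1.0]])
Q, policy = pi_q_learning_ssp(P, g)
```

Under these assumptions, the minimization over all controls happens only when J and mu are refreshed at the start of a cycle; the evaluation sweeps in between use the fixed policy mu, which is the lower per-iteration overhead the abstract refers to.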

