MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Author(s)
Yu, Huizhen; Bertsekas, Dimitri P.
Thumbnail
DownloadYU boundedness.pdf (423.5Kb)
OPEN_ACCESS_POLICY

Open Access Policy

Creative Commons Attribution-Noncommercial-Share Alike

Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Metadata
Show full item record
Abstract
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185–202] and establishing completely the convergence of Q-learning for these SSP models.
Date issued
2012-11
URI
http://hdl.handle.net/1721.1/93744
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal
Mathematics of Operations Research
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Citation
Yu, Huizhen, and Dimitri P. Bertsekas. “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems.” Mathematics of Operations Research 38, no. 2 (May 2013): 209–227.
Version: Author's final manuscript
ISSN
0364-765X
1526-5471

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.