On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Yu, Huizhen; Bertsekas, Dimitri P.

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)

Abstract

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite-space stochastic shortest path (SSP) problems, which are undiscounted, total-cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or impose some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185–202] and fully establishing the convergence of Q-learning for these SSP models.
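As a rough illustration of the setting described in the abstract, the sketch below runs tabular Q-learning on a small, made-up SSP instance. Everything specific here is an assumption for illustration and is not taken from the paper: the two-state model `MODEL`, the terminal state label `T`, the uniform random choice of which state-action pair to update, and the step size 1/n per pair. The toy model has a proper policy and also an improper one (cycling between states 0 and 1 at positive cost). The function records the largest absolute iterate seen, which is the quantity whose boundedness the paper studies.

```python
import random

# Hypothetical toy SSP (not from the paper): states 0 and 1, plus an
# absorbing, cost-free terminal state T.
T = 2
# MODEL[(state, action)] = list of (probability, next_state, cost)
MODEL = {
    (0, 0): [(1.0, 1, 1.0)],                 # move to state 1
    (0, 1): [(0.5, T, 1.0), (0.5, 0, 1.0)],  # terminate or stay, each w.p. 0.5
    (1, 0): [(1.0, T, 2.0)],                 # terminate directly
    (1, 1): [(1.0, 0, 0.5)],                 # move back to state 0
}
ACTIONS = (0, 1)


def sample(rng, s, a):
    """Draw (next_state, cost) for the pair (s, a) from MODEL."""
    u = rng.random()
    acc = 0.0
    for p, nxt, cost in MODEL[(s, a)]:
        acc += p
        if u < acc:
            return nxt, cost
    return nxt, cost  # guard against floating-point rounding


def q_learning(num_updates=200_000, seed=0):
    """Undiscounted Q-learning, updating one random (s, a) pair per step.

    Returns the final Q-table and the largest |Q| value observed.
    """
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in MODEL}
    visits = {sa: 0 for sa in MODEL}
    pairs = list(MODEL)
    max_abs = 0.0
    for _ in range(num_updates):
        s, a = rng.choice(pairs)
        nxt, cost = sample(rng, s, a)
        # Q at the terminal state is fixed at zero (cost-free, absorbing).
        future = 0.0 if nxt == T else min(q[(nxt, b)] for b in ACTIONS)
        visits[(s, a)] += 1
        step = 1.0 / visits[(s, a)]  # assumed diminishing step size
        q[(s, a)] += step * (cost + future - q[(s, a)])
        max_abs = max(max_abs, abs(q[(s, a)]))
    return q, max_abs


if __name__ == "__main__":
    q, max_abs = q_learning()
    for sa in sorted(q):
        print(sa, round(q[sa], 3))
    print("largest |Q| seen:", round(max_abs, 3))
```

For this toy model, solving the Bellman equation by hand gives optimal Q-factors of 3 for (0, 0), 2 for (0, 1), 2 for (1, 0), and 2.5 for (1, 1), so the printed estimates should settle near those values. A single simulated run only illustrates the behavior; it says nothing about the probability-one guarantee, which is what the paper proves.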

Date issued

2012-11

URI

http://hdl.handle.net/1721.1/93744

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems

Journal

Mathematics of Operations Research

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Citation

Yu, Huizhen, and Dimitri P. Bertsekas. “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems.” Mathematics of Operations Research 38, no. 2 (May 2013): 209–227.

Version: Author's final manuscript

ISSN

0364-765X

1526-5471

Collections

MIT Open Access Articles

DSpace@MIT