Q-learning with nearest neighbors

Author(s)
Shah, Devavrat; Xie, Qiaomin
Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
© 2018 Curran Associates Inc. All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and an unknown transition kernel, when only a single sample path of the system under an arbitrary policy is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using the nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor $\gamma \in (0, 1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^{3}(1-\gamma)^{7})\big)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}(1/\varepsilon^{d})$, so the sample complexity scales as $\tilde{O}(1/\varepsilon^{d+3})$. Indeed, we establish a lower bound showing that a dependence of $\tilde{\Omega}(1/\varepsilon^{d+2})$ is necessary.
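To make the idea in the abstract concrete, the following is a minimal sketch of Q-learning combined with 1-nearest-neighbor state aggregation on a single sample path. It is a simplified illustration, not the NNQL algorithm analyzed in the paper: the per-step update, the 1/n step size, the grid of centers, and the toy environment `toy_step` are all assumptions introduced here.

```python
import numpy as np

def nearest_center(x, centers):
    """Index of the center closest to the scalar state x (1-nearest neighbor)."""
    return int(np.argmin(np.abs(centers - x)))

def nn_q_learning(step, n_actions, centers, gamma=0.9, n_steps=20000, seed=0):
    """Simplified sketch: Q-values are stored only at the centers, and every
    observed state is mapped to its nearest center before the usual
    Q-learning update. Data comes from one sample path under a random policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((len(centers), n_actions))
    visits = np.zeros_like(q)
    x = rng.uniform(0.0, 1.0)              # initial state (assumed in [0, 1])
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))   # purely random behaviour policy
        x_next, r = step(x, a, rng)
        i = nearest_center(x, centers)
        j = nearest_center(x_next, centers)
        visits[i, a] += 1
        alpha = 1.0 / visits[i, a]         # assumed step size: 1 / visit count
        target = r + gamma * q[j].max()    # bootstrapped target at the neighbor
        q[i, a] += alpha * (target - q[i, a])
        x = x_next
    return q

def toy_step(x, a, rng):
    """Hypothetical 1-D MDP on [0, 1]: action 1 drifts right, action 0 drifts
    left, with small Gaussian noise; the reward equals the next state."""
    drift = 0.1 if a == 1 else -0.1
    x_next = float(np.clip(x + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    return x_next, x_next

centers = np.linspace(0.0, 1.0, 11)        # grid of centers covering [0, 1]
q = nn_q_learning(toy_step, 2, centers)
greedy = q.argmax(axis=1)                  # greedy action at each center
```

The spacing of the centers plays the role of the discretization accuracy: a finer grid can represent the Q-function more precisely, but the sample path then needs longer to visit every center-action pair, which is the trade-off the covering time captures.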
Date issued
2018
URI
https://hdl.handle.net/1721.1/137946
Department
Massachusetts Institute of Technology. Laboratory for Information and Decision Systems; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Statistics and Data Science Center (Massachusetts Institute of Technology)
Citation
Shah, Devavrat and Xie, Qiaomin. 2018. "Q-learning with nearest neighbors."
Version: Final published version

Collections
  • MIT Open Access Articles
