Off-policy reinforcement learning with Gaussian processes
Author(s)
Chowdhary, Girish; Liu, Miao; Grande, Robert; Walsh, Thomas; How, Jonathan P.; Carin, Lawrence
Download: How_Off-policy.pdf (562.6 Kb)
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Abstract
An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
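The abstract only sketches GPQ at a high level. As a rough illustration of the core idea (a GP posterior mean standing in for the Q function, refit against off-policy Bellman targets), the minimal Python/NumPy sketch below runs batch-style backups on a toy chain MDP. The names (ToyGPQ, rbf), the squared-exponential kernel, the toy MDP, and all hyperparameter values are illustrative assumptions, not the paper's algorithm; in particular, the paper's sufficient conditions on hyperparameter selection and its online extensions are not reproduced here.

import numpy as np

def rbf(A, B, ell=0.5, sf2=1.0):
    # Squared-exponential kernel between row-stacked inputs A (n,d) and B (m,d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

class ToyGPQ:
    # GP regression over (state, action) inputs; the posterior mean approximates Q.
    # Hyperparameters here are illustrative, not the paper's sufficient conditions.
    def __init__(self, noise=0.1, ell=0.5):
        self.noise, self.ell = noise, ell
        self.X, self.alpha = None, None

    def fit(self, X, y):
        K = rbf(X, X, self.ell) + self.noise * np.eye(len(X))
        self.X, self.alpha = X, np.linalg.solve(K, y)

    def q(self, Xq):
        if self.X is None:          # before any fit, Q is identically zero
            return np.zeros(len(Xq))
        return rbf(Xq, self.X, self.ell) @ self.alpha

# Toy 5-state chain; actions move left (-1) or right (+1); reward 1 at the right end.
rng = np.random.default_rng(0)
n_states, actions, gamma = 5, (-1.0, 1.0), 0.9
S = rng.integers(0, n_states, 200).astype(float)   # states visited by the behavior policy
A = rng.choice(actions, 200)                       # off-policy: uniformly random actions
S2 = np.clip(S + A, 0, n_states - 1)               # deterministic chain transitions
R = (S2 == n_states - 1).astype(float)             # reward on reaching the goal state
X = np.stack([S, A], axis=1)

gp = ToyGPQ()
for _ in range(30):                                # repeated batch Bellman backups
    # Targets y = r + gamma * max_a' Q(s', a'), with Q read off the current GP mean.
    q_next = np.stack([gp.q(np.stack([S2, np.full_like(S2, a)], axis=1)) for a in actions])
    gp.fit(X, R + gamma * q_next.max(axis=0))

# Greedy state values under the learned GP mean.
V = [max(gp.q(np.array([[s, a]]))[0] for a in actions) for s in range(n_states)]
print(np.round(V, 2))

Running the script should print greedy values that increase toward the goal state, the qualitative behavior one expects from converged off-policy backups; exact GP regression is used here for simplicity, whereas a practical implementation would need the sparsification and online updates the paper discusses.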
Date issued
2014-07
Department
Massachusetts Institute of Technology. Aerospace Controls Laboratory; Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Journal
IEEE/CAA Journal of Automatica Sinica
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Chowdhary, Girish, Miao Liu, Robert Grande, Thomas Walsh, Jonathan How, and Lawrence Carin. "Off-policy reinforcement learning with Gaussian processes." IEEE/CAA Journal of Automatica Sinica, Vol. 1, No. 3, July 2014.
Version: Author's final manuscript
ISSN
2329-9266