Off-policy reinforcement learning with Gaussian processes

Chowdhary, Girish; Liu, Miao; Grande, Robert; Walsh, Thomas; How, Jonathan; Carin, Lawrence

dc.contributor.author	Chowdhary, Girish
dc.contributor.author	Liu, Miao
dc.contributor.author	Grande, Robert
dc.contributor.author	Walsh, Thomas
dc.contributor.author	How, Jonathan P.
dc.contributor.author	Carin, Lawrence
dc.date.accessioned	2015-05-11T19:13:37Z
dc.date.available	2015-05-11T19:13:37Z
dc.date.issued	2014-07
dc.date.submitted	2014-05
dc.identifier.issn	2329-9266
dc.identifier.uri	http://hdl.handle.net/1721.1/96958
dc.description.abstract	An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.	en_US
dc.description.sponsorship	United States. Office of Naval Research (Autonomy Program N000140910625)	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/JAS.2014.7004680	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Other univ. web domain	en_US
dc.title	Off-policy reinforcement learning with Gaussian processes	en_US
dc.type	Article	en_US
dc.identifier.citation	Chowdhary, Girish, Miao Liu, Robert Grande, Thomas Walsh, Jonathan How, and Lawrence Carin. "Off-policy reinforcement learning with Gaussian processes." IEEE/CAA Journal of Automatica Sinica, Vol. 1, No. 3, July 2014.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Aerospace Controls Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.mitauthor	Grande, Robert	en_US
dc.contributor.mitauthor	Walsh, Thomas	en_US
dc.contributor.mitauthor	How, Jonathan P.	en_US
dc.relation.journal	IEEE/CAA Journal of Automatica Sinica	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Chowdhary, Girish; Liu, Miao; Grande, Robert; Walsh, Thomas; How, Jonathan; Carin, Lawrence	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: How_Off-policy.pdf
Size: 562.6 KB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
